The Role of Twitter Medium in Business with Regression Analysis and Statistical Modelling

Marketing refers to the strategies a company undertakes to promote its brands to its potential audience. Advertising provides useful venues for marketing, allowing a company to promote its services and goods to the audience, and it has a positive impact on the sale of services or products. In this study, we consider a well-known online medium, Twitter (the fourth most popular social media platform used by marketers), and examine its impact on sales. For this purpose, a simple linear regression model is used to test the significance and usefulness of Twitter advertising for sales. Statistical tests, namely the t-test and the correlation test, are adopted to test the hypothesis of the “impact of Twitter advertising on sales.” The findings indicate that Twitter advertising has a positive impact on sales. Furthermore, a new statistical model called the exponential T-X exponentiated exponential distribution is introduced. The proposed model has heavy-tailed characteristics, which are useful in finance and related sectors. Finally, the applicability of the new model is illustrated using the sales data.


Introduction
Advertising is a form of marketing that communicates with the potential consumers of certain goods, products, or services. Advertisements are messages, usually paid for by a company, that aim to influence consumers' decisions to buy its products. Numerous mediums are available for advertising and marketing. The available mediums can be divided into two main groups: print mediums and online mediums. Online media are the most effective means of reaching more consumers [1,2].
Online advertising is a way of marketing that conveys messages to potential users via the Internet. It is also called online marketing and helps in finding the right potential audience. Online advertisements appear in search engines, on social media, in browsers, or even in e-mail. Online advertising can be classified into different groups such as display ads, e-mail ads, native ads, video ads, and social media ads [3,4].
Twitter is one of the most prominent online/social media platforms that can be used quite effectively for advertising a particular brand. It is a fruitful tool for marketing a business and engaging an audience. It permits its users to create accounts and interact with others through short messages called "tweets." Twitter is one of marketers' favorite social media platforms; for instance, 53% of marketers worldwide used it as a channel for their marketing activities in 2020. Social media generally provide brands with personalized opportunities to connect with consumers, and Twitter has a considerable impact on consumer behavior. According to a report by the Digital Marketing Institute, about 40% of Twitter users bought something after seeing it on Twitter. This reflects the steadily increasing impact of social media on consumers' buying habits [5][6][7][8].
It is not enough for a company to get its brand involved on Twitter; brands also need to consider the opportunities offered by social media influencers. The influence of social media such as Twitter also competes with that of friends in building consumer trust. When looking for a product recommendation, 56% of consumers asked their friends for advice, and 49% of respondents said that they bought something after consulting their friends; see [9]. On Twitter, recommendations from influencers resulted in a 20% increase in product sales [10]. Twitter is ranked fourth in the marketing arena; see Figure 1.
In this paper, we analyze a real-life data set related to Twitter advertising to assess its impact on sales. We use a well-known statistical methodology, regression analysis, to estimate how much sales can be increased by spending money on Twitter advertising. To carry out the regression analysis, we test the hypothesis of "the impact of Twitter advertising on sales." The null hypothesis, denoted by $H_0$, and the alternative hypothesis, denoted by $H_A$, are formulated as follows: $H_0$: Twitter advertising has no significant impact on sales, versus $H_A$: Twitter advertising has a significant impact on sales.
Furthermore, to provide an adequate fit to the sales data, a new heavy-tailed (HT) statistical model is introduced. The new model is obtained by combining the exponentiated exponential (EE) distribution with the exponential T-X (ETX) strategy, and it is named the exponential T-X exponentiated exponential (ETXE-exponential) model. The ETXE-exponential distribution is compared with other models, and the numerical results suggest that it is a suitable model for data in business, finance, and management sciences.

Regression Analysis
The regression method is used to forecast the demand for a particular product. It determines the nature of the relationship between the product's demand and its determinants. Regression analysis can be carried out via three main approaches, depending on the nature of the study; Figure 2 illustrates these main divisions of regression analysis.

Simple Linear Regression Approach.
In this paper, we adopt the simple linear regression (SLReg) approach, as we have one independent variable (Twitter advertising). This approach predicts the response variable, usually denoted by Y, from the independent variable, usually denoted by X. The simple linear regression model can be expressed mathematically as
$$Y = \theta_0 + \theta_1 X + \varepsilon, \quad (1)$$
where $\theta_0$ is the intercept and $\theta_1$ is the slope of the model. A value of $\theta_1 = 0$ indicates no effect of X on Y; in that case, the model reduces to $Y = \theta_0 + \varepsilon$. The quantity $\varepsilon$ is the random error term, with mean 0.
To explain the relationship between sales Y and Twitter advertising X, the simple linear regression model becomes
$$\text{Sales} = \theta_0 + \theta_1\,\text{Twitter} + \varepsilon. \quad (2)$$
After carrying out the regression analysis, we found the estimated intercept $\hat\theta_0 = 5.621$, which represents the predicted sales in thousands of USD (United States dollars) when there is no investment in the Twitter medium.
Therefore, with no investment in the Twitter medium, the expected sales would be 5.621 × 1000 = 5621 dollars. The estimated slope (also called the regression coefficient) $\hat\theta_1$ of the regression model in equation (2) is 0.193: each one-unit increase in Twitter advertising increases the predicted sales by 0.193 thousand USD (0.193 × 1000 = 193 dollars). Hence, using the Twitter medium as a marketing tool, the predicted sales at a Twitter advertising level of 1000 units are 5.621 + 0.193 × 1000 = 198.621 (thousand USD). The relationship between Twitter advertising and the sale of goods is displayed in Figure 3. The plot in Figure 3 indicates a positive relationship (PR) between Twitter advertising and sales: the more we spend on marketing via the Twitter medium, the higher the sales.
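The least-squares estimates behind equation (2), and the prediction arithmetic above, can be sketched in Python. This is a minimal illustration under stated assumptions rather than the software used in the study: `fit_simple_ols` and `predict_sales` are hypothetical helper names, the default coefficients of `predict_sales` are the estimates reported in the text (sales in thousands of USD), and X is taken in whatever unit the Twitter advertising data use.

```python
def fit_simple_ols(xs, ys):
    """Least-squares estimates (theta0, theta1) for Y = theta0 + theta1*X + eps."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # theta1 = S_xy / S_xx, theta0 = y_bar - theta1 * x_bar
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


def predict_sales(x, theta0=5.621, theta1=0.193):
    """Predicted sales (thousands of USD) at Twitter advertising level x.

    The defaults are the coefficients reported in the text (hypothetical helper).
    """
    return theta0 + theta1 * x
```

With these defaults, `predict_sales(0)` gives 5.621 (about 5621 dollars) and `predict_sales(1000)` gives approximately 198.621.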

Statistical Testing.
Is there a PR between sales and the Twitter advertising medium? In the sample of 130 observations there is, since the estimated slope $\hat\theta_1$ of the regression line is 0.193, not 0. However, we want to know whether the relationship holds in the population of all sales and Twitter advertising observations; that is, we want to know whether the true slope $\theta_1$ is unlikely to be 0.
For this purpose, we conduct a standard hypothesis test for $\theta_1$. First, we specify $H_0$ and $H_A$ as follows: $H_0: \theta_1 = 0$ (no significant relationship exists between sales and Twitter advertising) versus $H_A: \theta_1 \neq 0$ (a significant relationship exists between sales and Twitter advertising). Second, the t-statistic is defined as
$$t = \frac{\hat\theta_1 - 0}{\mathrm{S.E.}(\hat\theta_1)}.$$
Third, we compute the value of the t-statistic using statistical software; by default, it is calculated under the assumption that $\theta_1 = 0$, by dividing the estimated coefficient by its standard error. With the reported values 5.621 (the value given above for the intercept estimate $\hat\theta_0$) and standard error 0.44428, this gives
$$t = \frac{5.621}{0.44428} \approx 12.65.$$
Fourth, we calculate the p value. By default, the p value is computed for the two-sided ("not-equal-to") alternative $H_A$. If the p value is less than 0.05, we reject $H_0$. After performing the analysis, we observe that the p value is $2 \times 10^{-16}$.
Since the p value is very small (less than 0.001), we reject the hypothesis $\theta_1 = 0$ in favor of $\theta_1 \neq 0$. Hence, there is significant evidence of a linear relationship between sales and Twitter advertising.
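The four testing steps above can be sketched for a generic sample as follows. The standard error uses the usual OLS formula $\mathrm{S.E.}(\hat\theta_1) = \sqrt{s^2/S_{xx}}$ with $s^2 = \mathrm{SSE}/(n-2)$. To stay within the Python standard library, the p value here is a large-sample normal approximation to the t distribution with n − 2 degrees of freedom; this is an assumption of the sketch, and exact p values should be taken from a t CDF in a statistics package. `slope_t_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def slope_t_test(xs, ys):
    """Return (theta1_hat, se, t, p_approx) for H0: theta1 = 0 vs HA: theta1 != 0.

    p_approx is a two-sided p value from a standard-normal approximation
    to the t distribution with n - 2 degrees of freedom (large-n shortcut).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = sxy / sxx
    theta0 = y_bar - theta1 * x_bar
    sse = sum((y - theta0 - theta1 * x) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)                  # residual variance estimate
    se = math.sqrt(s2 / sxx)            # standard error of theta1_hat
    t = (theta1 - 0.0) / se             # t statistic under H0: theta1 = 0
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return theta1, se, t, p_approx
```

The decision rule matches the text: reject $H_0$ when the p value is below 0.05.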

Residuals.
The regression line quantifies the linear trend in the data. The residual is the difference between the observed value of Y and the value predicted by the fitted regression line; for the i-th observation,
$$e_i = Y_i - \hat Y_i = Y_i - (\hat\theta_0 + \hat\theta_1 X_i).$$
The residual standard error quantifies the quality of the regression fit; in this study, it represents the average amount by which the observed sales (Y) deviate from the fitted regression line. The behavior of the residuals is displayed in Figure 4.
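A minimal sketch of the residuals $e_i = Y_i - \hat Y_i$ and of the residual standard error $\sqrt{\mathrm{SSE}/(n-2)}$, assuming the simple linear regression of equation (1) fitted by least squares; `residuals_and_rse` is a hypothetical helper, not part of any library.

```python
import math


def residuals_and_rse(xs, ys):
    """Fit Y = theta0 + theta1*X by least squares and return
    (residuals e_i = y_i - yhat_i, residual standard error sqrt(SSE / (n - 2)))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    theta0 = y_bar - theta1 * x_bar
    resid = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
    rse = math.sqrt(sum(e * e for e in resid) / (n - 2))  # n - 2: two fitted parameters
    return resid, rse
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any implementation.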
The plots in Figure 4 lead to the following observations: (1) In the residuals versus fitted plot, the red line stays close to a residual value of 0. Therefore, we assume that the linearity assumption holds, that is, that X has a straight-line relationship with Y in the regression model.
(2) One of the fundamental assumptions of regression modeling is the normality of the error terms. The points in the QQ (quantile-quantile) plot lie close to the 45-degree reference line, showing that the residuals are approximately normally distributed.
(3) Homoscedasticity (constant error variance) is one of the basic assumptions of linear regression modeling; if it does not hold, the problem of heteroscedasticity arises. The scale-location plot shows that the residuals satisfy the homoscedasticity property. (4) In regression modeling, influential observations can strongly affect the parameter estimates. The residuals versus leverage plot shows that there are few influential observations in the data.
Computational Intelligence and Neuroscience
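The residuals-versus-leverage diagnostic in item (4) is commonly summarized by leverages and Cook's distances. For simple linear regression these have closed forms, $h_i = 1/n + (X_i-\bar X)^2/S_{xx}$ and $D_i = e_i^2 h_i / \big(2 s^2 (1-h_i)^2\big)$, with $p = 2$ fitted parameters. The sketch below computes both; the 4/n cut-off used to flag observations is a common rule of thumb assumed here, not a threshold stated in the paper, and the function names are hypothetical.

```python
def leverage_and_cooks(xs, ys):
    """Leverages h_i and Cook's distances D_i for Y = theta0 + theta1*X + eps
    fitted by least squares (p = 2 parameters)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    theta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    theta0 = y_bar - theta1 * x_bar
    resid = [y - (theta0 + theta1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - 2)     # residual variance estimate
    p = 2                                        # intercept and slope
    lev = [1 / n + (x - x_bar) ** 2 / sxx for x in xs]
    cooks = [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return lev, cooks


def flag_influential(cooks, n):
    """Indices whose Cook's distance exceeds the (assumed) 4/n rule of thumb."""
    return [i for i, d in enumerate(cooks) if d > 4 / n]
```

The leverages always sum to p = 2, the trace of the hat matrix, which the test below uses as a check.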

Correlation Test.
A correlation test is a statistical tool used to quantify the relationship between two or more variables. In this study, we have only two variables (sales and Twitter advertising). Hence, we adopt the Pearson correlation approach, which measures the linear relationship between two variables. The Pearson correlation coefficient, usually denoted by r, is
$$r = \frac{\sum_{i=1}^{n}(X_i - \mu_{\text{Twitter}})(Y_i - \mu_{\text{Sales}})}{\sqrt{\sum_{i=1}^{n}(X_i - \mu_{\text{Twitter}})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i - \mu_{\text{Sales}})^2}},$$
where $X_i$ and $Y_i$ are the i-th Twitter advertising and sales observations, and $\mu_{\text{Twitter}}$ and $\mu_{\text{Sales}}$ represent the mean values of Twitter advertising and sales, respectively. The significance of r can be tested using the t-test
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}.$$
To test r, we use the hypotheses $H_0: \rho = 0$ (there is no relationship between Twitter advertising and sales) versus $H_A: \rho \neq 0$, where $\rho$ is the population correlation. After performing the required steps of the analysis, we observe that r = 0.5253, indicating a positive relationship between sales and Twitter advertising. Furthermore, the Spearman rank correlation test statistic, denoted by Sp, is 161, with a p value of $4.634 \times 10^{-12}$. Since the p value is less than 0.05, we reject $H_0$ and conclude that there is a significant relationship between Twitter advertising and sales. This relationship is also shown graphically in Figure 5.
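The Pearson coefficient, its t statistic, and a Spearman coefficient (computed as the Pearson correlation of ranks, with ties given average ranks) can be sketched as follows. In simple linear regression, the t statistic $r\sqrt{n-2}/\sqrt{1-r^2}$ is algebraically identical to the t statistic for the slope $\hat\theta_1$, so the two tests lead to the same decision. Plugging in the reported r = 0.5253 and n = 130 gives t ≈ 6.98. The sketch returns Spearman's rho itself; statistical packages may report a different test statistic for Spearman's test (for example, R reports an S statistic). Function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_t_statistic(r, n):
    """t = r*sqrt(n-2)/sqrt(1-r^2); t-distributed with n-2 df under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def average_ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


t_reported = r_t_statistic(0.5253, 130)   # r and n reported in the text
```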
2.5. Testing of Normality.
As stated above, assessing normality is one of the basic requirements in regression modeling. There are two approaches for testing the normality of data: (i) numerical and (ii) graphical. For the numerical analysis, we use the Shapiro-Wilk (SW) test to check the normality of the sales and Twitter advertising data. The test hypotheses are $H_0$: the data are normally distributed versus $H_A$: the data are not normally distributed. Applying the SW test, we found that (i) for the Twitter advertising data, SW = 0.96011 with p value = 0.000739, and (ii) for the sales data, SW = 0.967900 with p value = 0.00356.
Since the p values associated with the Twitter advertising data and the sales data are less than 0.05, we reject $H_0$; that is, the SW test indicates a statistically significant departure from normality for both data sets.
In addition to the numerical SW test, we use a visual approach based on the QQ plot; see Figure 6. Under normality, the points of the QQ plot lie along the 45-degree reference line. Since the data points are scattered around the reference line, the QQ plots suggest that the sales and Twitter advertising data sets are approximately normally distributed.
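The coordinates behind a normal QQ plot such as those in Figure 6 can be sketched with the standard library's `statistics.NormalDist`. The plotting positions $(i-0.5)/n$ are one common convention assumed here (software packages differ slightly), and the observations are standardized so that points near the line y = x indicate approximate normality. `normal_qq_points` is a hypothetical helper; the Shapiro-Wilk test itself is not reimplemented.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot.

    Sample quantiles are the sorted, standardized observations; theoretical
    quantiles use the plotting positions (i - 0.5) / n.
    """
    n = len(data)
    m, s = mean(data), stdev(data)
    sample = sorted((x - m) / s for x in data)
    z = NormalDist()                       # standard normal N(0, 1)
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sample))
```

Plotting the returned pairs together with the line y = x gives the usual visual check.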

Statistical Modeling
This section consists of four subsections: (i) in the first subsection, we describe how the new model is obtained; (ii) in the second, the proposed model is introduced and plots of its PDF (probability density function) are sketched; (iii) in the third, we discuss the regularly varying tail behavior of the proposed model; and (iv) in the last, a real application is discussed.
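Let the baseline be the exponentiated exponential (EE) distribution with DF
$$H(y;\Phi) = \left(1 - e^{-\delta_2 y}\right)^{\delta_1}, \quad y > 0,$$
where $\Phi = (\delta_1, \delta_2)$, $\delta_1 > 0$ is a shape parameter, and $\delta_2 > 0$ is the scale parameter of the EE distribution. The PDF of the EE model, denoted by $h(y;\Phi)$, is
$$h(y;\Phi) = \delta_1 \delta_2\, e^{-\delta_2 y}\left(1 - e^{-\delta_2 y}\right)^{\delta_1 - 1}, \quad y > 0.$$
In the finance sector, data sets, particularly financial returns, are frequently right-skewed with heavy tails. Among statistical distributions, HT distributions have shown better performance in modeling such data, where the financial returns exhibit HT, right-skewed, and unimodal behavior. According to Beirlant et al. [11], a distribution is called an HT distribution if its SF (survival function) $\bar H(y;\Phi) = 1 - H(y;\Phi)$ satisfies
$$\lim_{y \to \infty} e^{a y}\,\bar H(y;\Phi) = \infty \quad \text{for all } a > 0.$$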
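A minimal sketch of the EE baseline, assuming the standard form $H(y;\Phi) = (1 - e^{-\delta_2 y})^{\delta_1}$ with density $h(y;\Phi) = \delta_1\delta_2 e^{-\delta_2 y}(1 - e^{-\delta_2 y})^{\delta_1-1}$ for y > 0. The survival function is evaluated on the log scale so that small tail probabilities are not rounded to zero. Because $1 - H(y;\Phi) \approx \delta_1 e^{-\delta_2 y}$ for large y, $e^{ay}\,\bar H(y;\Phi)$ tends to 0 whenever $a < \delta_2$, so the EE baseline is not heavy-tailed in the sense of Beirlant et al. stated above. Function names are hypothetical.

```python
import math


def log1mexp(x):
    """log(1 - e^{-x}) for x > 0, switching formulas at log(2) for accuracy."""
    return math.log1p(-math.exp(-x)) if x > math.log(2) else math.log(-math.expm1(-x))


def ee_cdf(y, d1, d2):
    """EE distribution function H(y) = (1 - exp(-d2*y))**d1 for y > 0, else 0."""
    if y <= 0:
        return 0.0
    return (-math.expm1(-d2 * y)) ** d1


def ee_pdf(y, d1, d2):
    """EE density h(y) = d1*d2*exp(-d2*y)*(1 - exp(-d2*y))**(d1 - 1) for y > 0."""
    if y <= 0:
        return 0.0
    return d1 * d2 * math.exp(-d2 * y) * (-math.expm1(-d2 * y)) ** (d1 - 1)


def ee_survival(y, d1, d2):
    """Survival function 1 - H(y) = -expm1(d1 * log(1 - e^{-d2 y})), accurate in the tail."""
    if y <= 0:
        return 1.0
    return -math.expm1(d1 * log1mexp(d2 * y))
```

With $\delta_1 = 1$ the EE distribution reduces to the exponential distribution, which the test below uses as a check.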
An important characteristic of HT distributions is the RVP (regular variation property); see Resnick [12]. A distribution is called an RVD (regularly varying distribution) if its SF satisfies
$$\lim_{y \to \infty} \frac{\bar H(ky;\Phi)}{\bar H(y;\Phi)} = k^{-\beta}, \quad k > 0,$$
for some tail index $\beta > 0$. Distributions with the RVP are very useful for modeling financial returns and HT data; see [13], [14], [15], and [16]. Numerous methods have been introduced to obtain new HT and flexible distributions. Among them, the exponential TX family is a prominent method for generating HT distributions [17]. For $\sigma > 1$ and $y \in \mathbb{R}$, the DF $G(y;\Phi)$ and PDF $g(y;\Phi)$ of the exponential TX family are defined in terms of a baseline DF $H(y;\Phi)$ and its PDF $h(y;\Phi)$ [17]. Keeping in view the role of HT distributions in the finance sector, we are motivated to study a new HT model called the ETXE-exponential distribution.
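The two tail definitions above can be checked numerically. The sketch below is illustrative only: it uses a Lomax (Pareto II) tail $\bar F(y) = (1+y)^{-\beta}$ with β = 2 as an example of a power-law, regularly varying tail, and an exponential tail $e^{-\lambda y}$ with λ = 1 as a light-tailed contrast; neither distribution is part of the paper's model. Quantities are computed on the log scale to avoid overflow and underflow, and all names are hypothetical.

```python
import math

BETA, LAM = 2.0, 1.0   # illustrative tail index and exponential rate (assumptions)


def log_sf_lomax(y):
    """log SF of a Lomax law, SF(y) = (1 + y)**(-BETA): a power-law tail."""
    return -BETA * math.log1p(y)


def log_sf_expo(y):
    """log SF of an exponential law, SF(y) = exp(-LAM * y): a light tail."""
    return -LAM * y


def log_tail_growth(log_sf, a, y):
    """log(e^{a y} * SF(y)); the HT condition needs this to grow without bound for every a > 0."""
    return a * y + log_sf(y)


def rv_ratio(log_sf, k, y):
    """SF(k y) / SF(y); tends to k**(-beta) for a regularly varying tail."""
    return math.exp(log_sf(k * y) - log_sf(y))
```

For the Lomax tail, $e^{ay}\bar F(y)$ eventually explodes even for a small a, and the ratio $\bar F(2y)/\bar F(y)$ approaches $2^{-2} = 0.25$; for the exponential tail with $a < \lambda$, the first quantity collapses and the ratio tends to 0.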
The ETXE-exponential distribution can model data that are skewed to the right with a heavy tail. This is shown in the following three subsections. First, the graphical behavior of its PDF, shown in Section 3.2, indicates that the ETXE-exponential distribution is an HT model. Second, the HT properties of the ETXE-exponential distribution are established in Section 3.3. Finally, positively skewed sales data are analyzed in Section 3.4, where we show that the ETXE-exponential distribution can be a better model for HT financial data sets than other existing distributions.

The ETXE-Exponential Distribution.
A random variable Y has the ETXE-exponential distribution with scale parameters $\delta_2 > 0$ and $\sigma > 1$ and shape parameter $\delta_1 > 0$ if its DF and PDF are obtained by taking the EE distribution as the baseline $H(y;\Phi)$ in the exponential TX family. The PDF plots sketched in Figure 7 (left panel) indicate that as the value of $\delta_1$ increases, the ETXE-exponential distribution tends toward an HT shape. Likewise, the PDF plots in Figure 7 (right panel) indicate that as the value of $\sigma$ increases, the ETXE-exponential distribution exhibits HT behavior.

The Heavy-Tailed Behavior.
This section deals with the regularly varying tail behavior of the ETXE-exponential distribution, which is important for characterizing the HT property of a model.

3.3.1. The Regularly Varying Tail Behavior.
According to Karamata's theorem [18], the SF $\bar G(y;\Phi)$ can be characterized as follows. Using equation (1) in equation (17), we obtain a limit that is finite and nonzero for every k > 0; thus, $G(y;\sigma,\Phi)$ is an RVD.

An Application of the Regularly Varying Tail Behavior.
Let us assume that the distribution of Y has PLB (power-law behavior), that is, $\bar H(y;\Phi) \sim y^{-\beta}$ as $y \to \infty$. Using Karamata's theorem [18], we can write the SF of G as $\bar G(y;\Phi) = y^{-\beta} L(y;\Phi)$, where $L(y;\Phi)$ is slowly varying. Since $\bar H(y;\Phi) \sim y^{-\beta}$, we obtain this representation with $L(y;\Phi) = \sigma/(\sigma - 1 + y^{-\beta})$. The result holds if $L(y;\Phi)$ is an SVF (slowly varying function); following Resnick [12], we have to show that, for all k > 0,
$$\lim_{y \to \infty} \frac{L(ky;\Phi)}{L(y;\Phi)} = 1.$$
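For completeness, the required limit follows directly from the form $L(y;\Phi) = \sigma/(\sigma - 1 + y^{-\beta})$ given above. For any fixed scale factor k > 0, with $\beta > 0$ and $\sigma > 1$,
$$\frac{L(ky;\Phi)}{L(y;\Phi)} = \frac{\sigma/\left(\sigma - 1 + (ky)^{-\beta}\right)}{\sigma/\left(\sigma - 1 + y^{-\beta}\right)} = \frac{\sigma - 1 + y^{-\beta}}{\sigma - 1 + k^{-\beta}y^{-\beta}} \;\xrightarrow[\,y \to \infty\,]{}\; \frac{\sigma - 1}{\sigma - 1} = 1,$$
because $y^{-\beta} \to 0$ and $k^{-\beta}y^{-\beta} \to 0$, while $\sigma - 1 > 0$ keeps the limiting denominator away from zero. Hence $L(y;\Phi)$ is slowly varying, and $\bar G(y;\Phi) = y^{-\beta}L(y;\Phi)$ is regularly varying with index $-\beta$.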

Analyzing the Sales Data.
As mentioned earlier, HT statistical models are very useful for describing financial phenomena. In this section, we demonstrate the fitting performance of the ETXE-exponential model by analyzing the sales data, which consist of 130 observations. The Twitter advertising and sales data sets are available at https://data.world/datasets/twitter. Table 1 reports the BMs (basic measures) of the sales data. The total time on test (TTT) plot, histogram, and box plot of the sales data are provided in Figure 8. Figure 8 shows that the data set is unimodal and skewed to the right; data with such characteristics may be better modeled using the ETXE-exponential distribution. The ETXE-exponential distribution is compared with three well-known distributions from the literature.
The DFs of the competing models are given in the following references:
(1) MOW (Marshall-Olkin Weibull) model [19];
(2) FW (flexible Weibull) model [20];
(3) APTW (alpha power transformed Weibull) model [21].
"Better modeling" means that the ETXE-exponential distribution has smaller values of the selected IC (information criteria) considered for comparison. With k the number of estimated parameters, n the sample size, and ℓ the maximized log-likelihood, the IC are as follows:
(1) The AIC (Akaike IC) [22]: $\mathrm{AIC} = 2k - 2\ell$.
(2) The BIC (Bayesian IC) [23]: $\mathrm{BIC} = k\log(n) - 2\ell$.
(3) The HQIC (Hannan-Quinn IC) [24]: $\mathrm{HQIC} = 2k\log(\log(n)) - 2\ell$. (32)
In addition to the IC, we consider three goodness-of-fit (g-o-f) measures, where $y_{(1)} \le \cdots \le y_{(n)}$ are the ordered observations and $G(\cdot)$ is the fitted DF:
(1) The AD (Anderson-Darling) test statistic: $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\log G(y_{(i)}) + \log\left(1 - G(y_{(n+1-i)})\right)\right]$.
(2) The CM (Cramér-von Mises) test statistic: $W^2 = \frac{1}{12n} + \sum_{i=1}^{n}\left[\frac{2i-1}{2n} - G(y_{(i)})\right]^2$.
(3) The KS (Kolmogorov-Smirnov) test statistic: $D = \sup_{y}\left|G_n(y) - G(y)\right|$, where $G_n$ is the empirical DF.
The MLEs (maximum likelihood estimators) of the competing models are obtained by numerical optimization; specifically, we implement the Newton-Raphson (NR) iteration approach to obtain the MLEs of the fitted models using the sales data. The MLEs of the fitted models are provided in Table 2, the values of the IC are reported in Table 3, and the g-o-f measures with p values are presented in Table 4.
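The text reports that the MLEs were obtained by Newton-Raphson iterations, but the ETXE-exponential likelihood is not recoverable from this section. As a hedged illustration of the NR step only, the sketch below fits the EE baseline (assumed DF $(1 - e^{-\delta_2 y})^{\delta_1}$) by maximizing its profile log-likelihood in $\delta_2$, using the closed-form shape estimate $\hat\delta_1(\delta_2) = -n / \sum_i \log(1 - e^{-\delta_2 y_i})$. Derivatives are finite differences, and the step is halved whenever it would not increase the profile likelihood; these are design choices of this sketch, not details taken from the paper. All function names are hypothetical.

```python
import math
import random


def ee_loglik(d1, d2, ys):
    """EE log-likelihood: n log d1 + n log d2 - d2*sum(y) + (d1 - 1)*sum(log(1 - e^{-d2 y}))."""
    n = len(ys)
    s_log = sum(math.log(-math.expm1(-d2 * y)) for y in ys)
    return n * math.log(d1) + n * math.log(d2) - d2 * sum(ys) + (d1 - 1) * s_log


def profile_shape(d2, ys):
    """Closed-form MLE of the shape d1 for fixed d2 (root of the d1 score equation)."""
    s_log = sum(math.log(-math.expm1(-d2 * y)) for y in ys)
    return -len(ys) / s_log


def profile_loglik(d2, ys):
    return ee_loglik(profile_shape(d2, ys), d2, ys)


def safe_profile_loglik(d2, ys):
    """Profile log-likelihood, or -inf where it cannot be evaluated."""
    try:
        return profile_loglik(d2, ys)
    except (ValueError, ZeroDivisionError):
        return -math.inf


def newton_raphson_scale(ys, d2, tol=1e-8, max_iter=100):
    """Maximize the profile log-likelihood in d2 by Newton-Raphson with
    finite-difference derivatives and step halving (keeps d2 > 0)."""
    for _ in range(max_iter):
        h = 1e-5 * d2                            # relative step keeps d2 - h > 0
        f0 = profile_loglik(d2, ys)
        fp, fm = profile_loglik(d2 + h, ys), profile_loglik(d2 - h, ys)
        grad = (fp - fm) / (2 * h)
        hess = (fp - 2 * f0 + fm) / (h * h)
        step = -grad / hess if hess < 0 else grad  # gradient step if not locally concave
        t = 1.0
        while d2 + t * step <= 0 or safe_profile_loglik(d2 + t * step, ys) < f0:
            t /= 2
            if t < 1e-12:                        # no further improvement possible
                return d2
        d2_new = d2 + t * step
        if abs(d2_new - d2) < tol:
            return d2_new
        d2 = d2_new
    return d2


def fit_ee(ys):
    """Return (d1_hat, d2_hat), starting Newton-Raphson at d2 = 1 / mean(y)."""
    d2_hat = newton_raphson_scale(ys, len(ys) / sum(ys))
    return profile_shape(d2_hat, ys), d2_hat


def ee_sample(n, d1, d2, seed=0):
    """Draw n EE variates by inverting H(y) = (1 - e^{-d2 y})**d1."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        v = rng.random() ** (1.0 / d1)
        if 0.0 < v < 1.0:
            out.append(-math.log1p(-v) / d2)
    return out
```

On a simulated sample, `fit_ee(ee_sample(3000, 2.0, 1.5, seed=12345))` should return estimates close to (2.0, 1.5).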
Based on the results for the sales data, the FW distribution is the second-best model, except in terms of the KS statistic and its p value. Table 4 shows that, in terms of these measures, the APTW distribution is the second-best model, as it has the second-smallest KS value (0.0559) and the second-highest p value (0.8106).
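The information criteria and goodness-of-fit statistics used in this comparison can be computed as in the sketch below, given a maximized log-likelihood and a fitted DF. `cdf` stands for any fitted distribution function with the MLEs plugged in (a hypothetical callable; the ETXE-exponential DF is not reproduced here), and its values at the observations must lie strictly between 0 and 1 for the AD statistic. The KS statistic uses the usual order-statistic form $D = \max_i \max\{i/n - G(y_{(i)}),\ G(y_{(i)}) - (i-1)/n\}$. p values for these statistics are not computed.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC and HQIC for k parameters, n observations, maximized log-likelihood."""
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    hqic = 2 * k * math.log(math.log(n)) - 2 * loglik
    return aic, bic, hqic


def gof_statistics(data, cdf):
    """Return (AD, CM, KS) statistics of a sample against a fitted DF `cdf`."""
    y = sorted(data)
    n = len(y)
    u = [cdf(v) for v in y]        # fitted DF at the order statistics, in (0, 1)
    ks = max(max(i / n - u[i - 1], u[i - 1] - (i - 1) / n) for i in range(1, n + 1))
    cm = 1 / (12 * n) + sum(((2 * i - 1) / (2 * n) - u[i - 1]) ** 2 for i in range(1, n + 1))
    ad = -n - sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))  # u[n-i] is G(y_(n+1-i))
        for i in range(1, n + 1)
    ) / n
    return ad, cm, ks
```

Smaller values of all six quantities indicate a better fit, which is the sense of "better modeling" used in the text.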
Furthermore, to illustrate the results for the sales data, the estimated PDF and CDF plots of the fitted models (ETXE-exponential (red line), MOW (green line), APTW (pink line), and FW (blue line)) are provided in Figure 9. The PP (probability-probability) and QQ plots are sketched in Figures 10 and 11, respectively.

Future Research Directions
In this work, we restricted our study to simple linear regression analysis using the Twitter medium as the predictor variable. However, many other online mediums, such as YouTube, Facebook, and Instagram, can be used for advertising purposes. Therefore, in the future, we intend to implement the multiple linear regression modeling approach to assess the impact of different advertising mediums on sales. In general, the multiple linear regression model (MLRM) with k predictors is
$$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \cdots + \theta_k X_k + \varepsilon.$$
The future research will be carried out in three phases: (i) in phase one, the MLRM will be implemented with two advertising mediums ($X_1$ = medium 1, $X_2$ = medium 2); (ii) in phase two, the MLRM will be used with three advertising mediums ($X_1$ = medium 1, $X_2$ = medium 2, $X_3$ = medium 3); and (iii) in phase three, the MLRM will be applied with four advertising mediums ($X_1$ = medium 1, $X_2$ = medium 2, $X_3$ = medium 3, $X_4$ = medium 4).

The MLRM with Two Predictors.
The MLRM with two predictors is
$$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \varepsilon. \quad (38)$$
In the future, we plan to use the MLRM defined in equation (38) to assess the impact of two different advertising mediums on sales. The MLRMs for the possible combinations of two advertising mediums are as follows:
(1) Effect of the Twitter and YouTube advertising mediums on sales. The MLRM with two advertising mediums ($X_1$ = Twitter and $X_2$ = YouTube) is
$$\text{Sales} = \theta_0 + \theta_1\,\text{Twitter} + \theta_2\,\text{YouTube} + \varepsilon. \quad (39)$$
(2) Effect of the Twitter and Facebook advertising mediums on sales. The MLRM with two advertising mediums ($X_1$ = Twitter and $X_2$ = Facebook) is
$$\text{Sales} = \theta_0 + \theta_1\,\text{Twitter} + \theta_2\,\text{Facebook} + \varepsilon. \quad (40)$$
(3) Effect of the Twitter and Instagram advertising mediums on sales. The MLRM with two advertising mediums ($X_1$ = Twitter and $X_2$ = Instagram) is
$$\text{Sales} = \theta_0 + \theta_1\,\text{Twitter} + \theta_2\,\text{Instagram} + \varepsilon. \quad (41)$$

The MLRM with Three Predictors.
The MLRM with three predictors is
$$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \varepsilon. \quad (42)$$
In the future, we also intend to use the MLRM defined in equation (42) to assess the impact of three different advertising mediums on sales.

The MLRM with Four Predictors.
The MLRM with four predictors is
$$Y = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \theta_4 X_4 + \varepsilon. \quad (45)$$
In the future, we also intend to use the MLRM defined in equation (45) to assess the impact of four different advertising mediums on sales.
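The planned MLRMs can be fitted by ordinary least squares. The sketch below solves the normal equations $(X^\top X)\hat\theta = X^\top Y$ by Gaussian elimination with partial pivoting, using only the standard library; for real data with nearly collinear predictors, a QR- or SVD-based solver from a numerical library would be more robust. The function name and the idea of passing, for example, Twitter and YouTube spends as the predictor rows are illustrative assumptions.

```python
def fit_mlr(rows, ys):
    """Least-squares estimates [theta0, ..., thetak] for Y = theta0 + sum_j thetaj*Xj + eps.

    rows[i] holds the k predictor values of observation i.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]   # prepend intercept column
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y for row, y in zip(X, ys)) for a in range(p)]
    m = [xtx[a] + [xty[a]] for a in range(p)]            # augmented matrix [X^T X | X^T y]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(m[r][col]))   # partial pivoting
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("predictors are collinear; X^T X is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, p):                      # eliminate below the pivot
            f = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= f * m[col][c]
    theta = [0.0] * p
    for r in range(p - 1, -1, -1):                       # back substitution
        theta[r] = (m[r][p] - sum(m[r][c] * theta[c] for c in range(r + 1, p))) / m[r][r]
    return theta
```

Fitting equation (39), for instance, would amount to passing rows of (Twitter, YouTube) spends with the corresponding sales; the returned list is $[\hat\theta_0, \hat\theta_1, \hat\theta_2]$.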

Concluding Remarks
In this research, we studied the relationship between Twitter advertising and sales using the simple linear regression approach. To test the impact of the Twitter medium as an advertising tool, two statistical approaches, the t-test and the correlation test, were used. Based on the results of these tests, Twitter advertising plays a useful role in increasing sales. Furthermore, a new HT model called the ETXE-exponential distribution was developed by combining the exponentiated exponential model with the ETX family. The ETXE-exponential distribution was introduced to provide a close fit to sales data, an important area of the finance sector. To demonstrate the applicability of the proposed model, the sales data were analyzed, and the ETXE-exponential distribution was compared with the MOW, APTW, and FW distributions. Using the selected information criteria and goodness-of-fit measures, we showed that the ETXE-exponential distribution provides the best fit to the sales data among the competing models, which suggests that it is a useful model for data in the finance sector.
Data Availability
The data are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.