INVESTIGATING AN ASYMMETRIC RATIO COSINE DISTRIBUTION

ABSTRACT. Most real-world data have asymmetric features, slight or pronounced, that cannot be analyzed in depth using classical symmetric distributions. This is especially true when the underlying phenomena take their values in a bounded support. Examples include data with support (−1, 1)


INTRODUCTION
In statistical modelling, the assumption of symmetry in distributions is a common but often unrealistic simplification. See [8] and [16]. In fact, real-world data often exhibit significant asymmetry. For this reason, traditional symmetric distributions, such as the normal, Student, Cauchy, Laplace and logistic distributions, fail to capture these asymmetric features effectively. This inadequacy becomes even more pronounced when dealing with phenomena that are constrained within a bounded support. In this case, the symmetry assumption can lead to significant biases and inaccuracies. Examples include data with support (−1, 1), or, almost equivalently, [−1, 1], which can correspond to daily temperature anomalies, stock market returns, satisfaction ratings, correlation coefficients, normalized test scores, or sentiment analysis scores.
For the purposes of this article, among the few known symmetric distributions with support (−1, 1), we will focus on the cosine (C) distribution (also called the raised cosine distribution) for the reasons explained below. To begin with, it is defined by the following probability density function (pdf):
f(x) = (1/2)[1 + cos(πx)], x ∈ (−1, 1), (1.1)
and f(x) = 0 for x ∉ (−1, 1), and the following cumulative distribution function (cdf):
F(x) = (1/2)[1 + x + (1/π) sin(πx)], x ∈ (−1, 1), (1.2)
and F(x) = 0 for x ≤ −1 and F(x) = 1 for x ≥ 1. The symmetry is clear: we have f(−x) = f(x) for any x ∈ (−1, 1). In addition, f(x) has the property of being bell-shaped, and is a reasonable approximation of the pdf of the "universal" standard normal distribution, but over (−1, 1). In fact, the C distribution is one of the rare trigonometric distributions of this type. For more details, see [19], [10], [12], [18], and [21]. Among the more recent references on the C distribution, we may mention [1], which studied its various characterizations, [20], which used it in an asymmetric system to provide a new skewed normal distribution, [2], which proposed a two-parameter generalization, and [6], which studied original distributions based on the deformation techniques applied to the cdf in Equation (1.2).
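As a quick numerical illustration of the C distribution, the following Python sketch evaluates its pdf and cdf, assuming the standard raised cosine form with location 0 and scale 1, i.e., f(x) = (1/2)[1 + cos(πx)] and F(x) = (1/2)[1 + x + sin(πx)/π] on (−1, 1). The function names c_pdf and c_cdf are illustrative choices, and Python is used here only for convenience (the article itself works with R).

```python
import math

def c_pdf(x: float) -> float:
    """pdf of the cosine (C) distribution on (-1, 1); zero outside the support."""
    if -1.0 < x < 1.0:
        return 0.5 * (1.0 + math.cos(math.pi * x))
    return 0.0

def c_cdf(x: float) -> float:
    """cdf of the cosine (C) distribution; 0 below -1 and 1 above 1."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + x + math.sin(math.pi * x) / math.pi)

# Symmetry check: f(-x) = f(x), and F(0) = 1/2.
for x in (0.1, 0.5, 0.9):
    print(x, c_pdf(x), c_pdf(-x), c_cdf(x))
```

The symmetry f(−x) = f(x) and the value F(0) = 1/2 can be read directly from the printed values.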
This article proposes an alternative to the AC distribution. It is also derived from the C distribution, but proposes an original ratio-type construction of its pdf that aims to innovate in terms of functionalities. We call it the ratio cosine (RC) distribution. General ratio-type distribution schemes exist, such as the Marshall-Olkin scheme (see [13]), but we consider a trigonometric ratio approach that is more closely related to the C distribution. In addition to this original ratio feature, it incorporates two adjustable parameters to provide a versatile framework for analyzing data with varying degrees of skewness and kurtosis. These innovations aim to provide a more accurate representation of the underlying phenomena. We examine the main functions and theoretical properties of the RC distribution, including its moments, quantiles and other distributional characteristics. This is done mathematically and, where appropriate, numerically and graphically. We also explore several derived distributions, giving extensions of the uniform, Cauchy and half-Cauchy distributions. Four examples of simulated (not real) data from different relevant scenarios demonstrate the ability of the RC distribution to capture nuanced behavior. In particular, using standard information criteria, we highlight the fact that it can fit asymmetric data more accurately than the AC distribution.
The article consists of the following sections: Section 2 concerns a mathematical result describing a general pdf of the ratio type, involving trigonometric functions and several adjustable parameters. As an application of this result, the RC distribution is presented in Section 3, together with its main properties. Section 4 focuses on the associated moments. Some distributions derived from the RC distribution are discussed in Section 5. Examples of applications are described in Section 6. Section 7 concludes the article.

GENERAL RESULT
Inspired by the pdf of the C distribution as defined in Equation (1.1), the proposition below investigates a possible general ratio-type pdf with trigonometric functionalities. The generality is characterized by the presence of three adjustable parameters. The main objective is to determine their value ranges with respect to the definition of a pdf.

Now we examine condition (III). Since c = 1/2, using standard integral rules with the appropriate primitive, we find that f(x; a, b, c) integrates to 1 over (−1, 1). All the required conditions are satisfied; f(x; a, b, c) is a valid pdf. This concludes the proof.
This theoretical result shows that a ratio modification of the C distribution is possible. It has the advantage of breaking the symmetry of the C distribution in an original way, determined by two adjustable parameters, a and b (c is fixed at 1/2). More precisely, for any x ∈ (−1, 1)\{0} and some values of b, we have f(−x; a, b, c) ≠ f(x; a, b, c). However, the main drawback of the pdf f(x; a, b, c) is the condition imposed on a and b. Indeed, the relation |a| + |b| ≤ 1 implies an interdependence between them. To solve this problem, we propose a trigonometric re-parameterization involving two other independent parameters. More precisely, we set a = r[cos(θ)]^2 and b = r[sin(θ)]^2, with r ∈ [−1, 1] and θ ∈ [0, π/2]. For a computational benchmark, we give π/2 ≈ 1.570796. We clearly have |a| + |b| = |r|{[cos(θ)]^2 + [sin(θ)]^2} = |r| ≤ 1, and the new parameters r and θ are not tied to each other. In the light of this observation, we motivate a deeper study of the pdf defined with these parameters, thus creating the RC distribution (we recall that RC stands for ratio cosine). We fix the mathematical framework in the next section.
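The re-parameterization can be checked numerically. The Python sketch below assumes the mapping a = r[cos(θ)]^2 and b = r[sin(θ)]^2, which is consistent with the quantities r[cos(θ)]^2 and r[sin(θ)]^2 reported with the estimates in Section 6; the function name to_ab is a hypothetical helper. It verifies that |a| + |b| = |r| and reproduces the values reported for Example 1.

```python
import math

def to_ab(r: float, theta: float) -> tuple[float, float]:
    """Map (r, theta) in [-1, 1] x [0, pi/2] to (a, b), assuming
    a = r*cos(theta)^2 and b = r*sin(theta)^2."""
    a = r * math.cos(theta) ** 2
    b = r * math.sin(theta) ** 2
    return a, b

# Estimates reported for Example 1 in Section 6.
r_hat, theta_hat = -0.9228052, 0.7339053
a, b = to_ab(r_hat, theta_hat)
print(a, b)                        # close to -0.5088365 and -0.4139687
print(abs(a) + abs(b), abs(r_hat)) # the constraint |a| + |b| <= 1 holds
```

Because a and b share the sign of r, the sum |a| + |b| always equals |r|, so any pair (r, θ) in the stated ranges satisfies the constraint automatically.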

RC DISTRIBUTION
3.1. On the pdf. Based on Proposition 2.1 and the comments that follow it, the RC distribution is defined by the following pdf:
g(x; r, θ) = f(x; r[cos(θ)]^2, r[sin(θ)]^2, 1/2), x ∈ (−1, 1), (3.1)
and g(x; r, θ) = 0 for any x ∉ (−1, 1), with r ∈ [−1, 1] and θ ∈ [0, π/2]. In order to visualize the flexibility and asymmetry characteristics of this pdf, we propose a graphical analysis using the free software R. More details on this software can be found in [17].
We now refine this visualization by fixing one parameter and varying the other. The behavior of this pdf at x = 0, x → −1+ and x → 1− is easy to understand from a mathematical point of view. In particular, the limits of g(x; r, θ) as x → −1+ and as x → 1− are equal, which implies a symmetry in the value of g(x; r, θ) at the two extremes of the support. This can be seen as a limitation of the RC distribution when fitting data with a large difference between these extremes. The result below examines a series expansion of g(x; r, θ). It can be used as a mathematical approximation for various probability measures involving this pdf. Some examples will be developed later in the context of moments.

Proposition 3.1. For any x ∈ (−1, 1)\{−1/2, 1/2}, the pdf of the RC distribution can be expanded as an infinite series in powers of r[sin(θ)]^2 sin(πx).

Proof. For any x ∈ (−1, 1)\{−1/2, 1/2}, we have sin(πx) ∈ (−1, 1), and since r ∈ [−1, 1] and θ ∈ [0, π/2], we have r[sin(θ)]^2 sin(πx) ∈ (−1, 1). The geometric series expansion applied to the ratio term of the pdf then gives the desired expansion.
The interest of this result is to transform the ratio form of g(x; r, θ) into a linear sum that can be easily manipulated from a mathematical point of view. In particular, by replacing +∞ with a large integer, say M, we obtain a finite-sum approximation of g(x; r, θ) that may be reasonable in practice.
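To give a feel for how the truncation level M affects accuracy, the Python sketch below treats the generic geometric series 1/(1 − u) = Σ u^k for |u| < 1, where u plays the role of r[sin(θ)]^2 sin(πx). This is only an illustration of the truncation principle, not the article's expansion itself: whether the ratio term involves 1/(1 − u) or 1/(1 + u) depends on the sign convention in Equation (3.1), and replacing u by −u covers the other case. The function names are hypothetical.

```python
def geometric_partial_sum(u: float, M: int) -> float:
    """Sum of u**k for k = 0, ..., M (truncated geometric series)."""
    return sum(u ** k for k in range(M + 1))

def truncation_error(u: float, M: int) -> float:
    """Exact value 1/(1 - u) minus the partial sum; equals u**(M+1)/(1 - u)."""
    return 1.0 / (1.0 - u) - geometric_partial_sum(u, M)

# The error decays geometrically in M; it decays slowly when |u| is close to 1.
for u in (0.2, 0.7, 0.95):
    print(u, [abs(truncation_error(u, M)) for M in (5, 20, 80)])
```

The printed errors suggest that a moderate M suffices when |r|[sin(θ)]^2 is well below 1, while values close to 1 near x = ±1/2 call for a larger M.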

3.2. On the cdf.
Let us now focus on the cdf of the RC distribution. It is determined in the proposition below.
It should be noted that G(x; r, θ) has a certain originality in its expression, in particular by mixing the logarithmic and trigonometric functions. It differs from the cdfs of the C and AC distributions, as described in Equations (1.2) and (1.4), respectively.
The cdf fully defines the RC distribution and provides prior information on several aspects. One such aspect is the quantile function of the RC distribution, say Q(u; r, θ) with u ∈ (0, 1). It is defined as the inverse of the cdf, i.e., G(Q(u; r, θ); r, θ) = u. The first quartile is given by Q1 = Q(0.25; r, θ), the second quartile, called the median, is given by Q2 = Q(0.5; r, θ), and the third quartile is given by Q3 = Q(0.75; r, θ). We can also present the interquartile range defined by Q3 − Q1. These quantities are the main components of the theoretical box plot associated with the RC distribution. A feature of the RC distribution is that G(0; r, θ) = 1/2, which means Q2 = 0 regardless of the values of the parameters. As a result, the RC distribution is not appropriate for data that have an empirical median that is far from 0, which can be seen as a limitation in a practical sense. Since the quantile function has no closed-form expression, it can only be evaluated numerically. In Table 1, we provide the numerical values for Q1, Q2, Q3, and the interquartile range, taking into account the parameters (r = 1, θ = 0.001), (r = 1, θ = 1.57), (r = 0.8, θ = 0.1), (r = −1, θ = 0.001), (r = 0.5, θ = 0.2), (r = 0.8, θ = 1.2), and (r = −1, θ = 0.8). As discussed above, we constantly find Q2 = 0. However, some variability is observed for the interquartile range, indicating the adaptability of the RC distribution from a quantile perspective. In a similar way, we can study other quantile measures, such as the Bowley skewness, as described in [9], and the Moors kurtosis, as studied in [15].
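When a quantile function has no closed form, it can be obtained by inverting the cdf numerically. The Python sketch below uses bisection on (−1, 1) for any continuous increasing cdf. Since the expression of G(x; r, θ) is not reproduced in this text, the cdf of the C distribution (standard raised cosine form, an assumption) is used as a stand-in; the function names are hypothetical, and the same routine would apply to G once its expression is coded.

```python
import math
from typing import Callable

def c_cdf(x: float) -> float:
    """Stand-in cdf: cosine (C) distribution on (-1, 1)."""
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 * (1.0 + x + math.sin(math.pi * x) / math.pi)

def quantile(cdf: Callable[[float], float], u: float,
             lo: float = -1.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Solve cdf(q) = u for q in (lo, hi) by bisection; u must lie in (0, 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid   # the root is to the right of mid
        else:
            hi = mid   # the root is at mid or to its left
    return 0.5 * (lo + hi)

q1, q2, q3 = (quantile(c_cdf, u) for u in (0.25, 0.5, 0.75))
print(q1, q2, q3, q3 - q1)  # quartiles and interquartile range
```

For the symmetric stand-in, the median is 0 and Q1 = −Q3; for the RC distribution, only the median is fixed at 0, as noted above.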
3.3. On the hrf. The hazard rate function (hrf) of the RC distribution is derived directly from the corresponding pdf and cdf. More precisely, it is given by h(x; r, θ) = g(x; r, θ)/[1 − G(x; r, θ)], x ∈ (−1, 1). The graphical analysis shows that the hrf can be increasing or non-monotonic. In particular, the green curve can be identified as a "decreasing then increasing curve". The range of shapes is limited; the hrf is therefore of moderate flexibility.

MOMENTS
Since the RC distribution is a distribution with bounded support, it admits moments of any order. However, due to the ratio term in the pdf, we cannot find closed-form expressions for them. In the proposition below, we determine a possible mathematical expansion for these moments, which can be used for approximation purposes. For any m ∈ N, we define the m-th moment associated with the RC distribution as µm = E(X^m), where E denotes the expectation operator. It can be expanded as an infinite series obtained by integrating the expansion in Proposition 3.1 term by term.

Proof. Using the integral expression of µm, i.e., µm = ∫_{−1}^{1} x^m g(x; r, θ) dx, Proposition 3.1, and exchanging the integral and the sum, we express µm as a series of integrals. Applying a well-calibrated integration by parts to each of these integrals and combining the resulting equations gives the stated expansion. This ends the proof.
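Because the support is bounded, moments can also be approximated by direct numerical integration of x^m times the pdf. The Python sketch below uses a midpoint rule on (−1, 1). As the expression of g(x; r, θ) is not reproduced in this text, the pdf of the C distribution (standard raised cosine form, an assumption) serves as a stand-in; for it, the second moment is known to be 1/3 − 2/π^2. The function names are hypothetical.

```python
import math
from typing import Callable

def c_pdf(x: float) -> float:
    """Stand-in pdf: cosine (C) distribution on (-1, 1)."""
    return 0.5 * (1.0 + math.cos(math.pi * x)) if -1.0 < x < 1.0 else 0.0

def raw_moment(pdf: Callable[[float], float], m: int, n: int = 20000) -> float:
    """Approximate E(X^m) = integral of x^m * pdf(x) over (-1, 1), midpoint rule."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += (x ** m) * pdf(x)
    return h * total

print(raw_moment(c_pdf, 0))  # total mass, close to 1
print(raw_moment(c_pdf, 1))  # mean, close to 0 by symmetry
print(raw_moment(c_pdf, 2), 1.0 / 3.0 - 2.0 / math.pi ** 2)
```

Such a numerical value offers a simple cross-check for any truncated series approximation of µm.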

DERIVED DISTRIBUTIONS
This section is dedicated to some new and potentially interesting distributions with different support that can be derived from the RC distribution.
In the result below, we exhibit a new distribution with support (0, 1) generated by the RC distribution. The resulting random variable Y has a cdf, denoted S(x; r, θ), that satisfies S(x; r, θ) = 0 for x ≤ 0 and S(x; r, θ) = 1 for x ≥ 1.

Asymmetric Cauchy distribution.
In the result below, we show how to use the RC distribution to derive a new asymmetric Cauchy distribution.

Proposition 5.2. Let X be a random variable with the RC distribution with parameters r ∈ [−1, 1] and θ ∈ [0, π/2], and let Y = tan(πX/2). Then the cdf of Y can be expressed in terms of V(x), where V(x) denotes the cdf of the standard Cauchy distribution, i.e., V(x) = 1/2 + (1/π) arctan(x), x ∈ R.

Proof. Since the support of X is (−1, 1), that of Y is R. For any x ∈ R, based on the cdf in Equation (3.2), we have P(Y ≤ x) = P(X ≤ (2/π) arctan(x)) = G((2/π) arctan(x); r, θ). Using sin[2 arctan(x)] = 2x/(1 + x^2), we obtain the desired cdf.
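The trigonometric identity used in the proof can be checked numerically. The Python sketch below assumes the transformation Y = tan(πX/2), which is what the identity sin[2 arctan(x)] = 2x/(1 + x^2) suggests but is an assumption here, and it uses the standard Cauchy cdf V(x) = 1/2 + arctan(x)/π. It also checks that the linear part (1 + x)/2 of a cdf on (−1, 1) is mapped to V. The function names are hypothetical.

```python
import math

def cauchy_cdf(y: float) -> float:
    """cdf of the standard Cauchy distribution."""
    return 0.5 + math.atan(y) / math.pi

def x_from_y(y: float) -> float:
    """Inverse of the assumed map Y = tan(pi * X / 2); returns X in (-1, 1)."""
    return 2.0 * math.atan(y) / math.pi

for y in (-50.0, -2.0, -0.3, 0.0, 0.7, 5.0, 100.0):
    x = x_from_y(y)
    # Identity: sin(pi * x) = sin(2 * arctan(y)) = 2y / (1 + y^2).
    assert abs(math.sin(math.pi * x) - 2.0 * y / (1.0 + y * y)) < 1e-12
    # The uniform part (1 + x)/2 becomes the standard Cauchy cdf.
    assert abs(0.5 * (1.0 + x) - cauchy_cdf(y)) < 1e-12
print("identity verified on the test grid")
```

In particular, under this map a uniform variable on (−1, 1) becomes a standard Cauchy variable, which is why V(x) appears naturally in the transformed cdf.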
From this result, it is natural to introduce the asymmetric Cauchy distribution defined by the indicated cdf. After some manipulations, the corresponding pdf is obtained by differentiating this cdf. Various asymmetric and "nearly symmetric" forms are observed, making the proposed asymmetric Cauchy distribution a valuable option for analyzing data with skewed characteristics.
Other asymmetric distributions with support R can be created from a random variable X with the RC distribution. We may think of the distributions of the following random variables: Y = artanh(X), where artanh

Modified half-Cauchy distribution.
We now focus on distributions with support (0, +∞). In the result below, we use the RC distribution to derive a modified half-Cauchy distribution. More details on the half-Cauchy distribution can be found in [8]. The resulting random variable Y has a cdf, denoted W(x; r, θ), that is expressed in terms of Z(x) for x > 0 and satisfies W(x; r, θ) = 0 for x ≤ 0, where Z(x) denotes the cdf of the standard half-Cauchy distribution, i.e., Z(x) = (2/π) arctan(x) for x > 0, and Z(x) = 0 for x ≤ 0.
Other distributions with support (0, +∞) can be created from a random variable X with the RC distribution. We may think of the distributions of the following random variables: Y = 2/(1 − X) − 1 and Y = e^{(1+X)/(1−X)} − 1.
In this article, we do not further develop the asymmetric Cauchy and modified half-Cauchy distributions as presented in Propositions 5.2 and 5.3, but they have clear potential interest from a modelling point of view. The main aim was to show that the RC distribution can be used in ways beyond its primary distributional nature.

EXAMPLES OF STATISTICAL APPLICATIONS
In this section, we consider the RC distribution from a statistical point of view. We develop a suitable estimation method and apply it to various simulated data analysis scenarios.
6.1. Method of estimation. Basically, if we turn the RC distribution into a statistical model, the unknown components become the parameters involved, i.e., r and θ. Using available data obtained from observations of a certain phenomenon with values in (−1, 1) (or, without loss of generality, [−1, 1]), we want to estimate these parameters. To do this, we can use the maximum likelihood (ML) estimation method.
In order to describe this method, let us consider a fixed number of data, say n, and let us denote them as x_1, ..., x_n. They are thus supposed to take values in (−1, 1). We then associate the RC distribution with the likelihood function given by L(r, θ) = ∏_{i=1}^{n} g(x_i; r, θ).
Its logarithmic version is given as ℓ(r, θ) = log[L(r, θ)]. Then the ML estimates (MLEs) of r and θ, say r̂ and θ̂, are calculated as follows: (r̂, θ̂) = argmax ℓ(r, θ), where the maximum is taken over (r, θ) ∈ [−1, 1] × [0, π/2]. To determine these argmaxima, we can use the R software, and in particular the function nlminb. On this basis, the estimated pdf of the RC distribution is obtained by the substitution method, which yields
ĝ(x) = g(x; r̂, θ̂), x ∈ (−1, 1). (6.1)
If the RC distribution is appropriate to the data, the form of this estimated function should closely follow the form of the corresponding normalized histogram. It is therefore possible to check this fit graphically. The same reasoning applies to the estimated cdf of the RC distribution and the empirical cdf of the data.
In addition, we can define some criteria that allow us to compare the quality of the fit between two or more distributions. Here, we consider the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), given by AIC = 2k − 2ℓ(r̂, θ̂) and BIC = k log(n) − 2ℓ(r̂, θ̂), respectively, where n is the number of data and k is the number of unknown parameters.
For the data of Example 1, following the procedure described in Subsection 6.1, the ML estimation method for the RC distribution is performed. We obtain r̂ = −0.9228052 and θ̂ = 0.7339053, implying that r̂[cos(θ̂)]^2 = −0.5088365 and r̂[sin(θ̂)]^2 = −0.4139687.
The estimated pdf of the RC distribution, i.e., ĝ(x), is obtained based on Equation (6.1). Figure 10 shows the normalized histogram of the data and the form of ĝ(x). This form is represented by the blue line. We can see that the blue line fits the form of the histogram well. In particular, the overall asymmetry is well captured, as is the gap in the middle of the histogram.
On the other hand, the considered information criteria are computed, and we obtain AIC = 48.58214 and BIC = 51.74917. For comparison, if we apply the ML estimation method to the AC distribution defined by the pdf in Equation (1.3), we obtain AIC = 48.8034 and BIC = 51.97044, with the following MLEs: α̂ = −0.53942601 and β̂ = −0.05712751. Since it has a lower AIC and BIC, the RC distribution can be considered the better of the two. In Figure 11, we superimpose the forms of the estimated pdfs of the RC and AC distributions for visual comparison. It can be seen that the estimated pdf of the RC distribution gives a slightly better fit to the overall form of the histogram than the pdf of the AC distribution.
6.2.2. Example 2. In this example, we look at the monthly percentage returns of an investment portfolio over the last 16 months. Each value corresponds to the return for a particular month, with positive values indicating a gain in the value of the portfolio and negative values indicating a loss. The data are normalized to the range (−1, 1), where −1 represents the worst possible monthly return corresponding to the maximum loss, and 1 represents the best possible monthly return corresponding to the maximum gain.
From these values we determine the estimated pdf ĝ(x). Figure 14 shows the normalized histogram of the data and this estimated pdf represented by the blue line. We see a similar phenomenon to that observed in Example 2; the blue line captures the change in form of the histogram well. This is particularly true for the form of the curve of the second apparent "subhistogram".
In terms of information criteria, our calculations give AIC = 34.87796 and BIC = 37.39416. For comparison, let us look at the AC distribution. After performing the ML estimation, we obtain AIC = 35.67501 and BIC = 38.19121, which are associated with the following MLEs: α̂ = −0.57274032 and β̂ = −0.03423876. Since it has a lower AIC and BIC, the RC distribution is preferable. In Figure 15, we have superimposed the estimated pdfs of the RC and AC distributions for visual comparison. Visually, the blue line appears to be closer to the form of the histogram than the orange dashed line. In particular, it captures the empty range of values at the exact location, as well as the overall asymmetry.
6.2.4. Example 4. We now analyze some simulated sentiment data. We consider 27 values representing the daily sentiment scores (on a scale from −1 to 1) of a sentiment analysis model applied to news headlines related to a particular stock market index over a period of one month. Each score corresponds to the sentiment polarity of the aggregated news headlines on a given day. Logically, positive values, i.e., closer to 1, indicate a generally positive sentiment towards the market, while negative values, i.e., closer to −1, indicate a generally negative sentiment. Naturally, values close to zero indicate neutral sentiment.
The RC distribution is one possible modelling option to analyze such data. The MLEs of the corresponding parameters r and θ are r̂ = −0.9667197 and θ̂ = 1.0609551, respectively. From these, we calculate r̂[cos(θ̂)]^2 = −0.2302550 and r̂[sin(θ̂)]^2 = −0.7364647, and the estimated pdf ĝ(x) follows. Figure 16 shows the normalized histogram of the data and this estimated pdf represented by the blue line. The form of the histogram is heavily skewed to the right, and this is the main difference from the previous examples. Visually, the blue line fits this particular asymmetry well.
We also find that AIC = 40.28028 and BIC = 42.87195. For comparison purposes, let us focus on the AC distribution. The ML estimation gives AIC = 40.65739 and BIC = 43.24906, associated with the following MLEs: α̂ = 0.2234545 and β̂ = 0.1544729. Since it has a lower AIC and BIC, the RC distribution is preferred. In Figure 17, the forms of the estimated pdfs of the RC and AC distributions are superimposed for visual comparison. We can see that the AC distribution has missed the top bar of the histogram and the overall asymmetry. This is not the case for the RC distribution.
To conclude this section, it is important to note that other simulated examples were examined and the RC distribution was not always the best compared to the AC distribution. In several tests, the AC distribution was the best in terms of AIC and BIC. In this section, we have highlighted some cases in favor of the RC distribution. In particular, we have found that the RC distribution is of particular interest when the histogram of the data has a smooth V-shape or is heavily skewed to the right.
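The reported criteria can be cross-checked against each other. The Python sketch below assumes the standard definitions AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ with k = 2, recovers the maximized log-likelihood from a reported AIC, and recomputes the BIC for Example 1 (n = 36) and Example 4 (n = 27). The function names are hypothetical.

```python
import math

def loglik_from_aic(aic: float, k: int) -> float:
    """Invert AIC = 2k - 2*loglik to recover the maximized log-likelihood."""
    return (2 * k - aic) / 2.0

def bic(loglik: float, k: int, n: int) -> float:
    """BIC = k*log(n) - 2*loglik."""
    return k * math.log(n) - 2.0 * loglik

k = 2  # two parameters for both the RC and AC distributions
# (label, n, reported AIC, reported BIC) for the RC fits of Examples 1 and 4.
reported = [("Example 1", 36, 48.58214, 51.74917),
            ("Example 4", 27, 40.28028, 42.87195)]
for label, n, aic_value, bic_value in reported:
    ll = loglik_from_aic(aic_value, k)
    print(label, ll, bic(ll, k, n), bic_value)
```

Since both distributions have the same number of parameters, the differences in AIC and in BIC between the RC and AC fits coincide and reduce to twice the difference in maximized log-likelihood.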

FIGURE 10. Form of the estimated pdf of the RC distribution over the histogram of the data of Example 1

FIGURE 11. Comparison of the fits of the estimated pdfs of the RC and AC distributions based on the data of Example 1

FIGURE 12. Form of the estimated pdf of the RC distribution over the histogram of the data of Example 2

FIGURE 13. Comparison of the fits of the estimated pdfs of the RC and AC distributions based on the data of Example 2

FIGURE 14. Form of the estimated pdf of the RC distribution over the histogram of the data of Example 3

FIGURE 15. Comparison of the fits of the estimated pdfs of the RC and AC distributions based on the data of Example 3

FIGURE 16. Form of the estimated pdf of the RC distribution over the histogram of the data of Example 4

FIGURE 17. Comparison of the fits of the estimated pdfs of the RC and AC distributions based on the data of Example 4

TABLE 3. Values of µ1, σ, G3 and G4

In the AIC and BIC, k denotes the number of unknown parameters, i.e., k = 2 here. The rule is simple: lower AIC and BIC values indicate better distributions. For more details on this part, see [3].
6.2. Examples of applications. We now present four examples of applications based on simulated data (not real data) in specific scenarios.
6.2.1. Example 1. The first example considers 36 hypothetical monthly temperature anomalies, i.e., deviations from the average, over a year for a certain location. An anomaly of 0 indicates that the temperature was exactly the long-term average for that month, positive values indicate warmer than average temperatures, while negative values indicate cooler temperatures, with respect to the range (−1, 1).