Using a Multinomial Logistic Regression Model to Study Factors that Affect Chest Pain

In this work the logistic regression model, one of the significant techniques for categorical data analysis, was utilized. The purpose of this study is to illustrate the use of the multinomial logistic regression method when the model has one (nominal or ordinal) response variable with multiple categories. The method was applied in the medical area using heart disease data containing nine variables: Chest Pain, Age, Sex, Cholesterol, Fasting Blood Sugar, Thalac (Maximum Heart Rate), Exercise, Oldpeak (ST segment depression induced by exercise relative to rest), and Blood Pressure, where Chest Pain is the response variable and the eight other variables are explanatory variables. After analyzing the data, we conclude that there are several significant variables for each reference category in the model; specifically, Thalac (Maximum Heart Rate) and Exercise were significant in each of the different categories of the multinomial logistic regression model.

In recent years, specialized statistical methods for analyzing categorical data have expanded, especially for applications in biomedicine and sociology. Regression analysis is one of these statistical tools, and it exploits the relationship between two or more variables. Regression models can be partitioned into two groups: the first comprises linear relationship models, and the second non-linear relationship models. The linear models are satisfactory for most regression applications; a nonlinear model is used when the linear model is not appropriate. Many statisticians believe that the logistic regression model is one of the important models that can be applied to analyze categorical data; this model is a special case of the generalized linear model (GLM). The multinomial logistic regression (MLR) model is commonly used, and powerful, where the response variable is composed of more than two levels or categories.
The essential idea is generalized from binary logistic regression. Continuous variables are not utilized as the response variable in logistic regression, and only one response variable can be used. The MLR model can be used to predict a response variable based on continuous and/or categorical explanatory variables; to determine the percentage of variance in the response variable explained by the explanatory variables; to rank the relative importance of the independent variables; to assess interaction effects; and to understand the impact of covariate control variables. The MLR model permits the simultaneous comparison of more than one contrast; that is, the log odds of three or more contrasts are estimated simultaneously (Garson, 2009: 461-463). The binary logistic regression model expects that the categorical dependent variable has just two values, in general 1 for success and 0 for failure. The logistic regression model can be extended to situations where the response variable has more than two values and there is no natural ordering of the categories. Ordered categories can also be treated as nominal scale; such data can be analyzed by slightly modified versions of the strategies used for dichotomous outcomes, and this method is called multinomial logistic regression. The impact of a predictor is ordinarily explained in terms of odds ratios. After transforming the dependent variable into a logit variable (the natural log of the odds of the dependent event occurring or not), logistic regression applies maximum likelihood estimation. Logistic models compute changes in the log odds of the response, not changes in the outcome itself as ordinary least squares (OLS) regression does. Logistic regression has various analogies to OLS regression: logit coefficients correspond to b coefficients in the logistic regression equation, standardized logit coefficients correspond to beta weights, and a pseudo R-square (R²) statistic is available to summarize the strength of the relationship. In contrast to OLS regression, however, logistic regression does not assume linearity of the relationship between the independent variables and the response, does not require that the variables be normally distributed, does not presume homoscedasticity, and in general has less severe requirements. It does, nonetheless, require that observations be independent and that the independent variables be linearly related to the logit of the dependent variable (Abdalla, 2012: 272). An earlier study found that students whose mothers are employed showed better academic performance than their counterparts whose mothers are not in paid work. In 2016, Erkan aimed to examine the association between children's work status and their demographic characteristics using multinomial logistic regression. To this end, data collected by TUIK's (Turkish Statistical Institute) "Child Labour Survey, 2012," conducted with the participation of 27,118 children, were used. At the first stage of the analysis, eight independent variables on the demographic characteristics of the participants were examined using the chi-square test of independence; the variable that was not significant was removed, and the subsequent analyses were conducted using the remaining seven variables. The validity of the model was examined using maximum likelihood estimation, and the model was significant. Odds ratios of the variables in the model were calculated, and two category comparisons were made on the basis of the baseline category using odds ratio coefficients. In comparisons 1 and 2, the odds ratio coefficients for the variables rural/urban, sex, age group, household size, literacy, school attendance, and education level of the head of household were significant. In 2016, Erkan and Aydin focused on the factors influencing the types of violence against women, determined by a multinomial logistic regression model. In this context, they utilized the data of "Research on Violence against
Women in Turkey," which was conducted by the Turkish Statistical Institute in 2008. In the study, the variable for the type of violence against women was used as the response variable, with four levels. Moreover, twelve independent variables were used after removing irrelevant variables from the data set via the chi-square test of independence. After that, the maximum likelihood estimates and the odds ratios of the variables of the model were obtained. Also, the validity of the model was tested by a likelihood ratio test. Finally, comparisons were made for three categories depending on the odds ratios with respect to the chosen reference category. Regarding the odds ratios, the variables "education level of woman" and "husband's work sector" were statistically significant only in comparison one; the variables "kinship with husband", "education level of husband", "frequency of seeing the husband drunk", and "frequency of the husband's gambling" were statistically significant in both comparisons one and three; and the variables "region", "deceived by husband", and "common-law wife of husband" were statistically significant in all comparisons. In 2015, Coughenour, Paz, de la Fuente-Mella and Singh's purpose was to understand perceptions and the likelihood of using various bicycle infrastructures for transportation by Las Vegas residents. A survey was created and administered (n = 457). Multinomial regression was used to determine which infrastructures were perceived as safe and likely to be used for transportation; frequencies were analyzed. In order to increase active transportation rates effectively, residents' perceptions of safety and infrastructure preferences should be considered. Results from this study indicated that respondents had numerous safety concerns with the existing bicycling infrastructure in Las Vegas, and they gave ideas for future infrastructure investments and related policies. In 2015, Murata, Fujii and Naitoh's aim was to investigate the effectiveness of behavioral evaluation measures for predicting drivers' subjective drowsiness. Behavioral measures included neck bending angle (horizontal and vertical), back pressure, foot pressure, COP (center of pressure) movement on the sitting surface, and tracking error in a driving simulator task. Drowsy states were predicted by means of the multinomial logistic regression model, where the physiological and behavioral measures and the subjective evaluation of drowsiness corresponded to the independent variables and the response variable, respectively. First, they compared the adequacy of two techniques (a correlation coefficient-based method and an odds ratio-based method) for deciding the order of entering behavioral measures into the prediction model; it was found that the prediction accuracy did not differ between the two techniques. Second, the prediction accuracy was compared among different numbers of behavioral measures. The prediction accuracy did not differ among 4, 5, and 6 behavioral measures, and it was concluded that entering at least four behavioral measures into the prediction model is sufficient to achieve higher prediction accuracy. Third, the prediction accuracy was compared between the strongly drowsy and the weakly drowsy groups. The prediction accuracy differed between the two groups, and hence the proposed strategy was effective. A related study examined the prediction among variables regarding anglers' willingness to substitute one fishing location for another. The objectives of the investigation were: 1. to establish and predict the extent to which saltwater anglers were willing to substitute fishing at one location for fishing at another location; and 2. to identify the links between independent variables, such as demographic characteristics, constraints, and anglers' specialization variables, as predictors and anglers' willingness to substitute one fishing location for another. From the results of the multinomial logistic regression analysis, anglers' willingness to substitute was affected negatively by age and affected positively by a constraint variable; and anglers' willingness to substitute was negatively related to specialization variables.

3. Multinomial logistic regression model:
3.1. The logit (logistic) regression model:
In fact, the multinomial logistic regression (MLR) model is a fairly straightforward generalization of the binary model, and both models depend mainly on logistic analysis, or logistic regression. Logistic regression is in many ways the natural complement of ordinary linear regression whenever the response is a categorical variable. When discrete variables occur among the explanatory variables, they are handled by introducing one or several (0, 1) dummy variables, but when the response variable itself is of this kind, the multiple regression model is no longer appropriate. For a response variable Y with two measurement levels (dichotomous) and an explanatory variable X, let p(x) = P(Y = 1 | X = x). The logistic regression model has a linear form for the logit of this probability:

logit[p(x)] = log( p(x) / (1 − p(x)) ) = α + βx … (1)

where the odds are

odds = p(x) / (1 − p(x)) = exp(α + βx) … (2)

so that the logarithm of the odds, called the logit, has a linear relationship with x, and the response probability is

p(x) = exp(α + βx) / (1 + exp(α + βx)) … (3)

The parameter β determines the rate of increase or decrease of the S-shaped curve of p(x). The sign of β indicates whether the curve ascends (β > 0) or descends (β < 0), and its magnitude governs the rate of change (Chatterjee and Hadi, 2006: 318-319).
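As a small numerical sketch (not part of the original analysis; the intercept and slope values here are arbitrary illustrations), the relationship between p(x), the odds, and the logit can be checked directly in Python:

```python
import math

def p(x, alpha, beta):
    """Logistic response probability p(x) = exp(a+bx) / (1 + exp(a+bx))."""
    eta = alpha + beta * x
    return math.exp(eta) / (1.0 + math.exp(eta))

def odds(prob):
    """Odds = p / (1 - p)."""
    return prob / (1.0 - prob)

def logit(prob):
    """Logit = natural log of the odds; linear in x under the model."""
    return math.log(odds(prob))

# Arbitrary illustrative parameters: alpha = -2, beta = 0.5 (beta > 0,
# so the S-shaped curve ascends).
alpha, beta = -2.0, 0.5
for x in (0.0, 2.0, 4.0):
    prob = p(x, alpha, beta)
    # The logit recovers the linear predictor alpha + beta * x.
    assert abs(logit(prob) - (alpha + beta * x)) < 1e-9
```

Because β > 0 in this sketch, p(x) increases with x, matching the sign interpretation described above.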

3.2. Multiple logistic regression:
Logistic regression can be extended to models with multiple explanatory variables. Let k denote the number of predictors for a binary response Y, denoted x1, x2, ..., xk; the model for the log odds is:

logit[P(Y = 1)] = α + β1x1 + β2x2 + … + βkxk … (4)

and the alternative formula, directly specifying P(Y = 1), is

P(Y = 1) = exp(α + β1x1 + … + βkxk) / (1 + exp(α + β1x1 + … + βkxk)) … (5)

The parameter βj refers to the effect of xj on the log odds that Y = 1, controlling the other predictors; for instance, exp(βj) is the multiplicative effect on the odds of a one-unit increase in xj, at fixed levels of the other predictors.
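The odds-ratio interpretation of exp(βj) can be verified numerically. The sketch below uses hypothetical coefficients (not estimates from the paper's data) for k = 3 predictors:

```python
import math

def predict_prob(x, alpha, betas):
    """P(Y=1) = exp(a + sum b_j x_j) / (1 + exp(...)) for k predictors."""
    eta = alpha + sum(b * xj for b, xj in zip(betas, x))
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical coefficients, chosen only for illustration.
alpha = -1.0
betas = [0.8, -0.3, 0.1]

p0 = predict_prob([1.0, 2.0, 3.0], alpha, betas)
p1 = predict_prob([2.0, 2.0, 3.0], alpha, betas)  # x1 increased by one unit

# exp(beta_1) is the multiplicative effect on the odds of a one-unit
# increase in x1, holding the other predictors fixed.
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
assert abs(odds_ratio - math.exp(0.8)) < 1e-9
```

The assertion confirms that raising x1 by one unit multiplies the odds by exactly exp(β1), regardless of the values of the other predictors.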
If we have n independent observations with p explanatory variables, and the qualitative response variable has k categories, then to build the logits in the multinomial case one of the categories must be considered the baseline level and all logits are built relative to it. Any category can serve as the baseline, so we will take category k as the baseline level. Since there is no ordering, it is apparent that any category may be labeled k.

3.3. Baseline-Category Logit Model:
In the MLR model, the parameter estimates are identified relative to a baseline (reference) category. We write bold letters for matrices or vectors. At a fixed setting x of the explanatory variables, we treat the counts at the J categories of Y as multinomial with probabilities {π1(x), …, πJ(x)}. Logit models pair each response category with a baseline category; the most popular model is:

log( πj(x) / πJ(x) ) = αj + βj′x, j = 1, …, J − 1 … (8)

These J − 1 equations simultaneously characterize the effects of x on the J − 1 logits; the effects vary according to the response category paired with the baseline, and these J − 1 equations determine the parameters for the logits of all other pairs of response categories. The response probabilities are:

πj(x) = exp(αj + βj′x) / (1 + Σh=1..J−1 exp(αh + βh′x)), with βJ = 0 … (9)

With categorical predictors, the Pearson chi-square statistic X² and the likelihood-ratio chi-square statistic G² goodness-of-fit statistics provide a model check when the data are not sparse.
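The baseline-category parameterization in (8)-(9) can be sketched in a few lines of Python; the parameter values below are hypothetical, for J = 3 response categories and a single predictor:

```python
import math

def baseline_category_probs(x, alphas, betas):
    """Probabilities pi_1..pi_J under the baseline-category logit model.

    alphas[j], betas[j] parameterize log(pi_j / pi_J) = a_j + b_j * x
    for j = 1..J-1; the baseline category J has a = b = 0.
    """
    # Linear predictors for categories 1..J-1, plus 0 for the baseline J.
    etas = [a + b * x for a, b in zip(alphas, betas)] + [0.0]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]

# Hypothetical parameters, chosen only for illustration.
probs = baseline_category_probs(1.5, alphas=[0.2, -0.4], betas=[0.5, 1.0])

assert abs(sum(probs) - 1.0) < 1e-9  # the J probabilities sum to one
# Each non-baseline logit recovers its linear predictor a_j + b_j * x.
assert abs(math.log(probs[0] / probs[2]) - (0.2 + 0.5 * 1.5)) < 1e-9
```

The assertions verify the two defining properties of the model: the probabilities sum to one, and each log odds against the baseline is linear in x.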
When an explanatory variable is continuous or the data are sparse, such statistics are still valid for comparing nested models differing by relatively few terms (Agresti, 2002: 267-268).

4. The information criteria:
4.1. Akaike information criterion (AIC):
Akaike's information criterion (AIC) compares the quality of a set of statistical models to each other. For example, you might be interested in which variables contribute to low socioeconomic status and how they contribute to it. If you create several regression models for different factors, such as education, family size, or disability status, the AIC will take each model and rank them from best to worst. The "best" model will be the one that neither under-fits nor over-fits. Although the AIC chooses the best model from a set, it says nothing about absolute quality; in other words, if all of your models are poor, it will choose the best of a bad bunch. Thus, after selecting the best model, it is worth running a hypothesis test on the relationships between the variables in that model to confirm the result. Akaike's information criterion is usually calculated with software. The basic formula is defined as:

AIC = −2(log-likelihood) + 2K, or AIC = 2K − 2 ln(L) … (10)

where:
 K is the number of model parameters (the number of variables in the model plus the intercept).
 The log-likelihood is a measure of model fit, usually obtained from statistical output.
For small sample sizes (n/K < ≈ 40), use the second-order AIC:

AICc = AIC + 2K(K + 1) / (n − K − 1) … (11)
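As a minimal sketch (the log-likelihood values here are hypothetical, not taken from the paper's models), both criteria can be computed directly from formula (10):

```python
def aic(log_likelihood, k):
    """AIC = 2K - 2 ln(L), with K the number of estimated parameters."""
    return 2 * k - 2 * log_likelihood

def aicc(log_likelihood, k, n):
    """Second-order (small-sample) AIC: AIC + 2K(K+1) / (n - K - 1)."""
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical fits: model B has a higher log-likelihood but uses
# more parameters, so its penalty term is larger.
model_a = aic(log_likelihood=-120.0, k=3)   # 2*3 + 240 = 246
model_b = aic(log_likelihood=-118.5, k=6)   # 2*6 + 237 = 249
assert model_a < model_b  # the lower AIC (model A) is preferred here
```

Note that the small improvement in fit for model B does not offset its extra parameters, which is exactly the under-fit/over-fit trade-off the AIC ranks.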

4.2. Bayesian information criterion (BIC):
In statistics, the Bayesian information criterion (BIC) or Schwarz information criterion (also SIC, SBC, SBIC) is a criterion for model selection among a finite set of models; the model with the lowest BIC is preferred. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC).
When fitting models, it is possible to increase the likelihood by adding parameters, but doing so may result in overfitting. Both BIC and AIC attempt to resolve this problem by introducing a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC:

BIC = K ln(n) − 2 ln(L) … (12)

where n is the number of observations.
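The difference between the two penalties can be shown in a short sketch (the log-likelihood, K, and n below are hypothetical): for any n ≥ 8, ln(n) > 2, so BIC penalizes each extra parameter more heavily than AIC does.

```python
import math

def bic(log_likelihood, k, n):
    """BIC = K ln(n) - 2 ln(L); the penalty grows with sample size n."""
    return k * math.log(n) - 2 * log_likelihood

def aic(log_likelihood, k):
    """AIC = 2K - 2 ln(L); the penalty is constant in n."""
    return 2 * k - 2 * log_likelihood

# Hypothetical model fit with K = 5 parameters on n = 300 observations.
ll, k, n = -100.0, 5, 300
penalty_bic = k * math.log(n)  # about 28.5
penalty_aic = 2 * k            # exactly 10
assert penalty_bic > penalty_aic
assert bic(ll, k, n) > aic(ll, k)
```

This is why, on moderate-to-large samples, BIC tends to select more parsimonious models than AIC.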

4.3. Maximum likelihood estimation (MLE):
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data are most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference (Chambers, Steel, Wang & Welsh, 2012: 17), (Hendry & Nielsen, 2007: 36-37), (Ward & Ahlquist, 2018: 9).
If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator maximizes the likelihood of the linear regression model (Press, Flannery, Teukolsky and Vetterling, 1992: 651-653). Under most circumstances, however, numerical methods will be necessary to find the maximum of the likelihood function.
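A case with an explicit first-order solution, simpler than the linear regression example, is the Bernoulli distribution: setting the derivative of the log-likelihood to zero gives the closed form p̂ = sample mean. The sketch below (with made-up 0/1 data) checks this numerically:

```python
import math

def log_likelihood(p, data):
    """Bernoulli log-likelihood: sum of x ln(p) + (1 - x) ln(1 - p)."""
    return sum(x * math.log(p) + (1 - x) * math.log(1 - p) for x in data)

# Illustrative 0/1 observations; solving d/dp log L = 0 gives the
# closed-form maximum likelihood estimate p_hat = mean(x).
data = [1, 0, 1, 1, 0, 1, 0, 1]
p_hat = sum(data) / len(data)  # 0.625

# The closed-form estimate beats nearby candidate values of p.
for p in (p_hat - 0.1, p_hat + 0.1):
    assert log_likelihood(p_hat, data) > log_likelihood(p, data)
```

When no such closed form exists, as the text notes, the same maximization is carried out by numerical optimization instead.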

4.3.1. The Likelihood Function:
Maximum likelihood estimation endeavors to find the most "likely" values of the distribution parameters for a set of data by maximizing the value of what is called the "likelihood function". This likelihood function is largely based on the probability density function (pdf) of a given distribution. As an example, consider a generic pdf:

f(x; θ1, θ2, …, θk) … (13)

where x represents the data (times to failure) and θ1, θ2, …, θk are the parameters to be estimated. For a two-parameter Weibull distribution, for example, these would be beta (β) and eta (η). For complete data, the likelihood function is a product of the pdf functions, with one element for each data point in the data set:

L(θ1, …, θk | x1, …, xR) = Πi=1..R f(xi; θ1, …, θk) … (14)

where R is the number of failure data points in the complete data set and xi is the i-th failure time. It is often mathematically easier to manipulate this function by first taking its logarithm. This log-likelihood function has the form:

Λ = ln L = Σi=1..R ln f(xi; θ1, …, θk) … (15)

It then remains to find the values of the parameters that result in the highest value of this function. This is most commonly done by taking the partial derivative of the log-likelihood with respect to each parameter and setting it equal to zero:

∂Λ/∂θj = 0, j = 1, …, k … (16)

This results in a number of equations with an equal number of unknowns, which can be solved simultaneously. This can be a relatively simple matter if there are closed-form solutions for the partial derivatives; in situations where this is not the case, numerical techniques need to be employed. The Chi-square test is used for two specific purposes: (a) to test the hypothesis of no association between two or more groups, populations, or criteria (i.e., to check independence between two variables); and (b) to test how likely the observed distribution of the data fits the expected distribution (i.e., to test goodness-of-fit). It is used to analyze categorical data (e.g., male or female patients, smokers and non-smokers, etc.); it is not meant to analyze parametric or continuous data (e.g., height measured in centimeters or weight measured in kilograms).
The formula for calculating the Chi-square statistic is:

χ² = Σ (Oi − Ei)² / Ei

where Oi is the observed frequency and Ei is the expected frequency in category i.
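As a worked example (with made-up counts, not the paper's data), the statistic can be computed directly from observed and expected frequencies:

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (O_i - E_i)^2 / E_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for a four-category variable (e.g. four chest-pain
# levels) against a uniform expectation of 25 per category.
observed = [30, 20, 28, 22]
expected = [25, 25, 25, 25]

stat = chi_square(observed, expected)
# (25 + 25 + 9 + 9) / 25 = 68 / 25 = 2.72
assert abs(stat - 2.72) < 1e-9
```

The statistic is then compared against the chi-square distribution with the appropriate degrees of freedom to decide significance.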

The Data Description
The data used in this research is a heart disease data set that contains nine variables: Chest Pain, Age, Sex, Cholesterol, Fasting Blood Sugar, Thalac (Maximum Heart Rate), Exercise, Oldpeak (ST segment depression induced by exercise relative to rest), and Blood Pressure, where Chest Pain is the response variable and the eight other variables are explanatory variables. We took this data from https://www.kaggle.com/ronitf/heart-disease-uci. We made some changes in some variables: Chest Pain and Blood Pressure were changed from a scale data type to a classified nominal data type, and the data were analyzed in the SPSS 26 program. We used multinomial logistic regression for the data analysis, taking Chest Pain as the response variable; since the Chest Pain variable has four levels, we took each level in turn as the reference category so we could see the differences between all of the categories.
(Ward and Ahlquist, 2018: 21-22)
4.4. The Chi-square test:
The logic of hypothesis testing was first invented by Karl Pearson (1857-1936); Pearson's Chi-square distribution and the Chi-square test, also known as the test of goodness-of-fit and the test of independence, are his most important contributions to the modern theory of statistics (Magnello, 2005-2006: 1). The importance of Pearson's Chi-square distribution was that statisticians could use statistical methods that did not depend on the normal distribution to interpret findings. He invented the Chi-square distribution mainly to cater to the needs of biologists, economists, and psychologists. His 1900 paper, published in the Philosophical Magazine, elaborates the invention of the Chi-square distribution and the goodness-of-fit test (Rana and Singhal, 2015: 69). The Chi-square test is a nonparametric test.


 The Goodness of Fit table contains the Pearson and Deviance Chi-square tests, which are useful for determining whether the model exhibits a good fit to the data. Non-significant test results indicate that the model fits the data well. Both tests indicate that the model fits the data well, since the Pearson and Deviance Chi-square tests are non-significant.
 The Pseudo R-Square values are treated as rough analogues of the R-square value in OLS regression. In general, there is no strong guidance in the literature on how these should be used or interpreted.
 These results contain likelihood ratio tests of the overall contribution of each independent variable to the model. Using the conventional threshold, we see that the Thalac (Maximum Heart Rate), Exercise, and Oldpeak predictors each contribute significantly to the model.
 The results in table (6) provide information comparing each chest pain level against the reference category. The first set of coefficients represents the comparison of the first chest pain level with the reference category.

(Tikrit Journal for Administrative and Economic Sciences, College of Administration and Economics, Tikrit University, 2021)
In 2014, Madhu, Ashok and Balasubramanian's study asked: a. Is there a difference in the pattern of carcinoma cases across socioeconomic statuses with respect to zone of residence? b. It also demonstrated the application of multinomial logistic regression analysis to examine the factors related to carcinoma in high-, middle-, and low-income families. Carcinoma cases reported to the Bharath Hospital and Institute of Oncology (BHIO) from 2007 to December 2011 were analyzed. Descriptive analysis, such as chi-square analysis, and multinomial regression analysis were performed. The MLR analysis demonstrated that illiteracy, nulliparity, and young age (< 40 years) among women in a household were associated with higher odds of carcinoma in middle- and low-income families compared with high-income families. In 2007, Woo-Yong and Ditton explored the connections between anglers' characteristics and their willingness to substitute one fishing location for another.

Table (1): Represents the Summary of the Data

Table (2): Represents the Information about the Model Fitting

Table (5): Represents the Likelihood Ratio Tests for all of the Predictors in the Model

Table (10): Represents the Classification Statistics of the Predictors by the Model

The Results of the Data:
 The Model Fitting Information table contains a likelihood ratio Chi-square test comparing the full model (i.e., containing all the predictors) with the null model (i.e., containing the intercept only, with no predictors). Statistical significance indicates that the full model represents a significant improvement in fit over the null model. We can see that this model is a significant improvement in fit over the null model, since the likelihood ratio test is statistically significant.