HOW TO SELECT APPROPRIATE STATISTICAL TEST IN SCIENTIFIC ARTICLES

Statistics is a mathematical science dealing with the collection, analysis, interpretation, and presentation of masses of numerical data in order to draw relevant conclusions. Statistics is a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Students and young researchers in the biomedical sciences and in special education and rehabilitation often declare that they chose to enroll in that study program because they lack knowledge of, or interest in, mathematics. This is a sad statement, but there is much truth in it. The aim of this editorial is to help young researchers select the statistical techniques and statistical software appropriate for the purposes and conditions of a particular analysis. The most important statistical tests are reviewed in the article.

In the following text, some of these steps will be explained (6).

Interval scales
Interval scales are numerical scales, such as age (years), weight (kg) or length of bone (cm), in which intervals have the same interpretation throughout. Interval data have a meaningful order, and equal intervals between measurements represent equal changes in the quantity being measured. However, these data have no natural zero. An example is the Celsius scale of temperature: it has no natural zero, so we cannot say that 50°C is double 25°C. On an interval scale, the zero point can be chosen arbitrarily. IQ scores are also interval data, as they have no natural zero (7).
Ratio scales
An example of a ratio scale is the amount of money you have in your pocket right now (500 denars, 1,000 denars, etc.). Money is measured on a ratio scale because, in addition to having the properties of an interval scale, it has a true zero point: if you have zero money, this implies the absence of money. Since money has a true zero point, it makes sense to say that someone with 1,000 denars has twice as much money as someone with 500 denars (or that Mark Zuckerberg has a million times more money than you do) (6).
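This distinction can be checked with a quick numerical sketch (the conversion rate below is a made-up illustration): converting Celsius to Fahrenheit, another interval scale, does not preserve ratios, while ratios on a ratio scale such as money survive any change of unit.

```python
def c_to_f(c):
    """Celsius -> Fahrenheit: an affine rescaling between two interval scales."""
    return c * 9 / 5 + 32

# Interval scale: the "ratio" 50 degrees C / 25 degrees C = 2 is not meaningful
print(50 / 25)                    # 2.0
print(c_to_f(50) / c_to_f(25))    # 122 / 77, roughly 1.58 -- the ratio is not preserved

# Ratio scale: 1,000 denars is twice 500 denars in any currency unit
rate = 61.5  # hypothetical denars-per-euro rate, for illustration only
print((1000 / rate) / (500 / rate))  # still 2 (up to floating point)
```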

Normal distribution or not
Non-parametric tests should only be used when the data do not follow a normal distribution. There are various methods for checking normality: plotting a histogram, a box-and-whisker plot, or a Q-Q plot; measuring skewness and kurtosis; and using a formal statistical test for normality (Kolmogorov-Smirnov test, Shapiro-Wilk test, etc.). Formal tests such as Kolmogorov-Smirnov and Shapiro-Wilk are frequently used to check the distribution of data. All these tests are based on the null hypothesis that the data are taken from a population which follows the normal distribution. A P value is determined to see the alpha error. If the P value is less than 0.05, the data do not follow the normal distribution and a non-parametric test should be used on such data. The smaller the sample size, the greater the chance of a non-normal distribution (7).
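For example, a normality check might be scripted as follows (a hypothetical sample; SciPy is assumed to be available):

```python
import numpy as np
from scipy import stats

# Hypothetical sample: systolic blood pressure readings (mmHg)
sample = np.array([118, 122, 125, 119, 130, 127, 121, 124, 128, 123,
                   126, 120, 129, 125, 122])

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
stat, p = stats.shapiro(sample)

# Descriptive checks: skewness and excess kurtosis near 0 suggest normality
skew = stats.skew(sample)
kurt = stats.kurtosis(sample)

if p < 0.05:
    print("Reject H0: data deviate from normality -> use a non-parametric test")
else:
    print("Cannot reject H0: a parametric test may be appropriate")
```

The same pattern works with `stats.kstest` for the Kolmogorov-Smirnov approach; plotting checks (histogram, Q-Q plot) complement, rather than replace, the formal test.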

Parametric and non-parametric procedures
Parametric statistical procedures rely on assumptions about the shape of the distribution in the underlying population (they assume a normal distribution) and about the form or parameters (means and standard deviations) of the assumed distribution. Non-parametric statistical procedures rely on no or few assumptions about the shape or parameters of the population distribution from which the sample was drawn (8). Non-parametric methods are typically less powerful and less flexible than their parametric counterparts. Parametric methods are preferred if the assumptions can be justified. Sometimes a transformation, such as a log transformation, can be applied to the data to satisfy the assumptions (9). Table 1 shows the use of parametric and non-parametric statistical methods.
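As an illustration (with hypothetical, randomly generated data), the sketch below runs a parametric test and its non-parametric counterpart on the same two groups, and shows how a log transformation reduces the skewness of right-skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical scores for two independent groups
group_a = rng.normal(loc=50, scale=10, size=30)
group_b = rng.normal(loc=56, scale=10, size=30)

# Parametric test (assumes normality) and its non-parametric counterpart
t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)

# A log transformation can make right-skewed data approximately normal
skewed = rng.lognormal(mean=0, sigma=1, size=200)
print(round(stats.skew(skewed), 2), round(stats.skew(np.log(skewed)), 2))
```

If the normality assumption holds, the parametric P value is the one to report; otherwise the Mann-Whitney result is the safer choice.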

Table 1. Parametric vs non-parametric methods
Variance is a numerical value used to indicate how widely individuals in a group vary. If individual observations vary greatly from the group mean, the variance is big, and vice versa. It is important to distinguish between the variance of a population and the variance of a sample. They have different notation, and they are computed differently. The variance of a population is denoted by σ², and the variance of a sample by s². Standard deviation (SD): a measure of spread (scatter) of a set of data. Unlike variance, which is expressed in squared units of measurement, the SD is expressed in the same units as the measurements of the original data. It is calculated from the deviations between each data value and the sample mean, and it is the square root of the variance. For different purposes, n (the total number of values) or n-1 may be used in computing the variance/SD. If you have an SD calculated by dividing by n and want to convert it to an SD corresponding to a denominator of n-1, multiply the result by the square root of n/(n-1). If a distribution's SD is greater than its mean, the mean is inadequate as a representative measure of central tendency. For normally distributed data values, approximately 68% of the distribution falls within ±1 SD of the mean, 95% falls within ±2 SDs of the mean, and 99.7% falls within ±3 SDs of the mean (the empirical rule). Standard error (SE): or, as commonly called, the standard error of the mean (SEM), is a measure of the extent to which the sample mean deviates from the true but unknown population mean. It is the standard deviation (SD) of the random sampling distribution of means (i.e., means of multiple samples from the same population). As such, it measures the precision of the statistic as an estimate of a population. The (estimated) SE/SEM is dependent on the sample size; it is inversely related to the square root of the sample size: (estimated) SE = SD / √N. The true value of the SE can only be calculated if the SD of the population is known. When the sample SD is used (as almost always), it is an estimate and should be called the estimated standard error (ESE). When the sample size is relatively large (N ≥ 100), the sample SD provides a reliable estimate of the SE (10). Statisticians use a confidence interval to express the degree of uncertainty associated with a sample statistic. A confidence interval is an interval estimate combined with a probability statement. For example, suppose a statistician conducted a survey and computed an interval estimate based on the survey data. The statistician might use a confidence level to describe the uncertainty associated with the interval estimate, describing it as a "95% confidence interval". This means that if we used the same sampling method to select different samples and computed an interval estimate for each sample, we would expect the true population parameter to fall within the interval estimates 95% of the time. Confidence intervals are preferred to point estimates and to interval estimates, because only confidence intervals show (a) the precision of the estimate and (b) the uncertainty of the estimate (11).

One-way ANOVA: a technique used to compare the means of three or more samples (using the F distribution). This technique can be used only for numerical data. The ANOVA tests the null hypothesis that samples in two or more groups are drawn from populations with the same mean values. To do this, two estimates are made of the population variance; these estimates rely on various assumptions. The ANOVA produces an F-statistic, the ratio of the variance calculated among the means to the variance within the samples. If the group means are drawn from populations with the same mean values, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from populations with different mean values. Typically, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test. When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and t is given by F = t². An extension of one-way ANOVA is two-way analysis of variance, which examines the influence of two different categorical independent variables on one dependent variable. Two-way ANOVA: a statistical test used to determine the effect of two nominal predictor variables on a continuous outcome variable. A two-way ANOVA test analyzes the effect of the independent variables on the expected outcome along with their relationship to the outcome itself. Random factors would be considered to have no statistical influence on a data set, while systematic factors would be considered to have statistical significance (11).
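Several of the quantities above can be checked numerically. The sketch below (hypothetical data; SciPy assumed) computes the sample SD, the n-to-(n-1) denominator conversion, the estimated SE, a one-way ANOVA, and verifies the F = t² relation for the two-group case:

```python
import math
import numpy as np
from scipy import stats

# Hypothetical measurements for three independent groups
g1 = [4.1, 4.8, 5.0, 4.4, 4.9]
g2 = [5.6, 5.9, 6.1, 5.4, 5.8]
g3 = [4.9, 5.2, 5.5, 5.1, 5.0]

# Sample SD (denominator n-1) and the n -> n-1 conversion described above
x = np.array(g1)
n = len(x)
sd_n = x.std(ddof=0)                      # denominator n
sd_n1 = x.std(ddof=1)                     # denominator n-1
assert math.isclose(sd_n1, sd_n * math.sqrt(n / (n - 1)))

est_se = sd_n1 / math.sqrt(n)             # estimated SE = SD / sqrt(N)

# One-way ANOVA: H0 = all three group means are equal
f_stat, p_val = stats.f_oneway(g1, g2, g3)

# With only two groups, ANOVA and the t-test are equivalent: F = t^2
t_stat, _ = stats.ttest_ind(g1, g2)
f_two, _ = stats.f_oneway(g1, g2)
assert np.isclose(f_two, t_stat ** 2)
```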
The correlation coefficient, r, can take any value between -1 and +1, with 0 meaning no "linear" relationship (there may still be a strong non-linear relationship). It is the absolute value of r that shows the strength of the relationship. An associated P value can be computed for statistical significance (a small P value does not necessarily mean a strong relationship). The square of r is r² (r-squared, or the coefficient of determination), which corresponds to the variance explained by the correlated variable. Multiple regression: used to quantify the relationship between several independent (explanatory) variables and a dependent (outcome) variable. The coefficients (a, b1 to bi) are estimated by the least squares method, which is equivalent to maximum likelihood estimation. A multiple regression model is built upon three major assumptions:
- the response variable is normally distributed;
- the residual variance does not vary for small and large fitted values (constant variance);
- the observations (explanatory variables) are independent.
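A minimal SciPy sketch (with made-up paired data) of computing r, r², and a least-squares line:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements: hours studied vs. exam score
hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])
score = np.array([52, 55, 61, 60, 68, 70, 75, 78])

r, p = stats.pearsonr(hours, score)       # correlation coefficient and P value
r_squared = r ** 2                        # proportion of variance explained

# Simple least-squares regression: score = intercept + slope * hours
fit = stats.linregress(hours, score)
print(round(r, 3), round(r_squared, 3), round(fit.slope, 2))
```

For non-normal or ordinal data, `stats.spearmanr` is the non-parametric counterpart of the Pearson coefficient.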
pressure, kind of treatment, disease stage, etc.). It can also be used when the outcome variable is polytomous (several categories of the prognosis, including an ordinal response: 'ordinal logistic regression' or the 'proportional odds ratio model'), and when there are several outcome variables (multinomial logistic regression, a special class of log-linear models). Analysis of data from case-control studies via logistic regression can proceed in the same way as for cohort studies (10).

Power, P values, and percentages
Before undertaking a research study, it is important to make a sample size calculation so that the study will have sufficient power to detect significant differences (13). Power analysis is directly related to tests of hypotheses. While conducting tests of hypotheses, the researcher can commit two types of errors: Type I and Type II. Statistical power mainly deals with Type II errors. It should be noted that the larger the sample, the easier it is to achieve the 0.05 level of significance. If the sample is too small, however, the investigator might commit a Type II error due to insufficient power. Power analysis is normally conducted before data collection. Its main purpose is to help the researcher determine the smallest sample size that is suitable to detect the effect of a given test at the desired level of significance. The reason for applying power analysis is that, ideally, the investigator desires a smaller sample, because larger samples are often costlier than smaller ones. Smaller samples also optimize significance testing (14). In addition, it is often necessary to combine categories so that there are sufficient numbers in each group for comparison. For example, for the chi-square test to have valid results, there needs to be an expected frequency of more than 5 in each cell. When quoting percentages, it is essential to also quote the numerator and/or the denominator so that it is clear how the percentage has been calculated. Percentages based on small numbers (such as less than 10) are not meaningful. A significance level (P value) is considered significant if it is less than 0.05. It is sufficient to quote P values to two decimal places if greater than 0.01; however, if the P value is very small, then P < 0.0001 should be used (12).
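As a rough sketch of such a calculation (the function name and the normal-approximation formula are illustrative, not taken from the article), the standard two-group formula n = 2(z₁₋α/₂ + z₁₋β)²·σ²/δ² per group can be coded with the Python standard library alone:

```python
import math
from statistics import NormalDist

def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample comparison of means
    (normal approximation; delta = smallest difference worth detecting)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # e.g. 0.84 for 80% power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return math.ceil(n)

# Detect a 5-point difference, SD = 10, alpha = 0.05, power = 80%
print(sample_size_two_means(delta=5, sd=10))  # 63
```

Halving the detectable difference quadruples the required sample size, which is why the smallest effect of practical interest must be decided before the study starts.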
Knowing how to choose the right statistical test is an important asset and decision in research data processing and in the writing of scientific papers. Young researchers and authors should know how to choose and how to use statistical methods. Nowadays, bigger journal publishers have a statistical editor in their editorial office, which is not the case in most journals from low-income countries (for example, the Balkan countries).
We need to know what type of data we have, how these data are organized, how many samples/groups we have to deal with, and whether they are paired or unpaired; we have to ask ourselves whether the data are drawn from a Gaussian or non-Gaussian population and, if the proper conditions are met, whether to choose a one-tailed test (versus the two-tailed one, which is usually the recommended choice). Based on such information, we may follow a proper statistical decision tree, in an algorithmic manner able to lead us to the right test, without any mistakes during the test selection process. The competent researcher will need knowledge of statistical procedures. That might include an introductory statistics course, and it most certainly includes using a good statistics textbook. For this purpose, a course in Statistics needs to become mandatory (obligatory) for all students, as it was in former curricula at the Faculty of Philosophy in Skopje. Young researchers need additional courses in statistics, and they need to train themselves to use statistical software in an appropriate way.
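Such a decision tree can itself be written down as a small function. The sketch below is a deliberately simplified, hypothetical set of rules (it covers only two-group comparisons and nominal data, not the full decision tree discussed in this editorial):

```python
def choose_test(numerical, normal, paired):
    """Pick a two-sample test from data type, normality and pairing.

    numerical: True for interval/ratio data, False for nominal data
    normal:    True if a normality check did not reject normal distribution
    paired:    True for paired (dependent) samples
    """
    if not numerical:
        return "chi-square test"          # nominal data
    if normal:                            # parametric branch
        return "paired t-test" if paired else "t-test for independent samples"
    # non-parametric branch
    return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"

print(choose_test(numerical=True, normal=False, paired=False))  # Mann-Whitney U test
```

A real decision tree would also branch on the number of groups (leading to ANOVA or the Kruskal-Wallis test) and on expected cell counts (chi-square vs. Fisher's exact test).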
Macedonian publishers of scientific journals should make greater efforts to provide the financial resources to employ a statistical editor. In that way, editorial offices can start counting on a greater impact of their articles and more citations, which will be a good prerequisite for obtaining an impact factor.

Table 3. Selecting a statistical test for nominal variables (4)
It is inappropriate to use the t-test for multiple comparisons as a post hoc test. The t-test for independent samples tests whether or not two means are significantly different from each other, but only if they were the only two samples taken.