Factor analysis approach to classify COVID-19 datasets in several regions

The aim of this research is to investigate the relationships between the counts of Covid-19 cases and deaths in seven countries that have been severely affected by this pandemic. First, Pearson's correlation coefficient is used to determine the relationships among these countries. Then, factor analysis is applied to categorize the countries based on these relationships.


Introduction
In the winter of 2019-2020, a new coronavirus disease, Covid-19, was reported in Wuhan [1]. The virus has severe destructive effects on the respiratory system. From January until now (April 18, 2020), the epidemic has spread all over the world, and the numbers of Covid-19 cases and deaths have been increasing rapidly, day by day, in most countries [2][3][4][5][6][7][8][9][10][11][12][13][14][15]. Many techniques are used to analyze natural phenomena, including artificial intelligence and mathematical and statistical methods such as optimization, deep learning, time series analysis, machine learning, regression modeling, clustering and numerical analysis. Since Covid-19 has many impacts on the environment, health, society and the economy, studying the rate at which the disease spreads and comparing this rate across countries is essential. Several studies have addressed the classification of Covid-19 datasets [40][41][42][43], using time series analysis, principal component analysis and fuzzy clustering.
The aim of this research is to study the relationships between the counts of Covid-19 cases and deaths in seven countries that have been severely affected by this pandemic. First, correlation coefficients are computed to determine the relationships between these countries. Then, factor analysis is applied to categorize the countries using the counts of cases and deaths.

Material and method
This section describes the research dataset and introduces factor analysis.

Dataset
In this work, the counts of Covid-19 cases and deaths in the United States of America, the United Kingdom, Spain, Italy, Iran, Germany and France from February 22 to April 18, 2020 are considered [43,44]. Table 1 summarizes the descriptive statistics of the dataset, namely the mean and the standard deviation. Iran and the United States of America have the minimum and the maximum counts of Covid-19 cases, respectively, while Germany and the United States of America have the minimum and the maximum counts of deaths, respectively. The counts of Covid-19 cases and deaths are also plotted in Fig. 1.
The relationships between the rates of spread of Covid-19 in these countries have been studied using Pearson's correlation coefficient. As Table 2 shows, all of the coefficients are greater than 0.5 and statistically significant; consequently, there are strong positive relationships between the rates of spread of Covid-19 in all of the countries.
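A minimal sketch of how such a correlation table might be produced, assuming the daily counts are arranged as a NumPy array with one row per day and one column per country. The helper names `pearson_matrix` and `t_statistics` are hypothetical, and the toy series are synthetic, not the research data. Significance is judged with the usual statistic t = r·sqrt((n − 2)/(1 − r²)), which follows a Student t distribution with n − 2 degrees of freedom under the null hypothesis of zero correlation.

```python
import numpy as np


def pearson_matrix(data):
    """Pearson correlation matrix between the columns of `data` (rows = days, columns = countries)."""
    return np.corrcoef(np.asarray(data, dtype=float), rowvar=False)


def t_statistics(r, n):
    """t statistic for H0: rho = 0, for every entry of the correlation matrix `r` (n observations)."""
    r = np.asarray(r, dtype=float)
    t = np.empty_like(r)
    exact = np.isclose(np.abs(r), 1.0)  # |r| = 1 (e.g. the diagonal) gives an infinite statistic
    t[~exact] = r[~exact] * np.sqrt((n - 2) / (1.0 - r[~exact] ** 2))
    t[exact] = np.sign(r[exact]) * np.inf
    return t


# Synthetic example: three made-up increasing series observed on 10 days.
days = np.arange(1, 11, dtype=float)
data = np.column_stack([days, 2 * days + 1, days ** 2])
R = pearson_matrix(data)
T = t_statistics(R, n=data.shape[0])
print(np.round(R, 3))
print(np.round(T, 2))
```

Each off-diagonal |t| would then be compared with the Student t critical value for n − 2 degrees of freedom (about 2.31 at the 5% level when n = 10).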

Principles of factor analysis
Factor analysis (FA) is a popular multivariate statistical technique that transforms a set of correlated variables into a smaller set of new variables, called factors, such that the first factors retain the main information of the original dataset [45]. In other words, FA converts a high-dimensional dataset into a lower-dimensional one by keeping as few factors as possible. FA focuses on the correlations between variables: variables within the same factor are highly correlated with each other, while variables in different factors are largely uncorrelated. In applications, the number of main factors is usually taken to be the number of eigenvalues of the correlation matrix that are larger than one. To assess whether FA is suitable for the data, the Kaiser-Meyer-Olkin (KMO) index is used; a KMO value greater than 0.8 indicates that the data are well suited to FA.
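The two checks described above can be sketched as follows, assuming a correlation matrix `R` stored as a NumPy array. The number of factors is the count of eigenvalues greater than one (the Kaiser criterion), and the overall KMO index compares squared correlations with squared partial correlations, KMO = Σ_{i≠j} r_ij² / (Σ_{i≠j} r_ij² + Σ_{i≠j} a_ij²), where the partial correlations a_ij are obtained from the inverse of R. The function names are hypothetical, and the equicorrelated matrix in the example is illustrative only.

```python
import numpy as np


def kaiser_num_factors(R):
    """Number of eigenvalues of the correlation matrix R that are larger than one."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for a correlation matrix R."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)           # partial correlations a_ij (off-diagonal entries)
    off = ~np.eye(R.shape[0], dtype=bool)   # mask selecting the pairs i != j
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


# Illustrative 4 x 4 correlation matrix with all off-diagonal correlations equal to 0.6.
p, rho = 4, 0.6
R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
print(kaiser_num_factors(R))   # 1 (eigenvalues are 2.8, 0.4, 0.4, 0.4)
print(round(kmo(R), 3))        # 0.829
```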
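Let $X = (X_1, \ldots, X_p)^\top$ be a random vector of $p$ observed variables, and denote
$$E(X) = \mu, \qquad \mathrm{Cov}(X) = \Sigma$$
as the mean vector and covariance matrix of $X$.
The equation of factor analysis with $m$ factors ($m \le p$) is
$$X - \mu = L F + \varepsilon,$$
where the $p \times m$ matrix $L$, the $m$-dimensional vector $F$ and the $p$-dimensional vector $\varepsilon$ are called the factor loading matrix, the factors and the errors, respectively.
This model can be rewritten componentwise as
$$X_i - \mu_i = l_{i1} F_1 + l_{i2} F_2 + \cdots + l_{im} F_m + \varepsilon_i, \qquad i = 1, \ldots, p,$$
where $l_{ij}$ is called the loading of $X_i$ on the factor $F_j$.
In orthogonal factor analysis, we assume
$$E(F) = 0, \quad \mathrm{Cov}(F) = I, \quad E(\varepsilon) = 0, \quad \mathrm{Cov}(\varepsilon) = \Psi = \mathrm{diag}(\psi_1, \ldots, \psi_p), \quad \mathrm{Cov}(\varepsilon, F) = 0.$$
Consequently,
$$\Sigma = L L^\top + \Psi, \qquad \mathrm{Var}(X_i) = \sum_{j=1}^{m} l_{ij}^2 + \psi_i,$$
so $\sum_{j=1}^{m} l_{ij}^2$ determines the portion of $\mathrm{Var}(X_i)$ that is explained by the factors $F_1, \ldots, F_m$.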
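A quick numerical check of the identity Σ = LLᵀ + Ψ implied by the orthogonal factor model: data are simulated from X − μ = LF + ε with uncorrelated standard normal factors and independent errors, and the sample covariance is compared with LLᵀ + Ψ. The loadings and specific variances are made-up values chosen so that every Var(X_i) equals 1, which makes the communality a proportion; Gaussian factors and errors are an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model with p = 3 variables and m = 1 factor.
mu = np.array([10.0, 20.0, 30.0])
L = np.array([[0.9], [0.8], [0.7]])      # factor loadings l_i1
psi = np.array([0.19, 0.36, 0.51])       # specific variances (diagonal of Psi)

n = 200_000
F = rng.standard_normal((n, 1))                   # E(F) = 0, Cov(F) = I
eps = rng.standard_normal((n, 3)) * np.sqrt(psi)  # E(eps) = 0, Cov(eps) = Psi
X = mu + F @ L.T + eps                            # X - mu = L F + eps, one row per draw

sigma_model = L @ L.T + np.diag(psi)
sigma_sample = np.cov(X, rowvar=False)
communality = np.sum(L ** 2, axis=1)              # sum over j of l_ij^2

print(np.round(sigma_model, 3))
print(np.round(sigma_sample, 3))
print(communality / np.diag(sigma_model))         # share of Var(X_i) explained by the factor
```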
The main aim of factor analysis is to estimate the loadings. To compute the matrices L and Ψ, different approaches, such as the principal component and maximum likelihood methods, can be applied. The principal component approach uses the eigenvalues and eigenvectors of the matrix Σ to find the matrix L, while the maximum likelihood approach (usually under a multivariate normal assumption) writes down the likelihood of the data and then maximizes it to find the matrices L and Ψ.
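The principal component method can be sketched as follows, assuming the sample correlation matrix R is used in place of Σ (the usual choice when variables are on different scales). The loadings of the m retained factors are L = [√λ₁ e₁, …, √λ_m e_m], where (λ_k, e_k) are the largest eigenvalue-eigenvector pairs, and the specific variances are the diagonal of R − LLᵀ. The function name is hypothetical, and this is only one of the estimation approaches mentioned above; maximum likelihood generally gives different estimates.

```python
import numpy as np


def pc_factor_loadings(R, m):
    """Principal component estimates of the loadings L (p x m) and specific variances psi."""
    R = np.asarray(R, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:m]           # indices of the m largest eigenvalues
    L = eigvecs[:, top] * np.sqrt(eigvals[top])   # column k is sqrt(lambda_k) * e_k
    psi = np.diag(R) - np.sum(L ** 2, axis=1)     # diagonal of R - L L^T
    return L, psi


# Illustrative equicorrelated matrix (rho = 0.6, p = 4) with one retained factor.
R = 0.4 * np.eye(4) + 0.6 * np.ones((4, 4))
L, psi = pc_factor_loadings(R, m=1)
print(np.round(np.abs(L), 3))   # eigenvector signs are arbitrary, so show magnitudes
print(np.round(psi, 3))
```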
Once the loadings are estimated, loading plots can be examined. Loading plots can be used to:
• study the correlations between variables;
• categorize and classify the variables;
• detect the number of factors.
In a loading plot, the angle θ between the vectors of two variables reflects the correlation r between them: θ = 90° indicates that the two variables are uncorrelated (r = 0), while θ = 0° and θ = 180° correspond to exact positive and negative linear relationships, respectively.
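A small sketch of how the angle in a loading plot can be computed, assuming each variable is represented by its row of loadings (l_i1, l_i2) on two factors. The cosine of the angle between two such vectors approximates the correlation between the variables when both are well represented by the plotted factors. The function name and the loading values are hypothetical.

```python
import numpy as np


def loading_angle_deg(l_i, l_j):
    """Angle in degrees between the loading vectors of two variables."""
    l_i, l_j = np.asarray(l_i, dtype=float), np.asarray(l_j, dtype=float)
    cos_theta = np.dot(l_i, l_j) / (np.linalg.norm(l_i) * np.linalg.norm(l_j))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


print(loading_angle_deg([0.8, 0.0], [0.0, 0.7]))    # perpendicular: uncorrelated
print(loading_angle_deg([0.8, 0.1], [0.4, 0.05]))   # same direction: exact positive relationship
print(loading_angle_deg([0.8, 0.1], [-0.8, -0.1]))  # opposite directions: exact negative relationship
```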

Results
This section reports the results of the FA approach for classifying the countries based on the research variables. The number of main factors in FA was taken to be the number of eigenvalues of the correlation matrix larger than one. Moreover, all KMO values were greater than 0.8, which confirms that the FA approach is suitable for these data.

Table 1. The mean and standard deviation of the counts of Covid-19 cases and of the deaths due to this pandemic disease.

Counts of cases with Covid-19
The results of the FA technique for categorizing the countries on the basis of the counts of Covid-19 cases are provided in Fig. 2. The outputs show differences between the relationships among the countries, and the countries can be categorized into the following classes: First class: Iran, France, Spain, Germany and Italy.
Second class: the United Kingdom and the United States of America.
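The classes above are read off the loading plots; no explicit assignment rule is stated. As a hypothetical illustration only, one common rule assigns each variable (here, each country) to the factor on which it has the largest absolute loading. The function name, the placeholder labels and the loading values below are invented and are not the loadings behind Fig. 2.

```python
import numpy as np


def group_by_dominant_factor(L, names):
    """Group variables by the factor with the largest absolute loading (hypothetical rule)."""
    dominant = np.argmax(np.abs(np.asarray(L, dtype=float)), axis=1)
    groups = {}
    for name, k in zip(names, dominant):
        groups.setdefault(int(k) + 1, []).append(name)   # factors numbered from 1
    return groups


# Made-up loadings of three placeholder variables on two factors.
toy_loadings = [[0.9, 0.2], [0.1, 0.8], [0.7, 0.3]]
print(group_by_dominant_factor(toy_loadings, ["A", "B", "C"]))  # {1: ['A', 'C'], 2: ['B']}
```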

Counts of the deaths due to Covid-19
The results of the FA technique for categorizing the countries on the basis of the counts of deaths due to Covid-19 are provided in Fig. 3. The outputs show differences between the relationships among the countries, and the countries can be categorized into the following classes: First class: France, the United Kingdom, Germany and the United States of America.
Second class: Iran, Italy and Spain.

Cumulative counts of the cases with Covid-19
The results of the FA technique for categorizing the countries on the basis of the cumulative counts of Covid-19 cases are provided in Fig. 4. The outputs show differences between the relationships among the countries, and the countries can be categorized into the following classes: First class: France, Spain, Germany, Iran and Italy.
Second class: the United Kingdom and the United States of America.

Cumulative counts of the deaths due to Covid-19
The results of the FA technique for categorizing the countries on the basis of the cumulative counts of deaths due to Covid-19 are provided in Fig. 5. The outputs show differences between the relationships among the countries, and the countries can be categorized into the following classes: First class: France, the United Kingdom, Germany and the United States of America.
Second class: Iran, Italy and Spain.

Conclusion
Since Covid-19 has many impacts on the environment, health, society and the economy, studying the rate at which the disease spreads and comparing this rate across countries is essential. The aim of this research was to study the cases of Covid-19 and the deaths due to it in seven countries that have been severely affected by this pandemic. The cases and deaths in the United States of America, the United Kingdom, Spain, Italy, Iran, Germany and France from February 22 to April 18, 2020 were considered. First, correlation coefficients were computed to determine the relationships among these countries; the outputs showed strong positive relationships between the rates of spread in all of the countries. Then, factor analysis was applied to categorize the countries on the basis of the counts of cases and deaths. For the cases, the United Kingdom and the United States of America followed similar patterns to each other and differed from the other countries. For the deaths, Iran, Italy and Spain followed similar patterns to each other and differed from the other countries. For future work, the authors suggest classifying Covid-19 datasets from more regions with the FA technique, or applying this technique to classify regions for other epidemic or pandemic diseases.