Statistical Analysis of Covid-19 (SARS-CoV-2) Patients Data of Karnataka, India


Cases of coronavirus disease 2019 (Covid-19) in India are increasing day by day. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a new virus of the coronavirus family, so little information about it is available, which makes it very difficult to develop a medicine or vaccine for this virus quickly. It is therefore very important to analyse the data and find meaningful insights so that the curve of daily increasing cases can be flattened. For the current study, data from the state of Karnataka were taken, and chi-square tests were performed to find relationships between gender (male and female), age group (less than 18, 19 to 40, 41 to 65 and greater than 65) and current status (recovered, hospitalized and deceased). Our results show that current status is independent of gender, that current status depends on age group, and that age group and gender are also related.


Introduction
Today, data play a very important role in solving almost any problem. In industry, data analysis is very useful for reducing cost and maximizing profit. Covid-19 cases are increasing daily. The virus is new, so little information about it is available, which makes it very difficult to develop a medicine or vaccine for it quickly. At the time of writing this research paper, cases of Covid-19 had reached nearly 15.01 million worldwide [1]. In India, the total number of cases is approximately 1.2 million, of which 440k cases are still active [2]. The number of positive cases per day is also increasing rapidly. Therefore, it is very important to analyse these data and find meaningful insights so that the curve of daily increasing cases can be flattened.

Literature Review
In [3], researchers provide data analysis and prediction for Covid-19 cases all over the world. They collected data from DingXiangYuan, Johns Hopkins University and WHO. These data are uploaded to the CoronaTracker [4] website. For prediction, the Susceptible-Exposed-Infected-Removed (SEIR) model was used. They also provide a sentiment analysis of news on Covid-19, in which they found 561 positive articles and 2548 negative articles.
In [5], researchers analysed the effect of comorbidity on Covid-19 patients. They analysed 1590 confirmed cases hospitalized in different hospitals in China. Of these patients, 686 were female, and 399 had comorbidities. They found that comorbidity plays a crucial role in clinical treatment and that patients with comorbidities have poorer clinical outcomes. Another study [6] showed that the most common symptoms of Covid-19 were fever, cough, expectoration, headache and myalgia or fatigue.
The authors of [7] performed a clinical prediction of Covid-19 mortality based on 150 patients in Wuhan city, China. Of these 150 cases, 68 were deaths and 82 were discharges. They found a significant difference in age between death cases and discharge cases. Forty-three of the 68 patients who died had comorbidities, compared with 34 of the 82 discharged patients. Sixty-three patients died due to respiratory failure or myocardial damage. Only 5 patients died without any known cause.

Data Analysis
In this analysis, we use Covid-19 data from Karnataka, India. The dataset for this study was downloaded from Kaggle [8]. The dataset lists the Covid-19 cases of each state, but most of the attribute values are missing. For this study, we focus mainly on three attributes: gender, age and current status (recovered, hospitalized and deceased). Because the dataset has many missing values, applying the analysis to it directly is not possible: the results would not be accurate, and there would be a high chance of bias.
Therefore, we first perform data preprocessing. In this step, we check for missing values by state and check whether there are very few missing values for some particular time interval. Based on these two conditions, we selected Karnataka. After selecting the target data, we thoroughly analysed the data and finalized our research questions.
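The per-state check for missing values described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the column names (state, gender, age, status), the example rows and the helper missing_share_by_state are hypothetical and do not come from the Kaggle dataset or from the SPSS workflow, and the sketch ignores the time-interval condition.

```python
from collections import defaultdict


def missing_share_by_state(records, state_key, fields):
    """Return {state: fraction of the given field values that are missing}."""
    missing = defaultdict(int)
    total = defaultdict(int)
    for row in records:
        state = row.get(state_key)
        for field in fields:
            total[state] += 1
            value = row.get(field)
            # Treat None and empty/blank strings as missing.
            if value is None or str(value).strip() == "":
                missing[state] += 1
    return {state: missing[state] / total[state] for state in total}


# Hypothetical rows; the column names are assumptions, not the real schema.
rows = [
    {"state": "Karnataka", "gender": "M", "age": 34, "status": "Recovered"},
    {"state": "Karnataka", "gender": "F", "age": 61, "status": "Hospitalized"},
    {"state": "StateX", "gender": "", "age": None, "status": "Hospitalized"},
    {"state": "StateX", "gender": "M", "age": None, "status": ""},
]

shares = missing_share_by_state(rows, "state", ["gender", "age", "status"])
# Pick the state with the smallest share of missing values.
best_state = min(shares, key=shares.get)
print(shares, best_state)
```

A state with a low missing share for the three attributes of interest is a reasonable candidate for analysis; the same function could be applied to rows restricted to a date range to approximate the time-interval condition.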
Research Questions
1. Is there any relationship between gender and patient status?
2. Is there any relationship between patient age and patient status?
3. Is there any relationship between patient age and patient gender?
For this analysis, we used IBM SPSS [9] software.

Dataset
The dataset was taken from Kaggle [8]. There are a total of 17 attributes in the dataset.
Except for age, all attribute data types are strings.
In SPSS, we cannot perform any type of analysis on the string datatype. Therefore, we recoded the values of gender, transmission type and current status as nominal data. Age data are available in integer format, but the value of age ranges between 0 and 100, so it is very difficult to visualize such data. We therefore divided this attribute into categories and created a new age attribute. Table 5 shows statistics after removing all the missing values. Fig. 1 shows the pie chart of male and female cases. Table 6 provides information related to the current status attribute; this attribute has no missing values. Fig. 2 shows the bar chart for current status, from which it can be clearly seen that the majority of cases are hospitalized.
Valid and missing values in the current status

Current Status
N   Valid     510
    Missing   0

Table 7 provides details of the age value in the dataset. Fig. 3 shows the histogram, mean age and standard deviation for age. Fig. 4 and Fig. 5 show histograms for males and females, respectively. Cases according to age group, current status and gender are represented in graphical form in Fig. 6.
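The recoding of string attributes and the grouping of age described in this section can be sketched in Python as follows. The numeric codes, the field names and the helper functions are assumptions made for illustration, since the source does not list the codes used in SPSS. The source also leaves age 18 unassigned (less than 18, 19 to 40); this sketch places it in the first group.

```python
# Hypothetical nominal codes; the source does not state the actual values.
GENDER_CODES = {"Male": 1, "Female": 2}
STATUS_CODES = {"Recovered": 1, "Hospitalized": 2, "Deceased": 3}


def age_group(age):
    """Map an integer age to one of the four age groups used in the study."""
    if age <= 18:   # assumption: age 18 is counted with the youngest group
        return 1    # up to 18
    if age <= 40:
        return 2    # 19 to 40
    if age <= 65:
        return 3    # 41 to 65
    return 4        # greater than 65


def recode(row):
    """Replace string values with nominal codes and add the age group."""
    return {
        "gender": GENDER_CODES[row["gender"]],
        "status": STATUS_CODES[row["status"]],
        "age_group": age_group(row["age"]),
    }


# Hypothetical patient record.
example = recode({"gender": "Female", "age": 52, "status": "Hospitalized"})
print(example)
```

Keeping the codes in dictionaries makes the mapping explicit and causes an error for any unexpected string, which helps catch misspelled categories before the analysis.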
To answer the research questions, we perform a chi-square test. This test is used when we are dealing with nominal or ordinal data and want to find the relationship between two variables.
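For reference, the statistic behind this test is conventionally written as below, assuming the Pearson form of the chi-square test of independence (the source does not state which SPSS output row was used). Here O_ij is the observed count in row i and column j of the cross tabulation, E_ij is the count expected under independence, R_i and C_j are the row and column totals, N is the total number of cases, and r and c are the numbers of rows and columns.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```

For example, a cross tabulation of gender (2 categories) and current status (3 categories) has (2 - 1)(3 - 1) = 2 degrees of freedom.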

Chi Square Test
In the chi-square test, we assume two hypotheses, the null hypothesis (h0) and the alternate hypothesis (ha).
Null hypothesis (h0): There is no relationship between variables.
Alternate hypothesis (ha): There is a significant relationship between variables.
If the p value (asymptotic significance) is less than .05, then we reject our null hypothesis, and if the value is greater than .05, then we cannot reject our null hypothesis.
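The decision rule above can be sketched in Python as follows. This is a minimal standard-library illustration, not the SPSS procedure used in the study: it assumes the Pearson chi-square statistic, the function names are hypothetical, and the counts in the example table are invented for illustration and are not the values of Table 8.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi_square_independence(table):
    """Return (statistic, df, p value) for a cross tabulation given as a list of rows.

    Assumes every row and column total is positive (no zero expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, chi_square_sf(statistic, df)


# Invented counts: rows = gender, columns = recovered, hospitalized, deceased.
hypothetical_table = [[30, 60, 10], [20, 70, 10]]
statistic, df, p_value = chi_square_independence(hypothetical_table)
reject_null = p_value < 0.05  # decision rule from the text
print(statistic, df, p_value, reject_null)
```

The same function applies unchanged to the other two research questions, because only the shape of the cross tabulation (and therefore the degrees of freedom) differs.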

RQ. 1. Is there any relationship between gender and patient status?
A chi-square test was performed to determine the relationship between gender and current status. In this test, we check whether gender (male or female) and current status (recovered, hospitalized and deceased) depend on each other. Table 8 gives the cross tabulation of gender and current status, and Fig. 7 shows it graphically. Table 11 reports the result of the chi-square test: the p value is .494, which is much higher than .05, so we cannot reject our null hypothesis. We find no significant relationship between gender and the current status of the patient. In other words, current status does not appear to depend on whether a patient is male or female.

RQ. 2. Is there any relationship between patient age and patient status?
As for RQ. 1, a chi-square test was performed to determine the relationship between age group and current status. In this test, we check whether age group and current status depend on each other. Table 9 gives the cross tabulation of age group and current status, and Fig. 8 shows it graphically. In Table 11, the p value is reported as .000 (that is, below .001), which is less than .05, so we reject our null hypothesis. We can say that there is a significant relationship between age group and the current status of the patient.

RQ. 3. Is there any relationship between age group and gender?
A chi-square test was also performed to determine the relationship between age group and gender. In this test, we check whether age group and gender depend on each other. Table 10 gives the cross tabulation of age group and gender, and Fig. 9 shows it graphically. In Table 11, the p value is .007, which is less than .05, so we reject our null hypothesis. We can say that there is a significant relationship between the age group and the gender of the patient.

Fig. 2. Bar chart for current status
Fig. 3. Histogram for age
Fig. 4. Histogram for age w.r.t. male
Fig. 5. Histogram for age w.r.t. female
Fig. 6. Bar graph representing male and female in different age groups with current status
Fig. 9. Bar chart for age group w.r.t. gender