Sociodemographic and clinical characteristics of hospital admissions for COVID-19: A retrospective cohort of patients in two hospitals in the south of Brazil [version 1; peer review: 1 approved with reservations]

Background: This database aims to present the sociodemographic and clinical profile of a cohort of 799 patients hospitalized with coronavirus disease 2019 (COVID-19) in two hospitals in southern Brazil. Methods: Data were collected retrospectively, from November 2020 to January 2021, from the medical records of all hospital admissions that occurred between 1 April 2020 and 31 December 2020. The analysis of these data can help define the clinical and sociodemographic profile of patients with COVID-19. Data description: This dataset covers 799 patients hospitalized for COVID-19, characterized by the following sociodemographic variables: sex, age group, race, marital status and paid work. The sex variable


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has been considered the greatest challenge of our time. It caused an unprecedented health crisis, because the large demand for hospital beds by patients with severe disease led to the collapse of health systems worldwide. [1][2][3] Patients with COVID-19 have shown clinical and sociodemographic variations, with a mortality rate of around 2%, mainly in cases of massive alveolar damage and progressive respiratory failure. [4][5][6][7] Sex and gender have also influenced COVID-19 epidemiology. [8] Lethality varies above all with age group, clinical condition and pre-existing comorbidities, such as arterial hypertension, diabetes, previous pulmonary disease, cardiovascular disease, cerebrovascular disease, immunosuppression and cancer. [9][10][11] Although findings differ on which clinical variables and comorbidities are associated with an increased risk of hospitalization and death from COVID-19, growing evidence shows that patients with pre-existing diseases and of advanced age are at particular risk of dying from the infection. [12][13][14][15] Future analyses of this database can therefore help characterize hospital admissions of patients with COVID-19. The database contains relevant information on the sociodemographic and clinical characteristics of patients hospitalized for COVID-19, and its publication promotes open science, the integrity and quality of scientific production, and the reuse of data.

Research design and method of data collection
This database comes from a cohort of patients admitted with a diagnosis of COVID-19 to two hospitals in southern Brazil. Data were collected retrospectively, from November 2020 to January 2021, from the medical records of all hospital admissions that occurred between 1 April 2020 and 31 December 2020.
Data on patients' sex refer to biological sex at birth, as recorded in the medical records. All patients aged 18 years or older were included. This dataset covers 799 hospitalized patients.

Ethical approval and consent to participate
Data were collected using questionnaires hosted on the SurveyMonkey platform, which contained questions about sociodemographic data, health conditions, and clinical, therapeutic and outcome data. The variables considered for this study were: sex, age, age group, race, marital status, years of education, number of hospitalizations, hospitalization units, length of hospitalization, risk classification, whether a COVID-19 test was performed, test used to detect COVID-19, respiratory compromise, ventilatory pattern, evolution (clinical outcome), and previous diseases.
The inclusion criteria were hospital admission with a medical diagnosis of COVID-19 and age 18 years or older. Individuals under 18 years of age and those who were not hospitalized due to COVID-19 were excluded.
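The selection rule above can be expressed as a single filter over admission records. The sketch below is illustrative only and assumes a hypothetical `Admission` record whose fields (`age`, `admitted`, `covid_diagnosis`) do not correspond to the column names of the published dataset. It shows that minors and admissions not due to COVID-19 drop out simply by failing the inclusion test.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Admission:
    # Hypothetical record layout for illustration; these field names are
    # NOT the column names of the published dataset.
    patient_id: int
    age: int
    admitted: bool         # hospital admission recorded
    covid_diagnosis: bool  # medical diagnosis of COVID-19 for that admission


def meets_inclusion(a: Admission) -> bool:
    """Inclusion: hospital admission with a COVID-19 diagnosis, age >= 18."""
    return a.admitted and a.covid_diagnosis and a.age >= 18


def select_cohort(records: List[Admission]) -> List[Admission]:
    # Minors and admissions not due to COVID-19 fail the inclusion test,
    # so they never enter the cohort; no separate exclusion step is needed.
    return [a for a in records if meets_inclusion(a)]


records = [
    Admission(1, 67, True, True),
    Admission(2, 15, True, True),   # under 18: not included
    Admission(3, 45, True, False),  # not a COVID-19 admission: not included
    Admission(4, 18, True, True),
]
cohort = select_cohort(records)
print([a.patient_id for a in cohort])  # [1, 4]
```

In this formulation, a genuine exclusion criterion would be an additional test applied only to records that already satisfy `meets_inclusion`.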

Data description
Data were characterized by the following sociodemographic variables: sex, age group, race, marital status and paid work. The following clinical variables are included: admission to the clinical ward, hospitalization in the Intensive Care Unit (ICU), COVID-19 diagnosis, number of hospitalizations for COVID-19, length of hospital stay in days, and risk classification protocol (green, yellow, red and not informed). Other clinical variables include: pulmonary impairment (<50%, between 50% and 75%, or >75%); the patient's ventilation pattern (dyspnea with respiratory effort, dyspnea without effort, or no dyspnea); use of a high-flow oxygen mask; pulmonary thromboembolism (PE); cardiovascular disease; pulmonary sepsis; and influenza test results. Other health problems (recorded as yes or no) are: diabetes, systemic arterial hypertension, chronic obstructive pulmonary disease (COPD), obesity, tobacco smoking, asthma, chronic kidney disease, overweight, cerebrovascular accident (stroke), sedentary lifestyle, human immunodeficiency virus infection (HIV/AIDS), cancer, Alzheimer's disease and Parkinson's disease. These characteristics are described in Table 1.
The sociodemographic and clinical profile data can be analyzed and reused with descriptive statistics: measures of central tendency (mean and median), measures of variability (standard deviation and interquartile range), and absolute and relative frequencies (n, %). Whether a continuous variable follows a normal distribution can be assessed with the Kolmogorov-Smirnov test, and the predictive value of the variables can be analyzed with logistic regression.
Opening the data from research projects is one of the most important elements of the research lifecycle for the success of Open Science, and a sine qua non for reproducibility and scientific progress. Open data speed up the research process, facilitate reuse and enrich datasets, in addition to optimizing the use of public resources, that is, obtaining more use from the same investment.
Open data also allow false, biased or inaccurate conclusions to be detected, because those conclusions become subject to replication. Publishing databases can therefore have considerable social impact. [16]
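The analyses suggested in the Data description section can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' analysis: every value is invented toy data, and the variables chosen (length of stay, risk classification, age, diabetes, ICU admission) are hypothetical examples taken from the variable list. The sketch computes mean, median, SD and IQR; an n-% frequency table; the Kolmogorov-Smirnov statistic D against a normal distribution fitted to the sample; and odds ratios from a two-predictor logistic regression. The regression approximates the maximum-likelihood estimates with plain gradient descent rather than calling a statistics package.

```python
import math
import statistics
from collections import Counter

# Toy values for illustration only; they are NOT taken from the dataset.
length_of_stay = [3, 5, 7, 8, 10, 12, 14, 21, 30]  # days
risk_class = ["green", "yellow", "yellow", "red", "red",
              "red", "not informed", "yellow", "red"]


def describe(values):
    """Central tendency (mean, median) and variability (SD, IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "iqr": q3 - q1,
    }


def frequency_table(labels):
    """Absolute and relative (%) distribution of a categorical variable."""
    total = len(labels)
    return {k: (n, 100 * n / total) for k, n in Counter(labels).items()}


def ks_normal_statistic(values):
    """One-sample Kolmogorov-Smirnov statistic D against a normal
    distribution whose mean and SD are estimated from the same sample.
    Because the parameters are estimated, standard KS critical values
    are too lenient; p-values need a Lilliefors-type correction."""
    xs = sorted(values)
    n = len(xs)
    ref = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        f = ref.cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Logistic regression fitted by batch gradient descent on the
    mean log-loss. Returns (weights, intercept)."""
    k, n = len(X[0]), len(X)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Toy predictors: age centred at 60 and scaled to decades, diabetes (0/1);
# toy outcome: ICU admission (0/1).
ages = [45, 50, 55, 60, 62, 68, 70, 75, 80, 85]
diabetes = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]
icu = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
X = [[(a - 60) / 10, d] for a, d in zip(ages, diabetes)]
w, b = fit_logistic(X, icu)

print(describe(length_of_stay))
print(frequency_table(risk_class))
print(round(ks_normal_statistic(length_of_stay), 3))
print("odds ratio per decade of age:", round(math.exp(w[0]), 2))
print("odds ratio for diabetes:", round(math.exp(w[1]), 2))
```

Because age is entered in decades, `exp(w[0])` reads as the odds ratio per additional decade of age. This also illustrates the advantage of keeping age continuous. For real analyses, established tools such as `scipy.stats.kstest` (KS test with p-values) or `statsmodels` `Logit` (logistic regression with standard errors and confidence intervals) would be the more usual choice.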

Dataset validation

Limitations
This dataset is limited to a retrospective cohort of patients from two hospitals in southern Brazil, which may restrict the generalizability of analyses based on it. However, the data remain relevant, because few studies and databases on COVID-19 in Brazil have been published. Researchers interested in the sociodemographic and clinical profile of patients hospitalized for COVID-19 can extensively explore the variables described here.

Ethical considerations
The present study was approved by the Research Ethics Committee of the Federal University of Santa Catarina (UFSC), Santa Catarina, Brazil (opinion No. 4.323.917/2020). All participants received the necessary information about the study objectives and methods before the study began, and written informed consent was obtained from them. Participants consented to data publication and were assured that their data would remain confidential and that only aggregate statistics would be presented. The available information can be used by researchers who want to better understand health policies, plan health actions, and estimate costs and demands in emergency situations. Database studies of this kind can help improve the effectiveness of health promotion and prevention actions. The results of data analysis give managers information with which to assess the situation, identify critical points and propose strategies when immediate solutions are required.
Review the inclusion and exclusion criteria. The author states, "The inclusion criteria were: hospital admissions with a medical diagnosis of COVID-19; and be 18 years or older. Individuals under 18 years of age and those who were not hospitalized due to COVID-19 were excluded". If the inclusion criteria are being 18 years of age or older and having a hospital admission with a medical diagnosis of COVID-19, then the exclusion criteria should apply to individuals within this eligible population. Being under 18 years old or not being hospitalized for COVID-19 already prevents a subject from being included in the study, so these conditions should not be listed as exclusion criteria.
Regarding the variables: since the inclusion criteria are a diagnosis of COVID-19 and hospitalization, the COVID-19 diagnosis variable is not justified, because every patient in the study has this outcome. Please report how the risk classification (green, yellow, red and not informed) was defined in the protocol applied to the study patients. Is it relevant to distinguish between obesity and overweight, or could this information be presented as a single variable? It is not clear from the text how the authors defined a sedentary lifestyle. I suggest presenting age in the database as a continuous variable, which allows the application of numerous statistical methods with greater power for inference and modeling. As the authors state that they collected income data, it would be useful to include this information in the database.
As argued by the authors, "open data streamline the research process, facilitate reuse and enrich datasets, in addition to optimizing the application of public resources, that is, enabling greater use of the same investment", I recommend that the authors report the geographical location of the two hospitals in southern Brazil and describe in more detail the population from which the sample originated.
In my view, the statement "Researchers interested in the sociodemographic and clinical profile of patients hospitalized for COVID-19 can widely explore the variables described here" is not a limitation.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Partly