Completeness of variables in Hospital-Based Cancer Registries for prostatic malignant neoplasm

ABSTRACT Objectives: to analyze the completeness of variables from Hospital-Based Cancer Registries of cases of prostate neoplasm in the Oncology Care Network of a Brazilian state between 2000 and 2020. Methods: an ecological time series study, based on secondary data from Hospital-Based Cancer Registries on prostate cancer. Completeness was classified as excellent (<5%), good (5%-10%), fair (10%-20%), poor (20%-50%) and very poor (≥50%), according to the percentage of missing information. Results: there were 13,519 cases of prostate cancer in the Hospital-Based Cancer Registries analyzed. Incompleteness of the variables “family history of cancer” (p<0.001), “alcoholism” (p<0.001), “smoking” (p<0.001) and “TNM staging” (p<0.001) had a decreasing trend, while that of “clinic at the start of treatment” (p<0.001), “origin” (p=0.008) and “occupation” (p<0.001) indicated an increasing trend. Conclusions: most Hospital-Based Cancer Registries variables showed excellent completeness, but important variables had high percentages of incompleteness, such as TNM and clinical staging, in addition to alcoholism and smoking.


INTRODUCTION
Cancer is a term that covers more than a hundred malignant diseases that have in common uncontrolled cell growth, which can invade adjacent tissues or distant organs (1), claiming the lives of around 9.3 million people annually (2,3). Specifically, prostate cancer is one of the most common cancers in the world and one of the main causes of premature death in men (3,4).
In Brazil, the Brazilian National Cancer Institute (INCA - Instituto Nacional do Câncer) estimates that, for each year of the 2023-2025 triennium, there will be almost 72 thousand new cases of the disease, with an estimated risk of 67.86 new cases and a mortality rate of 13.7 deaths for every 100,000 men (1). In the state of Espírito Santo, prostate cancer is the most common cancer, with 84.36 new cases for every 100,000 men according to the latest INCA estimate (1).
Risk factors are well established and include advanced age, ethnicity, genetic factors, family history of cancer and hormonal factors (1,3,5,6), in addition to environmental factors, such as exposure to pesticides, which are still under investigation (7,8,9). Although there is still little robust evidence for prostate cancer prevention (5), it is possible to reduce the risk by reducing the consumption of fatty foods, increasing the intake of vegetables and fruits and including physical activity in daily routines (1,5,10).
Hospital-Based Cancer Registries (HBCR) are systematic sources of information, installed in general or oncology-specialized hospitals, with the aim of collecting data on the diagnosis, treatment and evolution of the cancer cases attended in these institutions (11). HBCR support the collection and processing of information about cancer patients, obtained through consultation of medical records, up to the analysis and dissemination of the resulting databases, and, therefore, make a great contribution to Epidemiological Surveillance (12). The information produced makes it possible to analyze the performance and quality of each institution in providing care to cancer patients and contributes to prognostic and survival studies (13). HBCR also contribute to individual patient care, as they ensure the follow-up of these patients (14,15).
A recent study by our group on the HBCR of a single High Complexity Oncology Care Center (CACON - Centro de Assistência de Alta Complexidade em Oncologia) in the state of Espírito Santo showed that most variables relating to prostate cancer cases, in the time series from 2000 to 2016, had excellent levels of completeness, but several clinical variables, important for a better understanding of the health-disease process, had a high proportion of missing data, highlighting the need for higher quality data (16). However, no analysis has yet covered a more recent time series, that is, until 2020, encompassing the entire Espírito Santo Oncology Care Network, composed of a CACON and seven High Complexity Oncology Care Units (UNACON - Unidades de Assistência de Alta Complexidade em Oncologia). Such an analysis is needed to direct Cancer Surveillance actions in the Espírito Santo territory regarding HBCR monitoring and the assessment of hospitals in the State Oncological Care Network.

OBJECTIVES
To analyze the completeness of the HBCR variables of cases of prostate neoplasms in the Oncology Care Network of a Brazilian state between 2000 and 2020.

Ethical aspects
The study was approved by the Universidade Federal do Espírito Santo Health Sciences Center Research Ethics Committee (CEP-CCS-UFES). Patient consent was waived, as this was retrospective research based on secondary data. Moreover, consent and authorization were obtained from the State Department of Health of Espírito Santo (SESA/ES), based in Vitória, the state capital, to collect secondary data and access restricted data for this research.

Study design, period and place
This is an ecological time series study reported according to the STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) recommendations. The study was conducted using secondary data from the HBCR prostate cancer database in the state of Espírito Santo between 2000 and 2020. The secondary data were obtained from SESA/ES Cancer Surveillance and consolidated by INCA.
The Espírito Santo Oncology Care Network covers three health regions: Metropolitan Region; South region; and North/Midwest region (15). This Oncology Care Network is made up of a CACON represented by Hospital Santa Rita de Cássia, located in the capital, Vitória, as well as the seven UNACON authorized by the Ministry of Health (MoH): Hospital Evangélico de Cachoeiro de Itapemirim, located in the municipality of Cachoeiro de Itapemirim; Hospital Evangélico de Vila Velha, located in the city of Vila Velha; Hospital Universitário Antônio Cassiano de Moraes, Hospital Santa Casa de Misericórdia de Vitória and Hospital Estadual Infantil Nossa Senhora da Glória, located in the capital, Vitória; Hospital São José, located in Colatina; and Hospital Rio Doce, located in Linhares, in the north of the state. All oncology hospital units in the state have HBCR structured and in operation, with their databases being sent annually to the Brazilian Cancer Hospital Registry Integrating System (SisRHC - Sistema Integrador do Registro Hospitalar de Câncer Brasileiro) (17,18). We emphasize that the HBCR of Hospital Estadual Infantil Nossa Senhora da Glória does not contain prostate cancer diagnoses.
Data were collected between February and June 2023 from SESA/ES. We chose the period from 2000 to 2020 because it is more recent and because, at the time of data collection, all the hospitals that make up the Oncology Care Network in the state of Espírito Santo had already sent the records of the historical series we proposed to analyze from their respective HBCR, which were processed and consolidated by the Espírito Santo Epidemiological Surveillance.

Population, inclusion and exclusion criteria
A total of 13,519 observations (records of patients diagnosed with prostate cancer) were extracted from the HBCR database in the state of Espírito Santo via SESA/ES for the historical series studied, i.e., from 2000 to 2020, including all cases registered as analytical (whose planning and treatment are carried out in the hospital where registration took place) and non-analytical (mainly those who arrive at the hospital already treated or who do not undergo the recommended treatment) (11).

Study protocol
The epidemiological variables contained in the SisRHC tumor registry (11) and analyzed in the present study were: (1)  The HBCR tumor registry form is used to gather information from medical records, to provide a case summary and to serve as the data entry document for the SisRHC computerized databases (11). The content of this form is defined based on the information needs of hospitals with a hospital cancer registry and follows the standardization guidelines recommended by the International Agency for Research on Cancer, validated by consensus in meetings coordinated by INCA (11).
The definition of quality dimensions proposed by Lima et al. (2009) (19) was used, in which completeness is expressed as the proportion of fields filled with non-null values. Furthermore, as a reference for the analysis of completeness, we adopted the classification proposed by Romero and Cunha (2006) (20). The percentage of missing data was classified as 1 - excellent (<5%), 2 - good (5-10%), 3 - fair (10-20%), 4 - poor (20-50%), or 5 - very poor (≥50%). Thus, the term "completeness" refers to the degree of completion of the analyzed field, measured by the proportion of records in which the field is filled in with a category other than those that indicate absence of data. A field filled in the database with the category "ignored", the numeral zero, an unknown date or a term indicating absence of data was considered incomplete in this study.
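For illustration only, the scoring rule described above can be sketched in Python (the analyses in this study were run in R). The function names, the set of text markers treated as absence of data and the handling of boundary values (each interval taken as closed on the left, so that exactly 10% is scored as fair) are assumptions made for this sketch and are not part of the SisRHC specification.

```python
# Illustrative sketch: percentage of missing data in one field and its
# completeness score. Marker strings and boundary handling are assumptions.

# Hypothetical text markers that indicate absence of data.
MISSING_TEXT = {"", "ignored", "unknown", "0"}


def is_missing(value):
    """Return True when a field value indicates absence of data."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip().lower() in MISSING_TEXT
    return value == 0  # the numeral zero is treated as missing


def incompleteness_pct(values):
    """Percentage of records in which the field is missing."""
    values = list(values)
    if not values:
        raise ValueError("at least one record is required")
    missing = sum(1 for v in values if is_missing(v))
    return 100.0 * missing / len(values)


def classify(pct):
    """Map a percentage of missing data to (score, label)."""
    if pct < 5:
        return 1, "excellent"
    if pct < 10:
        return 2, "good"
    if pct < 20:
        return 3, "fair"
    if pct < 50:
        return 4, "poor"
    return 5, "very poor"


if __name__ == "__main__":
    # Hypothetical values of one field across four records.
    field = ["married", "ignored", "single", 0]
    pct = incompleteness_pct(field)
    print(pct, classify(pct))
```

Applied to each variable and each year, a routine of this kind would yield the year-by-year scores that the Friedman and Mann-Kendall tests take as input.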

Analysis of results, and statistics
For statistical analyses, the free software RStudio (version 2023.03.1) and R (version 4.2.2) were used. Completeness was described by the observed relative frequency of missing data and the respective completeness scores. The Friedman test (21) was used to compare score classifications between years, whereas the Mann-Kendall test (22,23) assessed whether there was a statistically significant temporal trend across the years assessed. A statistical significance level of 0.05 was adopted.
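A minimal sketch of the Mann-Kendall trend test is given here in Python for illustration; it is not the R routine used for the analyses reported in this study. It assumes the usual normal approximation with continuity and tie corrections and a two-sided p value. The function name and the example series are hypothetical.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, Z, two-sided p) for a time-ordered numeric series.

    Uses the normal approximation with tie correction; this is an
    illustrative sketch, not a validated statistical routine.
    """
    n = len(series)
    if n < 3:
        raise ValueError("at least three observations are required")

    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for groups of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


if __name__ == "__main__":
    # Hypothetical yearly percentages of missing data for one variable.
    yearly_missing = [72.0, 70.5, 68.1, 61.3, 55.0, 41.2, 30.4, 28.9]
    s, z, p = mann_kendall(yearly_missing)
    print(s, round(z, 3), round(p, 4))
```

A negative S with p below 0.05 would correspond to a significant decreasing trend in incompleteness, and a positive S to an increasing trend.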

RESULTS
During the study period, a total of 13,519 cases of prostate cancer were retrieved from the HBCR in the state of Espírito Santo, as can be seen in Figure 1.
The variable "sex" was the only sociodemographic variable to present 100% completeness, followed by the variable "age", which presented 0.26% of missing data in 2016, and "origin", which had incompleteness ranging from 0.20% to 2.34% between 2012 and 2019; both were therefore classified as excellent throughout the period studied.
The variable "place of birth" had an average incompleteness of 5.91% in the period, with emphasis on 2000, 2018 and 2020, which presented, respectively, 14.29%, 12.45% and 16.88% of missing data, being classified as fair. The variable "race/skin color" was classified as excellent or good in most of the years studied; however, in 2006 and 2007, it was classified as poor, with 31.67% and 23.78% of incompleteness, respectively. "Marital status" had an excellent or good score in more than 90% of the years studied; the exceptions were 2012 and 2013, classified as fair, with incompleteness of 11.08% and 11.48%, respectively.
The variable "education" obtained an excellent score from 2000 to 2004, with an average of 2.74% of missing data; however, from 2005 to 2020 most years were classified as poor, with emphasis on 2010, when almost 50% of observations were missing. Similarly, the variable "occupation" presented an average of 2.20% incompleteness from 2000 to 2004 and, from 2005 onwards, was classified as poor in most years, reaching 22.13% of missing data in 2018. Both "alcoholism" and "smoking" showed high rates of incompleteness, being classified as very poor or poor in most years of the 2000-2020 historical series studied. Table 1 presents details of year-by-year completeness classifications.
The variable "disease status at the end of first treatment in hospital" was classified as very poor from 2000 to 2009, with an average of 72.11% of missing observations, but from 2010 onwards it presented better classifications, being poor or fair, with an average of 28.61% incompleteness. The score of the variable "main reason for not carrying out antineoplastic treatment in hospital" varied from excellent to very poor in the period studied, most notably in 2003, with incompleteness of just 0.44%, and in 2006, with almost 72% of missing data. "Referral origin" was classified as poor or fair in most of the years studied, with lower incompleteness rates at the end of the historical series, reaching 7.15% of missing data in 2020. The variables "primary tumor laterality" and "examinations relevant to tumor therapy diagnosis and planning" presented an excellent score at the beginning of the study period, being classified as poor and even very poor in the following years.
The variables "previous diagnosis and treatment" and "screening date" presented an excellent classification in almost the entire period, changing to good in 2006 and 2007 for the former and in 2012 and 2013 for the latter. The variable "date of start of treatment" was classified as excellent, except from 2009 to 2012 and in 2018, when its score was good or fair.
The other variables in the database presented excellent scores in all years studied, with emphasis on the variables "type of case", "date of first consultation", "primary tumor location", "detailed primary tumor location", "primary tumor histological type", "Brazilian National Registry of Health Establishments", "Hospital Unit Federative Unit" and "Hospital Unit municipality", which were 100% complete. Table 2 presents in detail and chronologically the incompleteness of the clinical variables in the historical series studied.
Regarding the comparison of the scores of the HBCR epidemiological variables in the state of Espírito Santo, the Friedman test showed that there was no significant difference (p value = 0.324) in score classification; therefore, classification was similar between 2000 and 2020.
In Table 3, the Mann-Kendall test shows significant trends towards a decrease in incompleteness for the variables "family history of cancer", "alcoholism", "smoking", "source of referral", "TNM staging", "clinical tumor staging by group (TNM)" and "disease status at the end of first treatment in hospital". The variables "place of birth", "first care clinic", "clinic at the start of treatment", "origin", "primary tumor laterality" and "occupation" showed an increasing trend in the incompleteness rate. The variables that presented 100% completeness in all years studied were not included in the Mann-Kendall test and, therefore, do not appear in Table 3.
Figure 2 shows the graphs of the historical series from 2000 to 2020 with the percentage of incompleteness of the variables that showed significant trends according to the Mann-Kendall test for the period studied. The observed incompleteness series are represented by solid lines, while dashed lines represent the temporal trend.

DISCUSSION
The results showed that, with regard to cases of malignant prostatic neoplasm in the state of Espírito Santo retrieved from HBCR and analyzed, the majority of epidemiological variables were classified as having excellent and/or good completeness, notably the variables "sex", "age", "origin", "date of first consultation", "date of diagnosis", "previous diagnosis and treatment", "most important basis for tumor diagnosis", "primary tumor location", "detailed primary tumor location", "primary tumor histological type" and "first treatment received in hospital". However, other variables were classified in some years as fair and poor, such as "place of birth", "race/skin color", "education", "occupation", "marital status", "date of start of treatment" and "examinations relevant to tumor therapy diagnosis and planning". Furthermore, there was weakness in the information on important clinical-epidemiological variables, with incompleteness above 50%, such as "TNM staging", "clinical tumor staging by group (TNM)" and "family history", in addition to "alcoholism" and "smoking". Supporting our results, a study carried out with data from HBCR in the state of Mato Grosso showed that the variables "education", "TNM staging", "family history of cancer", "alcoholism" and "smoking" exhibited incompleteness above 50% (24).
The variables "sex" and "age" presented completeness classified as excellent in the analyzed database, as found in other studies of HBCR in other Brazilian states (13,15,16,24). It is believed that the low interpretative subjectivity required to record this information explains this good result.
The variable "place of birth" had 5.91% of incompleteness, leaving it with a good score, but it showed a tendency towards increasing incompleteness in the period analyzed. In other studies conducted in HBCR in the state of Espírito Santo, this variable presented an average incompleteness of 10.33% (15) and 3.51% (16).
"Race/skin color" is an important variable in the study of prostate cancer, as some ethnicities are risk factors for the development of this cancer, such as Africans and Asians, presenting higher incidence rates and shorter survival times for this neoplasm (5,25) .In other words, this variable merely transcends a biological distinction.In fact, it encompasses a complexity that represents a set of economic and cultural connotations, which denote inequalities in access to medical care, especially in the context of cancer diagnosis and treatment.Our findings support other research carried out in Brazil (15,24,(26)(27) .It is important to highlight that the lack of completeness in the collection of this variable, combined with possible erroneous records, makes it difficult to obtain a clear understanding of the real need for health promotion and disease prevention programs in vulnerable communities (27) .Additionally, the variable that considers race/ethnicity gains relevance by expanding debates to health inequities and individual, social and political-programmatic vulnerability (15,(28)(29) .
"Education" was classified as poor in more than 50% of the study period, with an average incompleteness of 31%, a result similar to that found in other studies (12,15,24,27,30). The HBCR of Hospital Santa Rita de Cássia, the only CACON in the state of Espírito Santo, presented 9.12% of missing data for this variable, which suggests that the other HBCR in Espírito Santo have greater incompleteness (16). This variable has a great impact on patients' prognosis, and its low completeness is of clinical and epidemiological relevance (15).
The variable "occupation" presented an excellent classification at the beginning of the historical series; however, from 2005 to 2020, the percentage of missing data increased, resulting in a fair classification, with an average incompleteness of 14.57% in the period. A study carried out in HBCR in 21 Brazilian states identified 46% of missing observations for the occupation variable (31). Other studies have found similar percentages (12,15,16,24,27).
The variables "TNM staging", "clinical tumor staging by group (TNM)" and "pathological TNM staging" presented a poor or very poor completeness score in almost all years. These results corroborate other studies using data from HBCR across Brazil (12,14,15,16,32). On the other hand, a study using a database from a public hospital in São Paulo showed the variable "TNM staging" with excellent levels of completeness (27). Staging variables are extremely important, as they provide information on the extent of the disease. This information helps in defining the therapeutic plan for people with cancer, which facilitates the standardization of procedures and the exchange of experiences between institutions that offer cancer treatment (11,15,24,33). The variables "alcoholism" and "smoking" were classified as poor or very poor in almost the entire study period. This is a poor result, given the carcinogenic potential of alcohol and tobacco (34). Furthermore, the variable "family history of cancer" was also classified as very poor in all years of the period, with an average incompleteness of almost 80%. This probably occurred because these are optional variables in the tumor form, and their completion varies substantially between hospital institutions. Such incompleteness is worrying, as family history is a risk factor for prostate cancer (2,5,6,35,36,37).

Study limitations
The present study has some limitations, such as the exclusive use of data obtained from all HBCR in a single Brazilian state. Consequently, caution must be taken when interpreting the findings in relation to their external validity and generalization to other Brazilian states and regions. Although HBCR provide valuable information about the quality of services offered, they do not comprehensively represent the underlying regional or national cancer epidemiology.

Contributions to nursing, health or public policy
To the best of our knowledge, this is the first study of a recent historical series to report the completeness of HBCR epidemiological variables on cases of malignant prostatic neoplasm across the Espírito Santo (ES) Oncological Care Network between 2000 and 2020, bringing valuable information for Epidemiological Surveillance and, specifically, for Cancer Surveillance in the Espírito Santo territory. It should be noted that, in 80% of countries, there is a growing trend in premature mortality from cancer, which is impacting the achievement of target 3.4 of the Sustainable Development Goals, which refers to the reduction of at least one third in premature mortality due to chronic non-communicable diseases by 2030 (38). Thus, the importance of implementing, maintaining, updating and making available HBCR data is evident for a better understanding of the cancer panorama and for its monitoring and control.

CONCLUSIONS
In summary, most of the HBCR epidemiological variables analyzed in the state of Espírito Santo, Brazil, were classified with excellent completeness, although important variables, such as "TNM staging" and "clinical tumor staging by group (TNM)", had high incompleteness rates for all years between 2000 and 2020. There is a pressing need for consistent and high-quality HBCR data to better monitor the epidemiological variables in the tumor registry. HBCR can greatly contribute to the structuring, formulation and planning of public policies aimed at improving early diagnosis, treatment and quality of life of the population.

Figure 1 -Historical series of the number of prostate cancer cases diagnosed from 2000 to 2020 registered in Hospital-Based Cancer Registries of the state of Espírito Santo (N=13,519)

Figure 2 - Trend of incompleteness of the Hospital-Based Cancer Registries sociodemographic and clinical variables with a significant trend according to the Mann-Kendall test, regarding prostate cancer cases in the Oncological Care Network of the state of Espírito Santo from 2000 to 2020 (N = 13,519)

Table 2 - Percentage of incompleteness and classification of completeness of the Hospital-Based Cancer Registries clinical variables referring to prostate cancer cases in the Oncological Care Network of the state of Espírito Santo from 2000 to 2020 (N = 13,519)

Table 3 - Analysis of the trend of incompleteness of the Hospital-Based Cancer Registries epidemiological and clinical variables regarding prostate cancer cases in the Oncological Care Network of the state of Espírito Santo from 2000 to 2020 (N = 13,519)
*For significance, p value < 0.05.

Grippa WR, Pessanha RM, Dell'Antonio LS, Dell'Antonio CSS, Salaroli LB, Lopes-Júnior LC. Completeness of variables in Hospital-Based Cancer Registries for prostatic malignant neoplasm.