Sarcoma Research with Cancer Registry Data: Data and Peculiarities of Germany in the Light of Other Countries

Introduction: Sarcomas are documented in population-based and in clinic-associated databases. This study evaluated the status quo regarding the potential and obstacles of cancer registry-based research on sarcomas exemplified by Germany in comparison to similar databases in the US and Europe. Completeness and quality of data are discussed based on statistical analyses of a pooled data set established for the German Cancer Congress 2020. Methods: We analyzed data derived from 16 German institutions (federal state cancer registries and some facility-based registries). Malignant sarcomas in adults diagnosed between 2000 and 2018 with information on histology were grouped according to the WHO classification of soft tissue and bone tumors. Descriptive analyses of the study population regarding the distribution of age, sex, histology, localization of primary tumors, and metastases were performed. Survival for the ten most frequent histological groups and UICC stages was evaluated according to Kaplan-Meier and Cox regression. Time interval between surgery and subsequent radiation was calculated. Results: The initial data set contained 35,091 sarcomas. After several steps of data cleaning, 28,311 patients with known sex and unambiguous assignment to a histological subgroup remained (13,682 women and 14,629 men). Between 40 and 54 years, women were more likely to develop sarcomas, whereas in the older age groups more men were affected. Gastrointestinal stromal tumors, fibroblastic, and myofibroblastic tumors, smooth muscle tumors (mostly non-uterine leiomyosarcomas), and adipocytic tumors represented 48% of all sarcomas. Preferential sites for fibrosarcomas were the limbs, the trunk, and the head and neck region. The liposarcoma occurred most frequently on the trunk and limbs. Distant primary metastases were mostly located in the lung (43%), followed by the liver (14%), and bones (13%). 
Vascular and smooth muscle tumors showed the worst survival prognosis (5-year survival: approx. 15%, median survival approx. 8–16 months), whereas in low stages, the probability of survival of many sarcoma patients was beyond 5 years. Adjuvant radiotherapy was applied within 90 days in 71% of patients (n = 2,534). Conclusion: Our results correspond to the data from the literature. However, a lack of data quality and completeness hampers further meaningful analyses, especially nonspecific or missing information about morphology and stage. Compared to some other countries, a comprehensive database is presently missing in Germany. However, currently, there are important efforts and legislative initiatives to create a comprehensive database on a national level within the near future.


Introduction

Classification and Coding of Sarcomas in Cancer Registration
Sarcomas are a very heterogeneous group of cancers with more than 50 different histological subtypes [1]. Cancer registries classify tumor diseases according to the ICD-10-GM, which is organ-specific. Sarcomas are coded with ICD-10-GM C40-C41 (malignant neoplasm of bone and articular cartilage), C44 (dermatofibrosarcoma protuberans and others), C46 (Kaposi's sarcoma), C47 (malignant neoplasm of peripheral nerves and autonomic nervous system), C48 (malignant neoplasm of retroperitoneum and peritoneum), and C49 (malignant neoplasm of other connective tissue and other soft tissue).
However, because some ICD-10 codes are not reserved exclusively for sarcomas, the above-mentioned codes by far do not cover all sarcomas. For example, leiomyosarcomas of the uterus are coded as C55 and sarcomas of the lung as C34. Gastrointestinal stromal tumors (GIST) might be classified as C49.4 (soft tissue of the abdomen) or C16.9/C17.9 (malignant neoplasm of the stomach/small bowel, unspecified). GIST might therefore be hidden in cancer incidence statistics that do not address sarcomas [2]. Furthermore, histological classification usually follows the recommendations of the ICD-O, which lacks some of the entities currently included in the WHO classification [1]. Moreover, tumor documentation is hampered because the assessment of the dignity of individual entities has changed over time as new correlations between pathology and clinical course emerged. This results in code changes in the classification system. For example, the very rare entity epithelioid hemangioendothelioma was thought to be a tumor of low aggressiveness and was accordingly classified as 9133/1. Nowadays, with more clinical data available, it is named "malignant epithelioid hemangioendothelioma" and classified as 9133/3. Beyond changes in the classification system over time, other parameters are also handled differently or are not reported in a standardized way. For example, clinical parameters that are crucial for tumor staging, such as the mitotic rate for GIST, are often missing or are recorded as unstandardized free text in German clinical cancer registries. According to the German national law on cancer registration (Cancer Screening and Registry Act; Krebsfrüherkennungs- und -registergesetz, KFRG) [3], not all types of sarcoma treated in specialized oncological centers (sarcoma centers) need to be reported to cancer registries. This includes precancerous bone and soft tissue lesions (e.g., ICD-10-GM D48.0 or D48.1) and cutaneous sarcomas (ICD-10-GM C44).
For the purpose of accreditation as a sarcoma center, precancerous lesions do not count. Therefore, the German Cancer Society provides a list of the WHO morphologies that count toward certification as a sarcoma center [4]. Thus, some diseases are recorded for certification purposes in facility-based registries but not in clinical cancer registries.

Situation in Germany
In 2009, the Federal Cancer Registry Data Act (BKRG) entered into force and the German Centre for Cancer Registry Data (ZfKD) was established at the Robert Koch Institute. This legal framework enabled all regional cancer registries to transfer pseudonymized epidemiological data to the ZfKD on an annual basis. This newly established nationwide data set provided the basis for a very first estimation of the sarcoma incidence in Germany [5]. The 2013 Cancer Screening and Registry Act (KFRG) [3] constituted a further milestone in the development of cancer registration in Germany. This law stipulates that each federal state is to establish a system of clinical cancer registration for quality assurance purposes, which records detailed information on diagnosis, treatment, and course of the disease. Nevertheless, the ZfKD is not yet able to provide a merged data set containing detailed clinical information, e.g., on cancer therapy.
For rare diseases such as sarcomas, merging of information from different sources of data enables statistical analyses on the basis of well-powered data sets. One national approach has been initiated by the Association of German Tumor Centers (Arbeitsgemeinschaft Deutscher Tumorzentren, ADT). Every 2 years, German cancer registries and some facility-based registries are invited to provide data on certain tumor entities. Statistical analyses are carried out by entity-specific evaluation teams regarding the oncological care of patients. Results are presented in the context of the German Cancer Congress in Berlin in a National Quality Conference [6].
The aim of this paper was to evaluate the status quo regarding the potential and obstacles of cancer registry-based research on sarcomas, exemplified by Germany in comparison to similar databases in the US and Europe. Completeness and quality of the data are discussed using statistical analyses of the pooled data set established for the German Cancer Congress 2020.

Materials and Methods
Our investigation is based on the data set of the 8th National Quality Conference, which took place on the occasion of the German Cancer Congress in 2020. The hospital-based cancer registries located at tumor centers that are members of the ADT and all statewide population-based clinical cancer registries were invited to deliver data on diagnosis and therapy of sarcomas to the confidentiality office at the ADT. There were no requirements regarding completeness or data quality, but plausibility checks were performed and duplicates (e.g., the same patient registered at different cancer registries) were removed. For this study, we used data from 16 institutions (listed in the acknowledgment) that followed the call for data.
Our analyses focused on patients with a sarcoma diagnosis between 2000 and 2018. Tumors with information on histology were grouped according to the classification of soft tissue and bone tumors published by the WHO in 2020 [1], but only malignant tumors were counted (ICD-O behavior code /3; exception: GIST with /1 or /3) [7]. Hematopoietic neoplasms of bone were excluded. "Sarcomas not otherwise specified (NOS), ICD-O-3: 8800/3" are reported separately. Further classification of sarcomas was performed with respect to the guidelines for specialized sarcoma centers established by the German Cancer Society [4].
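The selection rule by ICD-O behavior code can be illustrated with a minimal Python sketch. It is not the code used for the study (the analyses were performed with SAS 9.4); the function names are hypothetical, and identifying GIST by the morphology number 8936 is an assumption made only for this illustration.

```python
GIST_MORPHOLOGY = "8936"   # assumption for this sketch: GIST identified by ICD-O-3 8936
SARCOMA_NOS = "8800/3"     # "sarcoma NOS" code named in the text


def split_code(icdo3: str) -> tuple:
    """Split an ICD-O-3 morphology code such as '9133/3' into ('9133', '3')."""
    morphology, _, behavior = icdo3.strip().partition("/")
    return morphology, behavior


def is_included(icdo3: str) -> bool:
    """Keep malignant tumors (/3); GIST is also kept with behavior /1."""
    morphology, behavior = split_code(icdo3)
    if morphology == GIST_MORPHOLOGY:
        return behavior in {"1", "3"}
    return behavior == "3"


def is_sarcoma_nos(icdo3: str) -> bool:
    """Flag codes that are reported separately as sarcoma NOS."""
    return icdo3.strip() == SARCOMA_NOS
```

Under this rule, a tumor coded 9133/1 would be dropped while the same entity coded 9133/3 would be kept, which shows how a change in the behavior code over time affects case selection.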
Primarily, exemplary descriptive analyses are presented. The patient population as well as the exclusion of patients for individual analyses is shown. Age by sex and the distribution of histological groups are presented. Heat maps were chosen to illustrate the frequencies of localization versus histological groups.
Pathological and clinical UICC stages refer to the information recorded by the cancer registries or were calculated from the clinical and pathological TNM (edition TNM7 and TNM8). A common UICC stage was calculated, which corresponds to the pathological stage, if available.
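A minimal Python sketch of the common-stage rule follows. The text only states that the common stage corresponds to the pathological stage if available; the fallback to the clinical stage and the function name are assumptions of this sketch.

```python
from typing import Optional


def common_uicc_stage(pathological: Optional[str],
                      clinical: Optional[str]) -> Optional[str]:
    """Return the pathological stage if recorded, otherwise the clinical stage.

    Falling back to the clinical stage is an assumption of this sketch.
    Empty strings are treated like missing values; None means no stage is known.
    """
    if pathological:
        return pathological
    return clinical or None
```

For example, `common_uicc_stage("II", "III")` yields the pathological stage, whereas `common_uicc_stage(None, "III")` falls back to the clinical one.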
The presence of primary distant metastases was determined based on the reported localizations of primary distant metastasis or, if no localization was known for the tumor, based on the M category of the TNM. Patients with multiple distant metastases were counted several times accordingly. Some data sets with unstandardized entries were excluded. Radiotherapies applied within 90 days after (the first primary) surgery were considered adjuvant.
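The 90-day rule for adjuvant radiotherapy can be sketched in Python as follows. The function names are hypothetical, and treating the interval as inclusive (day 0 to day 90) is an assumption, since the text does not specify the boundary handling.

```python
from datetime import date
from typing import Iterable, Optional

ADJUVANT_WINDOW_DAYS = 90  # threshold stated in the text


def days_to_radiotherapy(surgery_dates: Iterable[date],
                         radiotherapy_start: date) -> Optional[int]:
    """Days from the first surgery to the start of radiotherapy (None if no surgery)."""
    first_surgery = min(surgery_dates, default=None)
    if first_surgery is None:
        return None
    return (radiotherapy_start - first_surgery).days


def is_adjuvant(surgery_dates: Iterable[date], radiotherapy_start: date) -> bool:
    """True if radiotherapy starts 0 to 90 days after the first surgery (assumed inclusive)."""
    interval = days_to_radiotherapy(surgery_dates, radiotherapy_start)
    return interval is not None and 0 <= interval <= ADJUVANT_WINDOW_DAYS
```

Radiotherapy that starts before surgery gives a negative interval and is therefore not classified as adjuvant in this sketch.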
Survival curves according to Kaplan-Meier were compared to a model-based exemplary Cox regression [8,9], which was built with stepwise backward selection. The final model contained effects of UICC stage, age, and the interaction between stage and age and was stratified by histological group. The follow-up cutoff date of the cancer registries was set to December 31, 2016. Cancer registries that provided follow-up data for only a short period of time were excluded. The analysis was performed for the 10 most frequent histological groups and for stages 1 to 4, as the analysis bears the risk of inaccuracy for small numbers of patients in further subgroups. All statistical analyses were performed with SAS 9.4.

Results
Table 1 describes the individual steps of data cleaning and the resulting reduction of the original data set. The data set of the 8th National Quality Conference contained 35,091 sarcomas. After selecting the malignant sarcomas, 33,787 tumors remained. Further exclusion criteria were unknown sex, missing date of diagnosis or date of birth, unreported ICD-10, age under 18, and duplicates (reports from two or more registries for one case), as well as the data of one registry that did not participate in this analysis. In total, 28,311 patients remained; depending on the research question, this data set was further reduced to those sarcomas that contained the information relevant for the evaluation.
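Stepwise data cleaning of this kind can be sketched in Python as a sequence of exclusion filters that records how many cases remain after each step. The field names, step labels, and toy records are hypothetical; duplicate removal and the exclusion of a whole registry are omitted for brevity.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Step = Tuple[str, Callable[[Record], bool]]


def apply_exclusions(records: List[Record],
                     steps: List[Step]) -> Tuple[List[Record], List[Tuple[str, int]]]:
    """Apply the filters in order and log the number of remaining records."""
    remaining = list(records)
    attrition = [("initial data set", len(remaining))]
    for label, keep in steps:
        remaining = [r for r in remaining if keep(r)]
        attrition.append((label, len(remaining)))
    return remaining, attrition


# Hypothetical field names; each predicate returns True for records to keep.
STEPS: List[Step] = [
    ("malignant only", lambda r: r.get("malignant") is True),
    ("known sex", lambda r: r.get("sex") in {"F", "M"}),
    ("dates present", lambda r: bool(r.get("diagnosis_date")) and bool(r.get("birth_date"))),
    ("ICD-10 reported", lambda r: bool(r.get("icd10"))),
    ("age 18 or older", lambda r: r.get("age") is not None and r["age"] >= 18),
]

BASE = {"malignant": True, "sex": "F", "diagnosis_date": "2010-05-01",
        "birth_date": "1950-01-01", "icd10": "C49.4", "age": 60}
EXAMPLE = [
    dict(BASE),
    dict(BASE, malignant=False),   # dropped: not malignant
    dict(BASE, sex=None),          # dropped: unknown sex
    dict(BASE, age=12),            # dropped: under 18
]

if __name__ == "__main__":
    kept, log = apply_exclusions(EXAMPLE, STEPS)
    for label, n in log:
        print(f"{label}: {n}")
```

Logging the count after every step makes it easy to produce an attrition table and to spot which criterion removes the most cases.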

Description of the Included Patients
For survival time analysis, the number of patients had to be reduced further because the UICC stage could only be calculated for 37% of the patients (n = 5,345 of 14,351 sarcomas). Therefore, a reduced data set of 4,533 sarcomas was used for survival analysis (Table 1). Figure 1 illustrates the age distribution of patients by sex (14,629 men, 13,682 women, median age 66 years). At a younger age (between 40 and 54 years), women were more likely to develop sarcomas, whereas among persons over 55 years, more men than women were affected.

Distribution of Histology
GIST were by far the most common entity, followed by fibroblastic and myofibroblastic tumors, smooth muscle tumors (mostly non-uterine leiomyosarcomas), and adipocytic tumors (Table 2), together representing 48% of all malignant sarcomas.

Localization and Histology
As expected, there were preferential sites for some sarcomas. Fibrosarcomas occurred most frequently on the limbs, the trunk, and in the head and neck region. Liposarcomas were most commonly located on the trunk and limbs. Phyllodes tumors were very rare and located exclusively on the trunk. On the thorax, sarcomas were reported less frequently (Fig. 2). GIST was excluded from the heat map because 97% were located in the abdomen.

Localization of Primary Metastases
There were a total of 3,306 (multifocal) metastases, of which 781 were known only from the M category of the TNM. If several localizations were reported for a tumor, each localization was counted separately. Of all metastases, 79.4% were at the 10 most frequent localizations (2,625 out of 3,306; Table 1). The most frequent distant metastases were located in the lung (43%), followed by the liver (14%) and bones (13%) (Fig. 3).

Survival as a Function of Age, UICC Stage, and Histology (Cox Regression and Kaplan-Meier)
In low stages, the probability of survival of many sarcoma patients was beyond 5 years. GIST patients with high stages at initial diagnosis still had quite good survival rates, followed by patients with fibroblastic/myofibroblastic tumors. Vascular and smooth muscle tumors showed the worst survival prognosis (5-year survival approx. 15%, median survival approx. 8-16 months). The Cox regression smoothed out some of the variation so that the survival curves of the histological groups were closer together (Fig. 4).
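For readers less familiar with the method, the Kaplan-Meier estimate behind survival probabilities and median survival can be sketched in plain Python. This is a textbook product-limit estimator on toy data, not the SAS procedure used for the study; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, survival probability) at each event time.

    events[i] is True for an observed death and False for a censored patient.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within follow-up
```

Censored patients still count as being at risk at their censoring time, which is why incomplete follow-up does not bias the estimate downward. If the curve never falls to 50%, the median is not reached and `median_survival` returns `None`.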

Discussion
The analysis presented here gives a first impression of the extent to which the information available in the clinical cancer registries reflects the reality of diagnosis and treatment of sarcomas in Germany. The results shown largely correspond to the data from the literature [5,10-14]. The distribution of age, UICC stages, localization, metastasis sites, and survival times confirms the representativeness and plausibility of the registry data [15,16]. However, the present evaluation also demonstrates how many cases had to be excluded because data were incompletely reported. The distribution of histologies, for example, is in line with expectations, but there remains a large group of tumors that are not specified or are unclassifiable. Unspecific coding was widespread in the early 2000s and is still common in cancer registration. In many cases, correct stage grouping was not possible and treatments could not be assigned to the corresponding stages. In general, therapies were not reported in sufficient detail to permit many meaningful analyses. However, a first evaluation of the proper use of adjuvant radiotherapy showed that treatment starts between 40 and 90 days after surgery, with a median of 56 days postoperatively. This is in line with guideline recommendations [17,18]. Documentation of very early radiation (after surgery) could indicate the use of intraoperative radiotherapy or brachytherapy, specific irradiation applications that are not reported in such detail. A late start of adjuvant irradiation could be due to complications after surgery, but again, the available data did not allow this question to be clarified. Moreover, misreporting of radiotherapy for recurrent disease cannot be ruled out. In this context, it is important to work on better documentation in the near future.

Limitations and Strengths
The routine evaluation of sarcoma data with the methods commonly used in cancer registries is hampered due to the great heterogeneity with regard to localization, morphology, molecular genetics, prognosis, etc. Reasonably combined groups depending on the research question are necessary.
For the 8th National Quality Conference 2020, the exported data sets were, for the first time, selected by histology (ICD-O-3) instead of a selection by localization (ICD-10). This increased effort in data extraction resulted in fewer registries participating than for other entities presented at the National Quality Conference [19].
The recording in cancer registries in Germany is not consistent. There are differences with regard to the time period captured and the variables recorded. In part, the database is incomplete (especially with regard to therapies). This diversity in the data set also reflects the different types of registries whose data were merged: in addition to population-based regional or statewide cancer registries, institution-based data collections from university hospitals or oncological centers were included.
The strengths of the present evaluation include the size of the data set, which in principle also allows subgroup analyses, and the long observation periods. Both the central data processing at the ADT and the close methodological coordination among the evaluation teams of different entities increase the standardization and quality of the analyses.

International Situation
On the international level, various national databases as well as international study groups and networks exist (Table 3). When evaluating the contribution of the Surveillance, Epidemiology, and End Results (SEER) database to the international sarcoma literature, Lyu et al. [20] found more than 300 publications [21]. SEER is not specifically focused on sarcomas, and consistency tests have shown that the reliability of histopathological diagnoses has to be questioned when they are re-evaluated. Thus, SEER stands in sharp contrast to the French Conticabase (see below), for which reference pathology is the entry criterion for submitting case data.
The American College of Surgeons and the American Cancer Society jointly sponsor the National Cancer Database (NCDB), which started in 1989. It is a clinical oncology database sourced from hospital registry data that are collected in more than 1,500 Commission on Cancer-accredited facilities, and today NCDB contains more than 34 million records. NCDB data are used to analyze and track patients with all malignant neoplastic diseases, and the data represent more than 70% of newly diagnosed cancer cases nationwide [22]. NCDB often is helpful in overcoming the rarity of certain families of cancerous diseases and subgroups. However, it is necessary and possible to limit the retrieval to certain time spans [23].
The national sarcoma database of France has developed over several steps. It started with a pathology-reviewed collection of specimens by the French Sarcoma Group (FSG) at Bordeaux. Through funding by the French National Cancer Institute (INCa) and the EU-funded network of excellence CONTICANET, it developed into CONTICABASE/CONTICAGIST for the specific sarcoma subgroups. Today, the Sarcoma Clinical and Biological database (Sarcoma BCB) comprises six different clinical and detailed pathological BCB applications with data of more than 115,000 patients with sarcomas. One of the most important applications is the network of 26 clinical sarcoma centers throughout France (NetSarc), which collects clinical data on management and mixed-tumor board decisions of soft tissue, visceral, and bone sarcomas [24]. In this way, it could be confirmed that patients treated in a sarcoma center have better outcomes than those treated outside those centers [25]. Denmark also started to collect data in a national, population-based database (Danish Sarcoma Database), established in 2009 [26]. Historically, all Scandinavian countries have collaborated since 1979 in the Scandinavian Sarcoma Group (SSG) [27]. This group set up a referral system for sarcoma patients with excellent documentation of patients' treatment and clinical course. The SSG has conducted more than 25 studies since its beginning, deriving many of the questions to be studied from analyses of its own database. One of these studies (SSGXVIII) led to the registration of 3 years of adjuvant imatinib in GIST as the therapeutic standard [28].
The Netherlands Cancer Registry database not only provides free access via its website to the most important cancer statistics, e.g., incidence and survival, but also allows clinical research questions to be answered. One of the publications concerns the patterns of perioperative treatment and survival of localized, resected, intermediate, or high-grade soft tissue sarcomas in 4,957 patients. The study reported an overall survival rate of 59% in this subgroup and made it possible to analyze whether postoperative adjuvant radiation therapy had been administered as recommended by guidelines [29].
More recently, large databases have been established addressing specific questions in sarcoma clinical treatment and research. Sarcomas of the retroperitoneal space (RPS) account for only around 15% of all sarcomas, and therefore data about their treatment were based on small case series for a long time. In 2013, the Transatlantic RPS Working Group (TARPSWG) was constituted as a multi-institutional international collaboration of specialized sarcoma centers from Europe and North America. The collaboration quickly generated a pooled data set of 1,007 primary RPS patients with a minimum of 5 years of follow-up but treated within the past 10 years to eliminate historical cases. Subsequently, standards of treatment, morbidity data, and figures on tumor recurrence were published [30-32]. The existence of different classification systems (by location, by histology, from different scientific or political bodies) should be discussed as a potential problem for comparison and international standards.

Outlook
The evidence-based guideline for adult soft tissue sarcomas was first published on September 28, 2021 [18]. The quality indicators contained in the guideline can predominantly be calculated with data from clinical cancer registries. For this purpose, there is a close cooperation between the German Cancer Society and the cancer registries in various working groups of the German Guideline Program in Oncology (GGPO) [33]. The updated version of the German standardized "Basic Oncological Data Set" [34] allows improved documentation in greater detail. Important innovations are the option of documenting genetic markers, as well as the therapy recommendation of the interdisciplinary tumor board, including deviation from suggested therapies at the request of the patient. Many catalogs of characteristic features have been expanded, especially in the radiotherapy chapter: there are now coding systems for the target area, the type of application, and the type of radiation applied, and a boost can be documented separately. The new variables for reporting systemic therapy facilitate a differentiated representation of the therapy strategies.
The national merging of data from clinical cancer registries according to the amendment to the Federal Cancer Registry Data Act [35] will create a more comprehensive database. From the beginning of 2023, the data set transmitted annually by the cancer registries to the Centre for Cancer Registry Data (ZfKD) will be supplemented with data on treatment and the clinical course. In a second step, a platform will be set up to allow access to additional and more up-to-date data in the registries and to enable linkage of cancer registry data with other data (e.g., from scientific studies or health insurance data) [36].
Strengthening the cooperation of all institutions involved in the diagnosis and treatment of sarcoma patients in Germany, consolidating their respective data collections, and creating the above-mentioned comprehensive database on a national level are fundamentally important for carrying out solid sarcoma data analyses and thus, ultimately, for improving treatment outcomes.