The international dataset on the association between Langerhans Cell Histiocytosis and other malignancies

This article presents the international dataset of cases in which an association between Langerhans cell histiocytosis (LCH) and another malignancy (AM) was documented, with the AM occurring at any age before, concurrently with, or after LCH. These data are mostly derived from previously published manuscripts or from case report forms (CRFs) completed by Histiocyte Society (HS) members or colleagues. For each case of LCH-AM, the database reports all available data on the clinical and biologic characteristics of the two tumors, as well as on treatment and status at follow-up. The AM were categorized as: i) leukemias [acute lymphoblastic or myeloid leukemia (ALL and AML, respectively), other leukemias] and myeloproliferative disorders; ii) lymphomas [Hodgkin lymphoma (HL) and non-Hodgkin lymphomas (NHL)]; and iii) solid tumors. A total of 270 LCH-AM cases were documented, of which 116 (43%) occurred among children. After stratification by age at LCH diagnosis, using 18 years as the cut-off between children and adults, we provide details on the clinical characteristics in terms of LCH system involvement and affected organs, as well as on the temporal relationship between the LCH and AM diagnoses, including details on the AM types. In 19 cases the LCH and the corresponding AM occurred in different age groups. The dataset is available for future studies aimed at new insights into the genetic or environmental determinants of LCH and/or treatment-related subsequent neoplasms.

Specifications Table
Subject: Health and medical sciences
Specific subject area: Hematology/Oncology - genetic predisposition and treatment-related subsequent neoplasms
Type of data: Table
How the data were acquired: The LCH-AM dataset was initiated in 1991. Firstly, members of the Histiocyte Society (HS) were asked to report any patient of any age, whom they were currently treating or had previously treated, with LCH and another malignancy that may have occurred before, concurrently with, or after the LCH diagnosis. Further updates were made by reviewing abstracts at HS or other international meetings. Secondly, the scientific literature was periodically searched through the PubMed database. For each case of the reported association, HS members were asked to complete an ad hoc case report form (CRF) (attached as supplementary data); otherwise, data were abstracted directly from the published manuscripts by the personnel involved in this study (Figs. 1 and 2). Data collection ended in June 2015.
Data format: Raw; Analyzed
Description of data collection: LCH-AM cases reported in this article are mostly derived from previously published manuscripts, which are listed in Table 1 (see the data source location section of this document), or from CRFs completed by HS members or colleagues. Non-eligible cases (i.e. histiocytic disorder other than LCH, AM being a benign tumor) were excluded. Cases were already pseudonymized or completely anonymized at the time of their reporting in the literature and are thus identifiable only through their study identification code.
Data source location: All the information available and collected through CRFs was stored in an ad hoc Access database on a secure institutional server at the IRCCS Istituto Giannina Gaslini, in Genoa, Italy. The list of primary data sources is reported in Table 1.

Value of the Data
• With 270 cases, this dataset is the largest series to date on the association of LCH with another malignancy, and it provides the relevant bibliography on published cases of LCH-AM.
• For each case of the LCH-AM association, the dataset provides details on the LCH characterization (system and organ involvement), pathology reports of LCH and AM (where available), treatment exposures and the clinical course of both LCH and AM.
• These data can be used to generate hypotheses on possible common pathways between the two malignancies, which might then be addressed by prospective collection of clinical and biological data.
• Any researcher interested in the pathophysiology of LCH may use these data to identify unusual associations, even with other rare tumors, which might be interesting to analyze in the light of new genetic findings.

Data Description
The raw data described in this article are available in Mendeley Data, DOI: https://data.mendeley.com/datasets/yvtfcr52x6/2 [1]. Details on the 270 LCH cases with another associated malignancy (AM) are provided in "LCH_Malignancy_raw_data.xlsx". A detailed explanation of each column in the file is reported in "LCH_Malignancy_Variables_list.pdf". For each case, the reported data cover sex and ethnicity (classified according to [2]), age at LCH diagnosis, extent of the disease, system(s) involved, histopathology, surgery, and treatment, including the site and dose of any radiotherapy and the list of chemotherapy or immunotherapy drugs used. Further information refers to the AM, specifically: age at diagnosis, type of malignancy, stage and site of the primary tumor, whether the malignancy occurred within a radiation port previously used to treat the LCH (if any), and the type of treatment(s) (surgery, chemotherapy and radiotherapy), with details on the site and dose of radiotherapy. Other information, where available, concerns the vital status at follow-up, the cause of death for deceased subjects, and the clinical status (active or remission) of both the LCH and the AM.
The study CRF is provided as a supplementary file. The data sources of the 270 cases are described in Fig. 1, and the primary data sources (all published manuscripts) are listed in Table 1. Fig. 2 describes the periodic annual searches, from 2002 to 2015, through the PubMed database. Table 2 reports details of the 270 cases in terms of system involvement (single system, SS, vs. multisystem, MS) and affected organs. In the pediatric group (n = 116), the skeleton was the most frequently affected system (in both SS- and MS-LCH; n = 86; 74.1%), followed by skin (n = 51; 44%), lymph nodes (n = 26; 22.4%), and liver and lungs (n = 21 each; 18.1%). Among adults (n = 154), the lungs (n = 65; 42.2%) were the most frequently affected organ, followed by lymph nodes (n = 47; 30.5%) and skin (n = 35; 22.7%). Table 3 shows aggregate data with details of the observed leukemias and myeloproliferative disorders (Table 3A), lymphomas (Table 3B), and solid tumors (Table 3C) by age at LCH diagnosis and timing of occurrence of the AM.
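As a quick consistency check, the percentages quoted above can be recomputed from the reported counts and group sizes. The following Python sketch is illustrative only; the dictionary and function names are ours and do not correspond to columns of the released files.

```python
# Counts and group sizes as quoted in the text for Table 2.
PEDIATRIC_N = 116
ADULT_N = 154

pediatric_counts = {"skeleton": 86, "skin": 51, "lymph nodes": 26, "liver": 21, "lungs": 21}
adult_counts = {"lungs": 65, "lymph nodes": 47, "skin": 35}


def percentages(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {site: round(100 * n / total, 1) for site, n in counts.items()}


pediatric_pct = percentages(pediatric_counts, PEDIATRIC_N)
adult_pct = percentages(adult_counts, ADULT_N)

# Share of pediatric cases among all 270 cases, rounded to a whole percent.
pediatric_share = round(100 * PEDIATRIC_N / (PEDIATRIC_N + ADULT_N))
```

The denominators are the age-group sizes (116 and 154), not the overall total of 270, which is why the same organ can show different percentages in the two groups.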

Experimental Design, Materials and Methods
This dataset was initiated in 1991 by R. Maarten Egeler, who invited HS members to report any patient of any age, whom they were currently treating or had previously treated, with LCH and another malignancy that may have occurred before, concurrently with, or after the LCH diagnosis. Further updates were made through periodic review of the scientific literature and of abstracts of the annual HS meetings or other international meetings attended by one of the authors. At the end of 2001, responsibility for the project was transferred to R. Haupt, and periodic searches through the PubMed database were continued until 2015 [search terms: ("Histiocytosis, Langerhans-Cell"[Mesh] or "Langerhans cell histiocytosis"[tiab] or histiocytosis[tiab] or "eosinophilic granuloma"[tiab]) and ("neoplasms"[Majr] or leukemia[tiab] or lymphoma[tiab]); Filters: English]. The titles/abstracts of all studies identified by the search were screened by one reviewer (RH). The full texts of the potentially eligible studies were then obtained, and reviewers (RH/BC/FB) checked whether the articles fully complied with the inclusion criteria. For each reported case of LCH-AM occurrence, a CRF was completed with all the data available in the manuscript; uncertain cases were identified by each reviewer and then discussed within the team. Since the same subject might have been reported in different case series, a further screening was performed to identify duplicates by comparing the pattern of association, the reporting institution and clinical details. Additionally, any other articles included among the references of selected manuscripts but not identified by the literature search were screened by one reviewer (RH).
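The duplicate screening described above was performed by the reviewers; the source does not describe an automated procedure. Purely as an illustration of the idea, the Python sketch below flags candidate duplicates for manual review by grouping cases on a key built from the pattern of association, the reporting institution and a few clinical details. All field names (`case_id`, `am_type`, `institution`, `sex`, `age_at_lch_years`) and the example records are hypothetical.

```python
from collections import defaultdict


def duplicate_candidates(cases):
    """Group cases that share the same screening key; return groups of 2+ case IDs."""
    groups = defaultdict(list)
    for case in cases:
        key = (
            case["am_type"],                      # pattern of association
            case["institution"].strip().lower(),  # reporting institution, normalized
            case["sex"],                          # clinical detail
            round(case["age_at_lch_years"]),      # clinical detail, whole years
        )
        groups[key].append(case["case_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Invented records, for illustration only.
example_cases = [
    {"case_id": "A1", "am_type": "AML", "institution": "Hospital X", "sex": "F", "age_at_lch_years": 3.2},
    {"case_id": "B7", "am_type": "AML", "institution": "hospital x ", "sex": "F", "age_at_lch_years": 3.0},
    {"case_id": "C4", "am_type": "HL", "institution": "Hospital Y", "sex": "M", "age_at_lch_years": 41.0},
]

flagged = duplicate_candidates(example_cases)
```

A key-based grouping of this kind can only propose candidates; deciding whether two reports describe the same patient still requires reading the clinical details, as was done in this study.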
Patients were further stratified by their age at LCH diagnosis, with 18 years as the cut-off between children and adults. The associated malignancies were identified from their histopathological report, as described in the manuscripts and/or in CRFs and original reports sent on a voluntary basis, and were further stratified into 4 categories: i) acute lymphoblastic leukemia; ii) acute myeloid leukemia, other leukemias and myeloproliferative disorders; iii) lymphomas (Hodgkin and non-Hodgkin); iv) solid tumors.
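The stratification can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to build the dataset: the short diagnosis codes are hypothetical, and the handling of a patient aged exactly 18 years (counted here as an adult) is our assumption, since the text gives only the cut-off value.

```python
from collections import Counter

ADULT_CUTOFF_YEARS = 18  # cut-off stated in the text

# Hypothetical short codes mapped to the four categories named in the text.
AM_CATEGORY = {
    "ALL": "acute lymphoblastic leukemia",
    "AML": "acute myeloid leukemia, other leukemias and myeloproliferative disorders",
    "HL": "lymphomas",
    "NHL": "lymphomas",
    "SOLID": "solid tumors",
}


def age_group(age_at_lch_years):
    """Classify a case by age at LCH diagnosis.

    Assumption: a patient aged exactly 18 years is counted as an adult.
    """
    return "child" if age_at_lch_years < ADULT_CUTOFF_YEARS else "adult"


def cross_tabulate(cases):
    """Count cases per (age group, AM category) from (age, code) pairs."""
    return Counter((age_group(age), AM_CATEGORY[code]) for age, code in cases)


# Invented (age at LCH diagnosis in years, AM code) pairs, for illustration only.
example = [(4.5, "ALL"), (17.9, "HL"), (18.0, "SOLID"), (52.0, "NHL")]
table = cross_tabulate(example)
```

An unknown code raises a `KeyError` rather than falling into a default category, so that unclassified malignancies are noticed instead of being silently counted as solid tumors.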
Because of the purely descriptive nature of this dataset, statistics were limited to calculating the interval between the LCH and AM diagnoses, and absolute frequencies and percentages, using Stata (StataCorp. Stata Statistical Software, Release 16.1. College Station, TX: Stata Corporation, 2019).
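The analyses were run in Stata; the Python sketch below only illustrates the kind of descriptive computation involved. It assumes that ages at both diagnoses are available in years, and it takes the width of the "concurrent" window as an explicit parameter because the text does not define it. The value used in the example (0.25 years) and the age pairs are invented for illustration.

```python
from collections import Counter


def interval_years(age_at_lch_years, age_at_am_years):
    """Interval from LCH to AM diagnosis; negative when the AM came first."""
    return age_at_am_years - age_at_lch_years


def timing(interval, concurrent_window_years):
    """Label the AM as occurring before, concurrently with, or after LCH."""
    if abs(interval) <= concurrent_window_years:
        return "concurrent"
    return "after" if interval > 0 else "before"


def frequency_table(labels):
    """Return {label: (absolute frequency, percentage rounded to one decimal)}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Invented (age at LCH, age at AM) pairs in years, for illustration only.
pairs = [(3.0, 9.5), (40.0, 40.1), (25.0, 21.0), (6.0, 14.0)]
WINDOW = 0.25  # illustrative value only; not taken from the source

labels = [timing(interval_years(lch, am), WINDOW) for lch, am in pairs]
summary = frequency_table(labels)
```

Keeping the window as a required argument makes the choice visible at every call site, which matters because the split between "concurrent" and "before/after" changes with it.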

Ethics Statements
No formal approval by the Ethics Review Board was necessary at the time of the project set-up (1991), and most of the dataset is based on literature review. For cases identified only through the study CRFs, the authorization by the reporting physician to use the data was considered implicit in the voluntary transmission of data; we thus do not have informed consent from the study subjects. According to guidelines by the General Data Protection Authority for secondary use of health data, it is possible to use data for research purposes " … if the researcher can show that pseudonymisation -or another aggregated data process -has been used so the researchers cannot identify the individual, and therefore cannot contact them to obtain consent". We believe that this statement applies to our dataset. The LCH-AM database is stored on a secure server at the Gaslini Institute, Genova, Italy.