Histopathology imagery dataset of Ph-negative myeloproliferative neoplasm

Tumorous cancers are widely known and well studied; however, rare diseases such as Myeloproliferative Neoplasm (MPN) have received less attention, which leads to delayed diagnosis. Despite the availability of advanced diagnostic tools that can speed up the procedure, the morphological assessment of bone marrow trephine (BMT) images remains critical for confirming and differentiating MPN subtypes. This paper reports a histopathological imagery dataset that focuses on the most common Philadelphia Chromosome (Ph)-negative MPNs, namely Essential Thrombocythemia (ET), Polycythemia Vera (PV), and Primary Myelofibrosis (MF). The dataset consists of 300 BMT images that can support computer vision applications, such as image segmentation, disease classification, and object recognition, to assist in classifying MPN. Ethical approval was obtained from the Ministry of Health, Malaysia, and the BMT images were captured using an Olympus BX41 dual-head digital microscope with x10, x20, and x40 lenses. Comprehensive tools developed from this dataset can assist medical practitioners in diagnosing these diseases and thus help overcome the current challenges.



Value of the Data
• Bone marrow morphology is of fundamental importance in distinguishing the different Myeloproliferative Neoplasm (MPN) subtypes [1][2][3]. This MPN dataset can assist medical practitioners, particularly in familiarising them with the morphological features through images of good qualitative value.
• With proper augmentation methods, the MPN dataset can be expanded and used to train machine learning models for the classification of Philadelphia Chromosome (Ph)-negative MPN.
• The MPN dataset can be utilised as a test or validation dataset for machine learning classification models.
• Additionally, the MPN dataset can be used by machine learning researchers or data scientists to develop artificial intelligence solutions that address problems such as human error, high dependency on human expertise, interobserver variability, and delayed diagnosis [2][3][4].
• The dataset can be used to facilitate various computer vision tasks such as image segmentation, disease classification, and object recognition.
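As an illustration of the augmentation point above, the following is a minimal sketch of one common label-preserving scheme, namely the eight rotation and flip variants of an image. It is not a method described by the dataset authors. The function names are hypothetical, and the image is represented as a small nested list of pixel values so that the example runs without any imaging library; in practice an image library would be used instead.

```python
from typing import List

# A tiny greyscale "image" stored as rows of pixel values (illustrative only).
Image = List[List[int]]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def dihedral_augment(img: Image) -> List[Image]:
    """Return the 8 rotation/flip variants of an image, original included."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)                   # rotated copy
        variants.append(flip_horizontal(current))  # mirrored rotated copy
        current = rotate90(current)
    return variants


sample = [[1, 2], [3, 4]]
augmented = dihedral_augment(sample)
print(len(augmented))
```

Under this scheme each source image yields eight variants, so 300 images would give 2400. Whether such transformations are appropriate for a given model is a design choice that should be checked against the morphology being assessed.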

Data Description
This imagery dataset consists of 300 labelled histopathological images from three common types of Ph-negative MPN, namely Essential Thrombocythemia (ET), Polycythemia Vera (PV), and Primary Myelofibrosis (MF).

Table 1
Relative frequency of features in bone marrow for PV, ET, and MF [5].

Data acquisition
Bone marrow biopsy samples from patients with a confirmed diagnosis of MPN were prepared by the pathologist at Hospital Serdang, Malaysia. A total of 100 bone marrow trephine (BMT) images for each MPN type were captured from the raw sample slides using an Olympus BX41 dual-head digital microscope, as shown in Fig. 3. Three lenses (x10, x20, and x40) were used to capture the BMT images at different levels of magnification.
The number of slides provided by the pathologist and the total number of images captured for each class are listed in Table 2.
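Because the dataset is balanced at 100 images per class, a stratified split keeps that balance in both the training and test partitions. The sketch below shows one way to do this using only the Python standard library. The file names, the `stratified_split` function, the 20% test fraction, and the fixed seed are all assumptions made for illustration and are not taken from the dataset.

```python
import random
from typing import Dict, List, Tuple

CLASSES = ("ET", "PV", "MF")
IMAGES_PER_CLASS = 100  # from the text: 100 BMT images per MPN type


def stratified_split(
    items_by_class: Dict[str, List[str]],
    test_fraction: float = 0.2,  # assumed value, not from the source
    seed: int = 0,
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split each class separately so both partitions keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test += [(name, label) for name in shuffled[:n_test]]
        train += [(name, label) for name in shuffled[n_test:]]
    return train, test


# Hypothetical file names; the dataset's real naming scheme is not stated here.
items = {c: [f"{c}_{i:03d}.jpg" for i in range(IMAGES_PER_CLASS)] for c in CLASSES}
train, test = stratified_split(items)
print(len(train), len(test))
```

Several images may originate from the same slide, so splitting at slide level rather than image level may be preferable to avoid leakage between partitions. Slide identifiers are not modelled in this sketch.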

Data cleaning and data checking
All collected images were labelled accordingly and then reviewed in discussion with the pathologist to verify that the original appearance of the bone marrow was preserved and to confirm the region of interest in each captured image. Images that fulfilled the criteria were kept, while those without the artefact and region of interest were removed from the dataset.
The inclusion criterion was that each bone marrow biopsy image came from a patient diagnosed with one of the three types of Ph-negative MPN, whereas images from any evolved MPN were excluded. Additionally, the data collected were not tied to any particular time period or geographic location.

Limitations
This imagery dataset has a limited sample size because MPN is a rare disease and the data were collected from a single source. In addition, the dataset may not present every bone marrow morphological feature at all of its relative frequencies.

Ethics statement
This data collection did not involve any clinical experiment or direct involvement of patients. No informed consent was obtained specifically for the bone marrow biopsy procedure. Patients were verbally informed about the collection of samples during their clinic visits. The verbal consent was documented by the doctor in charge in the patients' notes before the procedure.
Ethical approval for data collection was obtained from the Ministry of Health, Malaysia with reference number NMRR-18-4023-42507 (IIR).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.