Erythrocyte (red blood cell) dataset in thalassemia case

A red blood cell (RBC) dataset was obtained from four thalassemia peripheral blood smears and one healthy peripheral blood smear. The dataset contains 7108 images of individual red blood cells covering nine cell types. The first step is image acquisition, in which microscopic images of the peripheral blood smears are captured through an Olympus CX21 microscope using an Optilab advance plus camera. Laboratory assistants helped obtain suitable erythrocyte images. The peripheral blood smear images of the four thalassemia patients are also provided as the ThalassemiaPBS dataset. After acquisition, each image is resized from 4100 × 3075 pixels to 800 × 600 pixels to reduce the computing load in later stages. The green channel of the RGB image is then extracted and used in the subsequent steps, because it is less affected by variations in color and brightness. Next, segmentation is carried out to obtain objects that each contain a single red blood cell, and these objects can then be classified by red blood cell type. This dataset gives researchers worldwide an opportunity to develop red blood cell classification methods.


The thalassemia peripheral blood smear images (ThalassemiaPBS dataset) were captured with an Optilab advance plus camera mounted on an Olympus CX21 microscope. The images were taken at 1000× total magnification, using a 100× oil immersion objective lens combined with a 10× eyepiece. The original images are 4100 × 3075 pixel RGB images, which were resized to 800 × 600 pixels to reduce the computing load in later stages. The green channel of each RGB image was then extracted, because it is less affected by variations in color and brightness, and the segmentation stage was applied to obtain objects corresponding to single red blood cells.

Value of the Data
• The ThalassemiaPBS dataset consists of peripheral blood smear images collected from thalassemia patients. Public datasets of microscopic peripheral blood smear images from thalassemia patients are currently rare, so this dataset can serve as a data source for researchers developing computer applications related to thalassemia.
• The RBC dataset consists of single erythrocyte images extracted from thalassemia peripheral blood smears. It can serve as a data source for researchers working on computer-based RBC classification.
• Researchers working on red blood cell segmentation, especially in thalassemia, can use the ThalassemiaPBS dataset, and researchers working on red blood cell classification, especially in thalassemia, can use the RBC dataset.
• The data can be used to develop an RBC classification system or as additional data during system development.
• Researchers can further analyze the data to identify the most representative features of each cell type.
• The data give researchers worldwide an opportunity to develop support systems for pathologists.

Experimental Design, Materials and Methods
The red blood cell images were derived from the peripheral blood smears of four thalassemia patients and one healthy peripheral blood smear. The smears were prepared with the wedge technique following the guidelines in [2]:
1. A drop of EDTA-anticoagulated blood (approximately 2 to 3 mm in diameter) is placed at one end of the slide. The pusher slide, held securely in the dominant hand at an angle of about 30 to 45° (Fig. 2, A), is drawn back into the drop of blood, and the blood is allowed to spread across the entire width of the slide (Fig. 2, B). The pusher slide is then pushed quickly and smoothly forward to the end of the slide to create a wedge film (Fig. 2, C).
2. After the film is prepared and before staining, every blood film should be dried as quickly as possible to avoid drying artifacts.
3. The slide is placed on a shelf with the film side facing up, and pure Wright stain or Wright-Giemsa stain (a Romanowsky stain) is applied. The Wright stain can be filtered before use or poured directly from the bottle through a filter onto the slide. The slide must be flooded completely. The stain must remain on the slide for at least 1 to 3 min so that the cells adhere to the glass. Approximately the same volume of buffer is then added to the slide; surface tension allows even a very small amount of buffer to flow over it. The mixture is left on the slide for 3 min.
4. When staining is complete, the slide is rinsed with a steady but gentle stream of neutral-pH water, the back of the slide is cleaned to remove stain residue, and the slide is air-dried in a vertical position.
Digital images were then acquired using the microscope and the attached camera. As shown in Fig. 3, the subsequent processing follows the stages in the study of Tyas et al. [1]. In the preprocessing step, each image is resized from 4100 × 3075 pixels to 800 × 600 pixels to reduce the computing load in later stages, and the green channel of the image is extracted for the following steps. The methods used in the preprocessing and segmentation stages are shown in Fig. 4. The segmentation stage obtains red blood cell candidates using median filtering, Canny edge detection, dilation, and hole filling. A watershed distance transform (WDT) operation is then used to separate overlapping erythrocytes. Next, erosion is applied, followed by the removal of small objects with an area below 500 pixels. The 500-pixel threshold was chosen because the minimum area observed in the dataset was 526 pixels, so a round value just below this minimum was used. Cells touching the edge of the image are removed because their shape is incomplete.
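A segmentation pipeline following the steps above can be sketched with SciPy and scikit-image. This is an illustrative sketch, not the authors' implementation: the median filter size, the Canny `sigma`, the `min_distance` used to place one watershed marker per distance-map peak, and the single dilation and erosion iterations are all assumed values, and `segment_cells` is a hypothetical name. Only the 500-pixel area threshold comes from the text. The input is assumed to be the 800 × 600 uint8 green channel.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import canny, peak_local_max
from skimage.segmentation import watershed

MIN_AREA = 500  # px; threshold stated in the text (smallest observed cell area: 526 px)


def segment_cells(green: np.ndarray, min_distance: int = 10) -> np.ndarray:
    """Return a label image of candidate red blood cells (0 = background).

    Assumes `green` is a uint8 green-channel image; all parameters except
    MIN_AREA are illustrative guesses, not values from the paper.
    """
    # 1. Median filter suppresses noise while keeping cell outlines sharp.
    img = ndi.median_filter(green.astype(float) / 255.0, size=5)
    # 2. Canny edge detection.
    edges = canny(img, sigma=1.0)
    # 3. Dilation (3x3) closes small gaps in the cell outlines.
    edges = ndi.binary_dilation(edges, structure=np.ones((3, 3), bool))
    # 4. Hole filling turns each closed outline into a solid cell mask.
    filled = ndi.binary_fill_holes(edges)
    # 5. Watershed on the distance transform: one marker per distance peak,
    #    flooded over the negated distance map, with separating lines kept.
    distance = ndi.distance_transform_edt(filled)
    regions, _ = ndi.label(filled)
    peaks = peak_local_max(distance, min_distance=min_distance, labels=regions)
    markers = np.zeros(filled.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    ws = watershed(-distance, markers, mask=filled, watershed_line=True)
    # 6. Erosion widens the gaps between separated cells.
    cells = ndi.binary_erosion(ws > 0)
    labels, _ = ndi.label(cells)
    # 7. Remove objects with an area below MIN_AREA pixels.
    keep = np.bincount(labels.ravel()) >= MIN_AREA
    keep[0] = False  # label 0 is background
    # 8. Remove objects touching the image border (incomplete cells).
    border = np.unique(np.concatenate(
        [labels[0], labels[-1], labels[:, 0], labels[:, -1]]))
    keep[border] = False
    out, _ = ndi.label(keep[labels])
    return out
```

Dropping every object that touches the border, after the area filter, means cells cut off by the field of view never reach the detection stage, matching the rule stated in the text.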
Next, single erythrocytes, overlapping erythrocytes, and white blood cells (WBCs) were detected. Detection is applied to all cells in each field-of-view image by thresholding on three parameters: area, color intensity, and eccentricity. Sample results for each method used in the preprocessing and segmentation stages are shown in Fig. 5. Finally, a clinical pathologist sorted and grouped the cells to determine which ones were included in the dataset, yielding a red blood cell dataset with nine cell types. The cell types follow the ICSH nomenclature [3].
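The threshold-based detection can be illustrated as a simple rule over per-object features. The paper names the three parameters (area, color intensity, eccentricity) but not their cut-off values or how they are combined, so the thresholds, the rule order, and the names `detect` and `eccentricity` below are hypothetical placeholders. Eccentricity is computed from the eigenvalues of the pixel-coordinate covariance matrix, i.e. the eccentricity of the ellipse with the same second moments as the object. Labels are assumed to come from a segmentation step such as the one described above, and intensities from the green channel.

```python
import numpy as np

# HYPOTHETICAL thresholds for illustration only; the paper does not report them.
MAX_SINGLE_AREA = 2500   # px; larger objects are treated as overlapping cells
MAX_SINGLE_ECC = 0.8     # more elongated objects are treated as overlapping cells
WBC_MAX_INTENSITY = 60   # mean green value; darker objects are treated as WBCs


def eccentricity(rows: np.ndarray, cols: np.ndarray) -> float:
    """Eccentricity of the ellipse with the same second moments as the pixel set."""
    cov = np.cov(np.vstack([rows, cols]).astype(float), bias=True)
    small, large = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    if large <= 0:
        return 0.0  # a single pixel has no defined orientation
    return float(np.sqrt(1.0 - np.clip(small / large, 0.0, 1.0)))


def detect(labels: np.ndarray, green: np.ndarray) -> dict:
    """Map each object label to 'wbc', 'overlap' or 'single' (illustrative rule)."""
    result = {}
    for lab in np.unique(labels):
        if lab == 0:
            continue  # background
        rows, cols = np.nonzero(labels == lab)
        area = rows.size
        intensity = float(green[rows, cols].mean())
        ecc = eccentricity(rows, cols)
        if intensity < WBC_MAX_INTENSITY:
            result[int(lab)] = "wbc"
        elif area > MAX_SINGLE_AREA or ecc > MAX_SINGLE_ECC:
            result[int(lab)] = "overlap"
        else:
            result[int(lab)] = "single"
    return result
```

Checking intensity first reflects the assumption that stained WBC nuclei are the darkest objects in the green channel; real cut-offs would have to be tuned on labeled data.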

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships which have or could be perceived to have influenced the work reported in this article.