A dataset of microscopic peripheral blood cell images for development of automatic recognition systems.

This article makes available a dataset that was used for the development of an automatic recognition system of peripheral blood cell images using convolutional neural networks [1]. The dataset contains a total of 17,092 images of individual normal cells, which were acquired using the CellaVision DM96 analyzer in the Core Laboratory at the Hospital Clinic of Barcelona. The dataset is organized in the following eight groups: neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes (promyelocytes, myelocytes, and metamyelocytes), erythroblasts, and platelets or thrombocytes. The images are 360 × 363 pixels in size, in JPG format, and were annotated by expert clinical pathologists. They were captured from individuals without infection, hematologic or oncologic disease, and free of any pharmacologic treatment at the moment of blood collection. This high-quality labelled dataset may be used to train and test machine learning and deep learning models to recognize different types of normal peripheral blood cells. To our knowledge, this is the first publicly available set with large numbers of normal peripheral blood cells, and we therefore expect it to serve as a canonical dataset for model benchmarking.


© 2020 The Author(s).

Value of the data
• This dataset is useful in the area of microscopic image-based hematological diagnosis, since the images meet high quality standards, have been annotated by expert clinical pathologists, and cover a wide spectrum of normal peripheral blood cell types.
• The dataset can be used to train and test machine learning and deep learning models for automatic classification of peripheral blood cells.
• This dataset can be used as a public canonical image set for model benchmarking and comparisons.
• This dataset might be used as a model weight initializer. That is, the available images can be used to pre-train learning models, which can then be further trained to classify other types of abnormal cells.
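As an illustration of the training and testing use mentioned above, the following is a minimal sketch of a stratified train/test split, which keeps the proportion of each cell group the same in both subsets. The article does not prescribe any particular split; the function name `stratified_split`, the 20% test fraction, and the toy label list are assumptions made only for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train and test sets, class by class.

    Hypothetical helper: the split ratio and seed are illustrative
    choices, not values taken from the article.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for the real annotations (counts are made up).
labels = ["neutrophil"] * 50 + ["basophil"] * 20 + ["platelet"] * 30
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

Splitting per class rather than over the whole list avoids a test set in which a less frequent group is under-represented by chance.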

Data
The normal peripheral blood dataset contains a total of 17,092 images of individual cells, which were acquired using the CellaVision DM96 analyzer. All images are in the RGB color space, in JPG format, with a size of 360 × 363 pixels, and were labelled by clinical pathologists at the Hospital Clinic.
The dataset is organized in eight groups of different types of blood cells, as indicated in Table 1.
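A quick way to check the group sizes after downloading the data is to count the image files per group. The sketch below assumes a layout with one sub-folder per group containing the JPG files; that layout, the folder names, and the function name `count_images_per_group` are assumptions for illustration and should be adapted to the actual archive structure.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_images_per_group(root):
    """Count JPG files in each immediate sub-folder of `root`.

    Assumes (hypothetically) one sub-folder per cell group.
    """
    counts = Counter()
    for path in Path(root).glob("*/*"):
        if path.is_file() and path.suffix.lower() in {".jpg", ".jpeg"}:
            counts[path.parent.name] += 1
    return counts


# Demonstration on a tiny temporary tree with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for group, n_files in {"basophil": 2, "platelet": 3}.items():
        folder = Path(tmp) / group
        folder.mkdir()
        for i in range(n_files):
            (folder / f"{i}.jpg").touch()
    demo_counts = count_images_per_group(tmp)
```

On the real data, the sum of the returned counts would be expected to match the 17,092 images reported above.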
Although the group of immature granulocytes includes myelocytes, metamyelocytes and promyelocytes, we have kept them all in a single group for two main reasons: (1) the individual identification of specific subgroups is not of special interest for diagnosis; and (2) morphological differences among these subgroups are subjective even for the clinical pathologist. Fig. 1 shows examples of the ten types of normal peripheral blood cells that make up the dataset.
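The grouping described above, in which the three immature granulocyte subtypes are merged into one class, can be sketched as a simple label mapping. The exact label strings and the function name `to_group` are assumptions for illustration; the dataset itself may use different spellings or abbreviations.

```python
from collections import Counter

# The eight groups named in the article (label spellings are assumed).
GROUPS = (
    "neutrophil", "eosinophil", "basophil", "lymphocyte",
    "monocyte", "immature granulocyte", "erythroblast", "platelet",
)
IMMATURE_GRANULOCYTE_SUBTYPES = {"promyelocyte", "myelocyte", "metamyelocyte"}


def to_group(cell_type):
    """Map a fine-grained cell type name to one of the eight groups."""
    name = cell_type.strip().lower()
    if name in IMMATURE_GRANULOCYTE_SUBTYPES:
        return "immature granulocyte"
    if name == "thrombocyte":  # the article treats this as a synonym of platelet
        return "platelet"
    if name in GROUPS:
        return name
    raise ValueError(f"unknown cell type: {cell_type!r}")


# Toy annotations (not real data) collapsed to group-level counts.
fine_labels = ["Myelocyte", "promyelocyte", "neutrophil", "thrombocyte"]
group_counts = Counter(to_group(label) for label in fine_labels)
```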

Experimental design, materials, and methods
The images were obtained during the period 2015-2019 from blood smears collected from patients without infections, hematologic or oncologic diseases, and free of any pharmacologic treatment at the moment of their blood extraction. The procedure followed the daily workflow standardized in the Core Laboratory at the Hospital Clinic of Barcelona, which is illustrated in Fig. 2.
The workflow starts in the Advia 2120 autoanalyzer, where blood samples are processed to obtain a general cell count. In a second step, the blood smears are automatically stained using May Grünwald-Giemsa [2] in the Sysmex SP1000i autostainer. This automated process ensures equal and stable staining regardless of the specific user. The laboratory has a standardized quality control system to supervise the procedure.
The resulting stained smear then passes through the CellaVision DM96, where the automatic image acquisition is performed. This produced images of individual normal blood cells in JPG format with a size of 360 × 363 pixels. Each cell image was annotated by a clinical pathologist and saved with a random identification number to remove any link or traceability to the patient data, resulting in an anonymized dataset. No filtering or further pre-processing was applied to the images.
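The idea of saving each image under a random identification number can be illustrated with a short sketch. This is not the laboratory's actual procedure, which the article does not detail; the function name `assign_random_ids`, the six-digit identifiers, and the `group_id.jpg` naming pattern are all assumptions made for this example.

```python
import secrets


def assign_random_ids(records, id_digits=6):
    """Return anonymized file names for (patient_ref, group) records.

    Hypothetical helper: each image gets a unique random number, and the
    patient reference is never written into the output.
    """
    records = list(records)
    if len(records) > 10 ** id_digits:
        raise ValueError("not enough distinct identifiers for this many records")

    used = set()
    filenames = []
    for _patient_ref, group in records:
        # Draw until an unused identifier is found.
        while True:
            candidate = secrets.randbelow(10 ** id_digits)
            if candidate not in used:
                used.add(candidate)
                break
        filenames.append(f"{group}_{candidate:0{id_digits}d}.jpg")
    # Sorting drops the input order, which could otherwise hint at the source.
    return sorted(filenames)


# Toy records (patient references are invented placeholders).
demo_records = [
    ("patient-A", "neutrophil"),
    ("patient-A", "monocyte"),
    ("patient-B", "neutrophil"),
]
demo_files = assign_random_ids(demo_records)
```

Because the identifiers are drawn independently of the patient reference and no lookup table is kept, the resulting names cannot be traced back to the input records.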
The above acquisition procedure has been extensively used by our research group in several developments related to cell image segmentation and classification of peripheral blood cells [3-7]. The dataset presented in this article has been used in our most recent work to develop a convolutional neural network model for the automatic classification of eight types of normal peripheral blood cells [1].

Disclaimer
This dataset is intended to be used for research and educational purposes only.

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.dib.2020.105474.