An annotated high-content fluorescence microscopy dataset with Hoechst 33342-stained nuclei and manually labelled outlines

Automated detection of cell nuclei in fluorescence microscopy images is a key task in bioimage analysis. It is essential for most types of microscopy-based high-throughput drug and genomic screening and is often required in smaller-scale experiments as well. To develop and evaluate algorithms and neural networks that perform instance or semantic segmentation for detecting nuclei, high-quality annotated data is essential. Here we present a benchmarking dataset of fluorescence microscopy images with Hoechst 33342-stained nuclei together with annotations of nuclei, nuclear fragments and micronuclei. Images were randomly selected from an RNA interference screen with a modified U2OS osteosarcoma cell line, acquired on a Thermo Fisher CX7 high-content imaging system at 20x magnification. Labelling was performed by a single annotator and reviewed by a biomedical expert. The dataset, called Aitslab-bioimaging1, contains 50 images showing over 2000 labelled nuclear objects in total, which is sufficiently large to train well-performing neural networks for instance or semantic segmentation. The dataset is split into training, development and test sets for user convenience.



Value of the Data
• The dataset is of use to bioimage analysts as well as academic and industry research groups who wish to automate instance or semantic segmentation, and nuclear detection in particular.
• The dataset is sufficiently large to be used as the sole training and benchmarking dataset. It can also be combined with other annotated datasets to improve and evaluate generalization of instance and semantic segmentation models and algorithms. Examples of complementary annotated fluorescence datasets are the BBBC039 dataset [1] (available from https://bbbc.broadinstitute.org/BBBC039/; accessed on Oct 28, 2022), which contains annotated fluorescence microscopy images of Hoechst-stained nuclei from cells treated with different chemical compounds, and the BitDepth dataset [2] (available from https://github.com/masih4/BitDepth_NucSeg; accessed on Oct 28, 2022), which contains annotated fluorescence microscopy images of DAPI-stained nuclei from tissue sections. Examples of complementary datasets with annotated hematoxylin and eosin-stained nuclei can be found in a recent article by Mahbod et al. [3].
• As annotated datasets are extremely time-consuming to generate, few have been released so far, making the current dataset very valuable to the research community.
• The quality of the dataset is especially high as it was annotated with the help of a senior biomedical researcher.

Objective
Nuclear detection is typically the first step in fluorescence microscopy image analysis, e.g. when counting cells or assessing protein levels and localization. To obtain high throughput and improve reproducibility, this task needs to be automated with deep learning models or other algorithms that perform instance or semantic segmentation. To develop and evaluate such models or algorithms, high-quality manual annotations ("ground truth") are essential. Only a few such annotated datasets have been produced, and many of these show nuclei in tissue sections and/or visualized with hematoxylin and eosin staining. To complement the existing data, and to facilitate the training and evaluation of deep learning models and algorithms for cell culture-based genetic and drug screens in particular, we annotated images of Hoechst 33342-stained nuclei from cultured U2OS osteosarcoma cells. Annotations were created with the help of a senior biomedical researcher to ensure high quality and to avoid smaller nuclear structures being overlooked.

Experimental Design, Materials and Methods
Modified U2OS cells were plated in black clear-bottom 384-well plates (Greiner) and transfected with siRNAs (Dharmacon siGENOME library), which were washed away the next day. After 72 h, cells were fixed and stained simultaneously with Hoechst 33342. Plates were stored sealed at 4 °C until imaging in a CX7 high-content imaging system (Thermo Fisher). For each well, 16 images were acquired in a non-overlapping grid at 20x magnification in the blue fluorescence channel using the microscope-associated HCS Studio software. Fifty images, derived from multiple 384-well plates, were randomly chosen for annotation. Images were exported in the microscope-generated .C01 format.

Prior to annotation, .C01 images were normalized and transformed to 8-bit png images using the C01_to_png.py script (available from https://github.com/Aitslab/bioimaging/tree/main/C01_conversion). Nuclei, nuclear fragments and micronuclei visible in the png images were annotated as a single class with the polygon tool of the CVAT annotation software (https://github.com/openvinotoolkit/cvat). Annotations were made by a single trained researcher (MA) and double-checked by a senior biomedical expert (SA), after which small corrections were made in some images. Annotations were saved as 24-bit RGB png images, with each nuclear object filled with a different, randomly assigned color.
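The image conversion and the color-coded annotation format described above can be illustrated with a short sketch. It is not the C01_to_png.py script: the min-max normalization, the black background color, and all function names below are assumptions made for illustration, and the script itself may normalize differently. The second and third functions show how a color-filled annotation image could be turned into an instance label map and a semantic (foreground/background) mask.

```python
import numpy as np


def normalize_to_uint8(raw):
    """Rescale a raw intensity image to the 8-bit range [0, 255].

    Assumption: plain min-max scaling; the actual conversion script
    may use a different normalization.
    """
    raw = np.asarray(raw, dtype=np.float64)
    lo, hi = raw.min(), raw.max()
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros(raw.shape, dtype=np.uint8)
    scaled = (raw - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def rgb_mask_to_instances(mask_rgb, background=(0, 0, 0)):
    """Convert an RGB annotation image (H, W, 3) to a label map (H, W).

    Every distinct non-background color becomes one label 1..N and the
    background becomes 0. Assumption: background pixels are black.
    """
    mask_rgb = np.asarray(mask_rgb, dtype=np.uint32)
    # Pack the three 8-bit channels into a single integer per pixel.
    packed = (mask_rgb[..., 0] << 16) | (mask_rgb[..., 1] << 8) | mask_rgb[..., 2]
    bg = (background[0] << 16) | (background[1] << 8) | background[2]
    labels = np.zeros(packed.shape, dtype=np.int32)
    colors = [c for c in np.unique(packed) if c != bg]
    for label, color in enumerate(colors, start=1):
        labels[packed == color] = label
    return labels


def instances_to_semantic(labels):
    """Collapse an instance label map to a binary foreground mask."""
    return (np.asarray(labels) > 0).astype(np.uint8)
```

With this approach, two objects that happen to share the same randomly assigned color would receive the same label, so a connected-component pass may be a useful additional check when instance counts matter.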

Ethics statements
A commercially available human cancer cell line was used for these studies. Therefore, no ethical permits were required.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
An annotated high-content fluorescence microscopy dataset with Hoechst 33342-stained nuclei and manually labelled outlines (Original data) (Zenodo).