galmask: A Python package for unsupervised galaxy masking

Galaxy morphological classification is a fundamental aspect of galaxy formation and evolution studies. Various machine learning tools have been developed for automated pipeline analysis of large-scale surveys, enabling a fast search for objects of interest. However, crowded regions in the image may pose a challenge as they can lead to bias in the learning algorithm. In this Research Note, we present galmask, an open-source package for unsupervised galaxy masking to isolate the central object of interest in the image. galmask is written in Python and can be installed from PyPI via the pip command.


INTRODUCTION
A galaxy's morphology encodes valuable information about the underlying physical processes driving its formation and evolution (van der Wel 2008; Lee et al. 2013; Conselice 2014; Fang et al. 2017). It correlates with physical properties, such as environmental density, merger history, and star formation rate. The first step in deriving the morphology of a galaxy in large-scale surveys and crowded regions is to isolate it from other sources in the same field. Thus, there is considerable interest in automated astronomical source detection and segmentation tools (Bertin & Arnouts 1996; Akhlaghi & Ichikawa 2015) that scale to large surveys. Examples include Morpheus (Hausen & Robertson 2020), a deep learning approach for pixel-level classification using semantic segmentation that requires training the model with user-specified segmentation maps, and galclean (de Albernaz Ferreira & Ferrari 2018), which was designed to remove bright sources around a central galaxy by generating a non-target segmentation map and replacing the non-galaxy regions with a median background estimate. Also, Farias et al. (2020) developed an automated machine learning pipeline to perform detection, segmentation, and morphological classification of galaxies. A traditional machine learning classification scheme is usually designed to provide a single class per image, so the learning method might be affected by neighboring objects such as stars and other galaxies in the same field. galmask is a general-purpose package to isolate the central object and remove unwanted detections. The only assumption is that the object of interest is placed near the center of the image.

DATA
To showcase the capabilities of our method, we apply it to the crowded field around the galaxy SDSS J095734.63+033901.7, imaged with the Hyper Suprime-Cam (HSC) on the 8.2-m Subaru Telescope. We downloaded stamps in the g, r, and i bands in FITS format from hscMap. The galaxy is located in the COSMOS field and the region of the CAMIRA cluster HSCJ095728+033956 (Oguri et al. 2018) at z ∼ 0.16. The equatorial coordinates of the galaxy are RA=09:57:34.6507 and DEC=+03:39:01.9612.

RESULTS AND CONCLUSIONS
The default configuration of galmask employs three main steps for the galaxy masking process. Firstly, it receives the original galaxy image, and convolves it with either a user-specified kernel or a normalized 2D Gaussian kernel with full width at half maximum, FWHM = 3. Then it estimates an initial segmentation map using the photutils (Bradley et al. 2016) library by selecting sources above a user-specified sigma threshold level.
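
The smoothing and thresholding in this first step can be illustrated with a minimal NumPy-only sketch. It is a stand-in for the photutils routines named above, not galmask's implementation: the function names, the assumption that the FWHM is given in pixels, the MAD-based noise estimate, and the default of 3 sigma are all illustrative choices.

```python
import numpy as np

def gaussian_kernel_1d(fwhm, truncate=4.0):
    """Normalized 1D Gaussian; FWHM assumed to be in pixels."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    half = int(np.ceil(truncate * sigma))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def smooth(image, fwhm=3.0):
    """Convolve with a 2D Gaussian by applying the 1D kernel along each axis."""
    k = gaussian_kernel_1d(fwhm)
    rows = np.apply_along_axis(np.convolve, 1, image, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

def robust_sigma(image):
    """Noise estimate from the median absolute deviation (an assumption here)."""
    return 1.4826 * np.median(np.abs(image - np.median(image)))

def detection_mask(image, nsigma=3.0, fwhm=3.0):
    """Boolean map of smoothed pixels above background + nsigma * noise."""
    threshold = np.median(image) + nsigma * robust_sigma(image)
    return smooth(image, fwhm) > threshold

# Synthetic stamp: unit-variance noise plus one bright central source.
rng = np.random.default_rng(0)
yy, xx = np.mgrid[:64, :64]
source = 50.0 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 3.0 ** 2))
image = rng.normal(0.0, 1.0, (64, 64)) + source
mask = detection_mask(image)
```

Because a 2D Gaussian is separable, applying the normalized 1D kernel along rows and then columns is equivalent to convolving with a normalized 2D kernel. In practice the boolean map would be replaced by a labeled segmentation map such as the one photutils produces.
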
The second step is deblending on the segmentation map to remove outliers and delineate the region around the central galaxy. For this, we use the deblend_sources method from photutils, which uses multi-thresholding and watershed segmentation to deblend nearby or overlapping sources. We empirically found deblending unnecessary in simple cases, and hence we keep this step optional. Local peaks, i.e., local maxima, are searched in each distinct labeled region of the partially cleaned segmentation map from the previous steps using the peak_local_max method from the scikit-image library (van der Walt et al. 2014).
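
The local-peak search can be illustrated with a small NumPy-only sketch. It is a simplified stand-in for scikit-image's peak_local_max, restricted to a 3x3 neighbourhood, and it does not perform the multi-threshold and watershed deblending itself. The function name local_peaks and the synthetic two-source image are hypothetical.

```python
import numpy as np

def local_peaks(image, labels):
    """Return (row, col) of labeled pixels strictly brighter than all 8 neighbours."""
    image = np.asarray(image, dtype=float)
    ny, nx = image.shape
    # Pad with -inf so that border pixels can still qualify as peaks.
    padded = np.pad(image, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.asarray(labels) > 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy:1 + dy + ny, 1 + dx:1 + dx + nx]
            is_peak &= image > neighbour
    return np.argwhere(is_peak)

# Two blended sources that fall inside a single labeled region.
yy, xx = np.mgrid[:24, :32]
blend = (np.exp(-((yy - 10) ** 2 + (xx - 10) ** 2) / 8.0)
         + np.exp(-((yy - 10) ** 2 + (xx - 20) ** 2) / 8.0))
labels = (blend > 0.05).astype(int)
peaks = local_peaks(blend, labels)
```

Here the single labeled region contains two peaks, one per source, which is the situation in which deblending is useful. The strict inequality means flat plateaus yield no peak, a simplification that a production routine handles more carefully.
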
The third and final step employs connected-component labeling (CCL; Samet & Tamminen 1988; Dillencourt & Samet 1992) to isolate the central object. CCL is an algorithm to label each connected component in an image. Let S be a subset region of the original image. S is a connected component if, for each pixel in S, there is a path to any other pixel in S consisting only of pixels that belong to S. Our CCL implementation makes use of the opencv-python (Bradski 2000) library. We also ensure that the central galaxy is not masked during this step if the background sources are area-wise larger than the central galaxy or if they dominate the image region. Finally, we replace all non-source pixels with zero values in the final output image.
Figure 1 shows the application of galmask to the region of the CAMIRA cluster HSCJ095728+033956. The panels depict different steps of the analysis. The original input image is displayed in the leftmost panel. The first segmentation, using sigma clipping, is depicted in the second panel, and the final mask of the central object via CCL appears in the third panel. The rightmost panel shows the masked galaxy image. galmask successfully removes the background sources surrounding the central galaxy and nearby sources.
RSS thanks the National Natural Science Foundation of China, grant E045191001. ACS thanks the Chinese Academy of Sciences (CAS) President's International Fellowship Initiative (PIFI) through grant E085201009 and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) through grant CNPq-314301/2021-6. We thank Carolina Queiroz de Abreu Silva for useful discussions. This research made use of the Photutils and Astropy packages for detection and photometry of astronomical sources.
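
The connected-component step described in the third step above can be sketched in plain Python and NumPy. This is an illustrative breadth-first labeling that follows the definition of a connected component directly. It is not the opencv-python routine that galmask uses, and the function names, the choice of 8-connectivity, and the rule "keep the component nearest the image center" are assumptions made for the example.

```python
from collections import deque

import numpy as np

def label_components(mask):
    """Label 8-connected components of a boolean mask with integers 1, 2, ..."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    ny, nx = mask.shape
    current = 0
    for y0, x0 in zip(*np.nonzero(mask)):
        if labels[y0, x0]:
            continue  # already reached from an earlier seed pixel
        current += 1
        labels[y0, x0] = current
        queue = deque([(y0, x0)])
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < ny and 0 <= xx < nx
                    if inside and mask[yy, xx] and not labels[yy, xx]:
                        labels[yy, xx] = current
                        queue.append((yy, xx))
    return labels

def isolate_central(image, mask):
    """Zero every pixel outside the component closest to the image center."""
    labels = label_components(mask)
    ys, xs = np.nonzero(labels)
    if ys.size == 0:
        return np.zeros_like(image), np.zeros(labels.shape, dtype=bool)
    cy, cx = (np.array(labels.shape) - 1) / 2.0
    nearest = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    keep = labels == labels[ys[nearest], xs[nearest]]
    return np.where(keep, image, 0), keep

# A 12-pixel corner source and a smaller 9-pixel central source.
mask = np.zeros((11, 11), dtype=bool)
mask[0:3, 0:4] = True
mask[4:7, 4:7] = True
image = np.full((11, 11), 7.0)
masked, keep = isolate_central(image, mask)
```

Selecting by proximity to the center rather than by area keeps the central object even when a background source covers more pixels, which mirrors the safeguard mentioned in the text.
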