A publishing partnership

Assessment of Astronomical Images Using Combined Machine-learning Models


Published 2020 March 23 © 2020. The American Astronomical Society. All rights reserved.
Citation: H. Teimoorinia et al. 2020 AJ 159 170. DOI: 10.3847/1538-3881/ab7938


Abstract

We present a two-component machine-learning-based approach for classifying astronomical images by data quality via an examination of sources detected in the images and image pixel values from representative sources within those images. The first component, which uses a clustering algorithm, selects a small but representative fraction of the image pixels to determine the quality of the observation. The representative images (and associated tables) are ∼800 times smaller than the original images, significantly reducing the time required to train our algorithm. The useful information in the images is preserved, permitting them to be classified into different categories, while the required storage is reduced. The second component, a deep neural network model, classifies the representative images. Using ground-based telescope imaging data, we demonstrate that the method can be used to separate "usable" images from those that present some problems for scientific projects—such as images that were taken in suboptimal conditions. This method uses two different data sets as input to a deep model and provides better performance than if we only used the images' pixel information. The method may be used in cases where large and complex data sets must be examined using deep models. Our automated classification approach achieves 97% agreement when compared to classification generated via manual image inspection. We compare our method with traditional image-assessment techniques and show that our method improves the results by about 10%, while also providing more comprehensive outcomes.


1. Introduction

Ground-based observational data necessarily vary in quality. Information regarding celestial objects is collected using a wide array of telescopes and instruments, and is often gathered and stored in a variety of different formats, such as astronomical images. The number of observations and the variety of instrumentation compound the complexity of astronomical data, including imaging. Different data sets are targeted at various science projects, and the impact and accuracy of the projects are directly linked to the quality of the data sets. On the other hand, a high-quality data set can be very complicated, with multidimensional characteristics. In general, complex problems need more sophisticated methods (or a set of well-arranged methods) to be solved. Images captured by a ground-based telescope, for example, are typically complex in terms of the range of objects they detect. The degree of complexity is also highlighted when we take different observational conditions into account. Such conditions can, in turn, cover a broad range of different phenomena: from low-quality imaging caused by turbulent air and poor weather conditions, to images that suffer from transient detector-noise issues. The variations in image characteristics impact the utility of the data. Low-quality observations limit the confidence in the science projects that use them, restricting the reliance that can be placed on their results.

Classical methods and pipelines classify astronomical images based on deterministic techniques. Different surveys such as SDSS, Pan-STARRS, and 2MASS have various ways of assessing image quality, based on the characteristics of the surveys. For example, quality assurance employed by 2MASS (Skrutskie et al. 2006) verifies the pipeline data quality with an automated software system which can determine problems such as telescope tracking and poor atmospheric conditions. The SDSS survey tracks a number of flags and quality parameters such as clear/cloudy, noise, and PSFWIDTH. The latter is the FWHM of the point-spread function (PSF) in arcseconds, the median of which, in the r band, is 1.″3 ± 0.″2. A seeing of 2″ is rare and may be a criterion for flagging an image. The problem of poor telescope tracking can also be registered by the initial processing of the data itself (Ivezic et al. 2004). As another example, the Phase 2 Proposal Preparation (P2PP) Tool, which the European Southern Observatory uses, measures the minimum average seeing of an observation, and if the average PSF width is larger than a requested absolute value, the observation may be repeated.

Classical and deterministic methods in surveys are usually cheap to run. For example, they can inspect some parameters such as ellipticity and FWHM, and provide results that are easy to interpret. However, combining many different settings to make a decision based on the overall pattern in a survey needs more sophisticated methods (Teimoorinia et al. 2017). Generally, classical methods are not able to combine the distribution of different parameters of a data set to make a decision based on probabilistic models. In other words, in the conventional procedure, deciding (for example) whether an image is good or bad is independent of the quality of the rest of the images.

Machine learning (ML) algorithms and methods (such as artificial neural networks, which can range from shallow to deep models) have been shown to handle complex astronomical problems (Teimoorinia 2012; Bilicki et al. 2014; Teimoorinia & Ellison 2014; Aghanim et al. 2015; Ellison et al. 2016; Teimoorinia et al. 2016; Teimoorinia & Keown 2018; Ucci et al. 2018). They can predict different physical parameters and detect different patterns in different surveys (Rafieferantsoa et al. 2018) or recognize various interesting astronomical sources, such as strong lensing candidates (Jacobs et al. 2017; Lanusse et al. 2018; Pourrahmani et al. 2018). Deep-learning methods are also used for exoplanet transit classification (Ansdell et al. 2018), and the nonparametric Random Forest method has been used to discover new dipper stars (Hedges et al. 2018). ML approaches have also been used to measure the correlations between different galaxy structure parameters, revealing potentially serious problems in contemporary models of galaxy evolution (Bluck et al. 2019). ML is finding an ever-increasing role in astronomical data analysis.

There are various methods in ML; however, most break into two main approaches: "supervised" and "unsupervised." In a supervised method, we know what we want (the target, or "true" answer), and our goal is a model that can achieve results as close as possible to the "true" answer. In contrast, an unsupervised method has only the input data and allows exploration of that data. One approach is to find clusters within the data by the construction of self-organizing maps (SOMs; Kohonen 1982, 2001). In this method, the aim is to reduce the dimensionality of a data set and inspect the relationships between clusters of data points found in the set (e.g., Rahmani et al. 2018). Combining the two methods, supervised and unsupervised, can further increase the power of the ML approach.

There are many cases in ML where we can use our knowledge about a data set (feature-engineering processes) to make an algorithm work better. Other established methods (mostly deep-learning ones) use less-engineered features as input. In both cases, however, a data set under study should be presented to a model in such a way as to make the selected model's job easier. Generally, an astronomical image can be complex, and we cannot expect an ML model to learn directly from such complex data. Moreover, a wide-field telescopic image, in which a considerable fraction of the pixels contains empty sky, is not a good candidate as an input because the information density is quite low (although the sky itself does contain some information of limited value). The process we present here extracts the most valuable image-quality classification features from our wide-field imaging data set.

In ML, one can also combine two fundamental approaches: supervised and unsupervised. Deep clustering is an approach that combines techniques and has seen significant development in computer vision applications (Caron et al. 2018). For example, Xie et al. (2016) use an unsupervised system which itself consists of two unsupervised methods. This method (which is called deep embedded clustering) is a combination of a K-means algorithm and a deep Autoencoder, in which the latter can consist of different Convolutional Neural Network (CNN) layers (for deep clustering methods, also see Aljalbout et al. 2018). In this work, we implement a model that combines an unsupervised technique with a supervised deep model to solve a complex problem in astronomical imaging: automated quality assurance.

An important topic in ML research is to improve the performance of the models used. The major contribution of this work is to present a novel and creative dimensionality reduction step in which image pixel data can be fed into a deep-learning model in an effective way. In this way, we show that combining suitable models and data sets can significantly improve the overall performance of the methods used individually, both in accuracy and speed.

To demonstrate our approach, we have selected an image-quality assessment problem in which the ultimate goal is to evaluate images and separate high- from low-quality images (i.e., those that are usable for science projects from those that are not). The applicability of the method, however, is not restricted to assessing image quality. Here, we do not deal with a binary problem (acceptable/unacceptable) but a multiclass classification problem with different categories of "bad" (low-quality) images. We will show that a combination of different ML methods allows an automated solution to this complex problem.

High-quality astronomical images enable researchers to achieve more impactful outcomes. For example, individual MegaCam exposures are of limited use on their own; combining multiple exposures of the same field increases the sensitivity and reduces image defects. These "stacked" images provide more suitable data for broad classes of research. Our method enables assessment of the individual MegaCam images so that the highest quality inputs can be selected for stacking. The Canadian Astronomy Data Centre (CADC) uses the output of the combined model presented in this paper as the input to MegaPipe (Gwyn 2008).

In Section 2, we describe the data used in this work, and in Section 3, we explain the method, which combines two ML approaches. Section 4 presents some results of our image classification. In Section 5, we present a short discussion, and we compare our results with those of classical methods in Section 6. A summary is presented in Section 7.

2. Data

For this analysis, we explore our assessment method using selected images from the MegaCam instrument mounted on the Canada–France–Hawaii Telescope (CFHT; Boulade et al. 2003). MegaCam is an optical imaging mosaic CCD camera consisting of 40 CCDs (2048 × 4612 pixels each), with each pixel imaging a ∼0.″184 region of sky. An individual exposure is stored in a single multi-extension FITS file of approximately 800 MB in size. During initial operation, only 36 of the CCDs were recorded for each MegaCam image, as 4 of the CCDs were vignetted. Recently, however, a new set of camera filters has expanded the unvignetted field of the camera, and all 40 CCDs are recorded for each image. An example 36 CCD exposure is shown in Figure 1. Optical images taken by ground-based telescopes, such as MegaCam, exhibit a wide range of image quality. Low-quality images contain problems regarding telescope tracking (from slight to severe), poor sky conditions (e.g., poor visibility; cloudy conditions), and different background issues, such as pronounced background fluctuations, severe object saturation, and non-astrophysical background patterns. Our goal is to develop an ML-based method that automatically ranks or groups images based on quality characteristics.


Figure 1. The top plot shows an example of a typical MegaCam exposure consisting of 36 CCDs. The white section of the image represents a non-functioning CCD caused by sporadic electronic failures. The bottom plot shows one of the 36 CCDs (the upper-right one; 4644 × 2112 pixels) in a larger view.


The MegaPipe processing system (Gwyn 2008) at CADC calibrates and combines CFHT MegaCam images that are first passed through a visual inspection to classify them into different image-quality-based categories. The visual classification categories are "good," really bad tracking (RBT), bad tracking (BT), various problems in the background (BGP), and poor visibility during observation (B-Seeing). In addition, during MegaCam observations, there were times when not all of the CCDs in the mosaic were functioning; these exposures have a secondary quality value of "Dead-CCD" (see Figure 1). Over 5000 MegaCam exposures (i.e., ∼5000 × 36 CCDs) have been visually assessed and placed into one of the categories above. This initial quality assessment allows us to use human experience to build models that will enable our ML-based process to place images into the classes mentioned above.

To generate a suitable training set, we randomly selected some exposures from all the available classes and reexamined all the images to make sure we had correct labels in our training set. Some examples of all the classes are presented in Figure 2. To show the details of an exposure, we display only a small (magnified) part of a single CCD. As can be seen from Figure 1, each exposure contains a large quantity of information. One of our goals is to introduce a method where only a small fraction of the complete image is directly input into our ML algorithm.


Figure 2. Five different targets for our models in this paper. They include images with different problems in the background (BGP), bad tracking (BT), really bad tracking (RBT), bad seeing or bad observational conditions (B-Seeing), and an instance of a good image at the bottom of the figure.


3. The Method

In ML approaches, data should be presented in an appropriate way for the model. The presentation method depends on the nature of the problem under study. As mentioned in Section 2, CCD pixel values (from an exposure) can require significant memory resources to examine. Feeding many hundreds of such large exposures into a deep-learning model would be computationally expensive and may not be the optimal analysis approach. As an example, one way to avoid expensive computations could be to cut 20 small images randomly out of a CCD (as representative images) and then feed them into a network. With such random cutouts, however, there is no guarantee of having different sources with different characteristics in the small cutout images. In a visual inspection, we usually search for various objects in an image to assess its quality. In other words, the random method can ignore (or over-highlight) the real character and nature of the image. Here, we present a method to extract a small, representative example of an image in such a way that the useful information from the larger, more complex image is generally preserved. A clustering method can help to render a fair representation of an image.

3.1. The Representative Images

First, we use SOURCE EXTRACTOR (SE; Bertin & Arnouts 1996) to detect astronomical sources within the images and determine parametric quantities using the pixels associated with each recognized source. In our SE analysis, the following parameters are returned: ISO0 (isophotal area at level 0), ELLIPTICITY (1 − B_IMAGE/A_IMAGE), BACKGROUND (background at centroid position), and CLASS_STAR (star/galaxy classifier output). We also record the image header keyword EXPTIME as an additional input to train our model. SE measurement catalogs were created for a randomly selected set of training exposures chosen from each of the classes defined in Section 2. The individual SE catalogs were then concatenated to form a global source catalog, from which ∼3,000,000 randomly selected sources were used to train an SOM (which is a kind of artificial neural network). The SOM of the SE output parameter measurements forms the initial step of our classification process.
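To make this preprocessing step concrete, the following is a minimal sketch (not the published pipeline) of how the global five-feature training catalog could be assembled from per-exposure SE catalogs. The list `training_catalogs`, the catalog file names, and the use of astropy's `ascii.sextractor` reader are assumptions for illustration.

```python
# A minimal sketch (not the published pipeline) of assembling the global
# five-feature training catalog.  File names and the 'ascii.sextractor'
# reader are assumptions.
import numpy as np
from astropy.table import Table, vstack

FEATURES = ["ISO0", "ELLIPTICITY", "BACKGROUND", "CLASS_STAR"]

def load_catalog(cat_file, exptime):
    """Read one SE catalog and append the exposure time from the header."""
    cat = Table.read(cat_file, format="ascii.sextractor")
    cat["EXPTIME"] = exptime
    return cat

# training_catalogs: hypothetical list of (catalog file, EXPTIME) pairs
global_cat = vstack([load_catalog(f, t) for f, t in training_catalogs])

# draw ~3,000,000 random sources and standardize the five features
idx = np.random.choice(len(global_cat), size=3_000_000, replace=False)
X = np.column_stack(
    [np.asarray(global_cat[c][idx], dtype=float) for c in FEATURES + ["EXPTIME"]]
)
X = (X - X.mean(axis=0)) / X.std(axis=0)  # zero mean, unit variance per feature
```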

An SOM places each source into a two-dimensional (N × N) map based on similarities or dissimilarities to other sources in the map while attempting to preserve the topological properties of the input space by using a suitable neighborhood function. The two-dimensional map contains a grid of Nu (=N × N) units (or nodes) in which each unit is represented by a d-dimensional vector unit, ${{\boldsymbol{U}}}_{i}$. The units on a map can have rectangular or hexagonal forms, of which the latter can create more connections with adjacent units (e.g., the top plot of Figure 3; Nu = 15 × 15). First, the vector units are randomly initialized. Then, an input datum ${\boldsymbol{X}}$ (which is also a vector with dimension d; here d = 5) is compared (e.g., by a Euclidean distance) to all the units on the map. The algorithm looks for the best matching unit (BMU) for the input, which is obtained by finding the minimum value of $\parallel {{\boldsymbol{U}}}_{i}-{\boldsymbol{X}}\parallel $ (i = 1 to Nu). So, an input vector can find its BMU on a map with Nu units. In the first iteration (t = 1), this procedure is repeated for all input vectors, ${{\boldsymbol{X}}}_{k}$ (here, k = 1 to 3,000,000). So, all 3,000,000 sources can be "distributed" (not necessarily uniformly) on a map of size N × N.


Figure 3. The top plot shows a hit map. Here, 3,000,000 sources, with five parameters, are used to train the map (by the SOM algorithm) and then the sources can be arranged and distributed among the 225 nodes. In this way we train an SOM network. For example, cluster 10 contains 50,351 + 42,719 sources. After the training step, the 225 nodes are clustered into 20 clusters (by the K-means method of SOMPY) and given labels from 0 to 19. Each new input image source list is then distributed into these clusters.

Standard image High-resolution image

The next step is to update the vector units. This step can be done by

${{\boldsymbol{U}}}_{i}(t+1)={{\boldsymbol{U}}}_{i}(t)+\alpha (t)\,{f}_{bi}(t)\,[{\boldsymbol{X}}-{{\boldsymbol{U}}}_{i}(t)].$    Equation (1)

In Equation (1), α is the learning rate and ${f}_{bi}$ is the neighborhood function, which is centered on the BMU and decreases with the distance between the BMU and neighboring units. A Gaussian neighborhood function is defined by

${f}_{bi}(t)=\exp \left(-\displaystyle \frac{{\parallel {{\boldsymbol{r}}}_{b}-{{\boldsymbol{r}}}_{i}\parallel }^{2}}{2{\sigma }^{2}(t)}\right).$    Equation (2)

In Equation (2), ${{\boldsymbol{r}}}_{b}$ and ${{\boldsymbol{r}}}_{i}$ are the positions of the BMU and of unit i on the map, and σ is the neighborhood radius that can be chosen by the user (usually as a fraction of N; here σ = 3). With an error function (which can be defined in terms of the product of the Euclidean distance and the neighborhood function; e.g., Kohonen 1991) and after several iterations, the sources are distributed on the (trained) map. In this work, we have chosen N = 15. So, after training the network, we have a (15 × 15) map in which each unit contains similar objects with specific characteristics. When the training step is finished, the number of similar instances in each unit can be counted. The top plot of Figure 3 shows the distribution of the 3,000,000 sources on the map (i.e., the hit map). Such a map generally does not have specific axis labels (because it is created by dimensionality reduction methods). For example, in this instance of the SOM, the node at the top right of the grid contains 14,388 sources with similar characteristics based on the four selected SE parameters plus the exposure time of the image.
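The following NumPy sketch illustrates the training loop defined by Equations (1) and (2). It is a toy implementation for clarity (the learning-rate and radius decay schedules are assumptions), not the SOMPY code used in this work.

```python
# Illustrative SOM training loop implementing Equations (1) and (2).
import numpy as np

N, d = 15, 5                              # 15 x 15 map, five input features
rng = np.random.default_rng(0)
units = rng.normal(size=(N * N, d))       # randomly initialized vector units U_i
grid = np.array([(i, j) for i in range(N) for j in range(N)], dtype=float)

def train_som(X, units, n_iter=200, alpha0=0.5, sigma0=3.0):
    for t in range(n_iter):
        alpha = alpha0 * (1.0 - t / n_iter)               # decaying learning rate
        sigma = max(sigma0 * (1.0 - t / n_iter), 1.0)     # shrinking neighborhood radius
        for x in X:
            b = np.argmin(np.linalg.norm(units - x, axis=1))   # best matching unit
            dist2 = np.sum((grid - grid[b]) ** 2, axis=1)
            f_bi = np.exp(-dist2 / (2.0 * sigma ** 2))          # Equation (2)
            units += alpha * f_bi[:, None] * (x - units)        # Equation (1)
    return units

# units = train_som(X, units)   # X: the standardized (n_sources, 5) feature array
```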

Different, but connected, nodes in the SOM contain sources with more or less similar characteristics. The goal of the SOM network is to cluster sources together based on similarities in the provided features. When the size of a map is large, related units need to be grouped (Vesanto & Alhoniemi 2000). To implement the SOM algorithm, we use SOMPY (Moosavi et al. 2018), in which clustering is done in two separate steps. First, the input data are distributed (on a two-dimensional map) by the SOM (i.e., the dimensionality reduction round). Next (i.e., after the SOM is trained), the SOM itself is clustered. Different methods can be considered for implementing the second step, such as hierarchical agglomerative clustering and partitive clustering using K means (Vesanto & Alhoniemi 2000). SOMPY uses a K-means algorithm in which the SOM can be clustered into nc different clusters. In a 15 × 15 map, for example, nc can be selected to be a number between 1 and 225. SOMPY also uses a batch version of the algorithm where the learning rate is not used (e.g., Kohonen 1995). In this work, each iteration takes ∼13 s to finish (with a 2.8 GHz Intel Core i7 computer). After 200 iterations, no improvement in the error function is seen, and the model is ready to use. The response of the trained model to new data is rapid.
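As an illustration of this second, clustering step, the sketch below groups the 225 trained units into nc = 20 clusters with scikit-learn's K-means (used here as a stand-in for SOMPY's internal K-means call) and assigns any new source to a cluster through its BMU.

```python
# Group the trained SOM units into nc clusters and map sources onto them.
import numpy as np
from sklearn.cluster import KMeans

nc = 20
node_labels = KMeans(n_clusters=nc, random_state=0).fit_predict(units)  # shape (225,)

def map_sources_to_clusters(features, units, node_labels):
    """Cluster label for each source (rows of standardized features)."""
    bmus = np.argmin(
        np.linalg.norm(units[None, :, :] - features[:, None, :], axis=2), axis=1
    )
    return node_labels[bmus]
```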

It should be mentioned that the K-means clustering algorithm could be applied directly to the SE output to cluster sources into nc categories. However, it would not use any topological information to arrange the data. The SOM is a more complicated and robust method than K means; an SOM, in the limit, reduces to the K-means algorithm (e.g., Rahmani et al. 2018). After we arrange the 3,000,000 sources on a map with the SOM, K means is then an effective method to group the nodes on the map.

The SOM clusters can be used to group sources into classifications. One cluster might contain point sources while another contains extended sources, for example. Between those clusters, there can be intermediate cases. In this way, we can create a classification space into which any new source can be mapped (based on the values of the measured parameters). Within this space, each image will have a different density distribution. An image of mostly star-shaped objects, for example, will have a high density of sources in a different area of the SOM compared to an image containing mostly extended objects. In a 15 × 15 map with 3,000,000 instances, we have more than 13,000 sources per unit on average, which is a sizable number considering the five parameters taken from SE. In other words, we train our network with a large random sample of all available data, so the space can be considered "complete," and we can map data from a new image onto this space.

Using the trained map, we take catalogs containing the same parameters measured for sources in a new image (obtained by SE) and distribute those sources onto the trained network. In this way, even without seeing the image (with an examination of the hit map and properties of the clusters; see Section 3.2), we can find the image characteristics (rich in stars or perhaps containing many galaxy clusters, for example) based on the distribution of sources on the map. In other words, there are patterns in the maps which we can use to recognize the image content. Although this is not the explicit goal of the current investigation, the patterns in maps can then be used to search for images with particular kinds of content, without reference to specific catalogs of such sources or knowledge of the sky location contained in the image. The map places sources from a given image into groups with similar source-parameter values, and those sources are then categorized based on which cluster they appear in.

We have chosen to have 20 clusters within our SOM (i.e., nc = 20), with a map of size 15 × 15. In the bottom plot of Figure 3, we show the same map grouped into 20 clusters. Each detected source of the new image is placed into a particular cluster. The shape of the distribution (i.e., how many sources fall into which clusters) depends on the nature of the image under study. In the next step, we randomly select a source from each of the 20 SOM clusters as a representative source of that cluster for the given image. We then extract pixel values for those representative sources (image subsets); we refer to this as a "cutout" of the source. In this way, we have 20 cutouts for each input image. Using the hit map, we also can count the number of sources a particular image has in each cluster of the SOM. Using the 20 cutout image subsets, we create a "representative" image, and the number of sources from the image contained in a given SOM cluster is the weight assigned to the particular image cutout. The SOM provides a method for mapping the content of the image and allowing significant compression without losing information concerning the global quality of the image data.
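A sketch of this step is given below: it assigns the sources of a new image to the 20 clusters, records the per-cluster counts (the hit-map weights, Input-2), and cuts one 25 × 25 pixel stamp per cluster (Input-1). The SE position columns X_IMAGE/Y_IMAGE and the helper `map_sources_to_clusters` from the previous sketch are assumptions.

```python
# Build the representative image and hit-map weights for one new CCD image.
import numpy as np

def representative_image(pixels, catalog, features, units, node_labels,
                         nc=20, half=12):
    clusters = map_sources_to_clusters(features, units, node_labels)
    weights = np.bincount(clusters, minlength=nc)     # hit counts per cluster
    stamps = np.zeros((nc, 2 * half + 1, 2 * half + 1))
    rng = np.random.default_rng()
    for c in range(nc):
        members = np.where(clusters == c)[0]
        if len(members) == 0:
            continue                                  # empty cluster stays blank
        k = rng.choice(members)
        x, y = int(catalog["X_IMAGE"][k]), int(catalog["Y_IMAGE"][k])
        if half <= y < pixels.shape[0] - half and half <= x < pixels.shape[1] - half:
            stamps[c] = pixels[y - half:y + half + 1, x - half:x + half + 1]
    return stamps, weights   # Input-1 (20 stamps of 25 x 25 pixels), Input-2 (20 counts)
```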

The size of the map and the number of clusters in a map are free parameters. We have used a 15 × 15 map grouped into 20 clusters (i.e., Figure 3). By choosing a 15 × 15 map (i.e., a higher resolution than a 5 × 5 map, for example), we find a more complex structured map; the 3,000,000 sources have more phase space in which to be arranged on the map. In all cases, however, the same sources were used to train the map with the same five parameters as input. Selection of the map resolution and the number of clusters is done through an iterative process to achieve the desired classification, as determined by the end goal of the process. In this case, finding a model that accurately predicts the "quality" of the image was the goal that set these free parameters of map size and number of clusters. As mentioned, in a map with 15 × 15 nodes, nc can be any number between 1 and 225. The larger the number of clusters, the more representative an image will be; however, more memory, a longer image processing time, and more computational resources would be needed.

Figure 4 shows representative sources from each of three different images in the different classes (from the training set). Each representative image contains 25 × 25 pixel cutouts (postage stamps) from the 20 different clusters shown in Figure 3 (numbered from 0 to 19). On the top of each postage stamp (here 25 × 25 pixels), the number of similar sources in the corresponding cluster can be seen. For example, for the "good" image in the middle panel, the cluster with label 19 contains 103 similar sources. So, for each image, we have created a representative image, with size 20 × (25 × 25) pixels, as the main input for a deep-learning model. We use the set of numbers in the hit map of each image as auxiliary data to train our image characterization model.


Figure 4. Three different representative images of three different CCDs from different classes. Each representative image contains 20 cutout sources from the 20 different clusters shown in Figure 3 (from 0 to 19). The number of similar sources, in the corresponding cluster, can be seen on top of each cutout source. For example, for the "good" image in the figure, the cluster with label 19 contains 103 similar sources.


3.2. The Properties of the Clusters

In this section, we present quantitative plots of the average value of the SE parameters in each cluster (see Figure 5). As mentioned in previous sections, each cluster in the bottom plot of Figure 3 contains similar sources. From the top-left plot of Figure 5, we can see that, for example, clusters 2 and 10 contain, respectively, the smallest and largest sources in terms of their ISO0 values. In the plot related to EXPTIME, we have set all values of EXPTIME above 30 equal to 30 (see Section 5 for more details). In this way, we highlight low EXPTIME sources. For example, we can see that cluster 9 contains sources with low exposure time and also with relatively high background fluctuation (BGF). Again, cluster 2 contains sources with a small ISO area, low ellipticity, EXPTIME ≥ 30, and low BGF (which are detected as "stars" by SE).
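The per-cluster averages plotted in Figure 5 can be reproduced, in sketch form, by averaging the standardized training features over the sources assigned to each cluster; `training_clusters` is assumed to hold the cluster label of every training source (from the mapping sketch above).

```python
# Mean of the five features per SOM cluster (as plotted in Figure 5).
import numpy as np

mean_per_cluster = np.array(
    [X[training_clusters == c].mean(axis=0) for c in range(nc)]
)  # shape (20, 5): ISO0, ELLIPTICITY, BACKGROUND, CLASS_STAR, EXPTIME averages
```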


Figure 5. The plots show the average value of the five parameters in each cluster. For example, in the top-right plot, cluster 2 contains round sources and cluster 14 contains the sources with the highest elongation, on average. The four other plots are related to the parameters ISO0, star/galaxy, EXPTIME (exposure times greater than 30 are set to 30 to highlight low-exposure cases), and background fluctuation around the mean value. For example, cluster 9 has the lowest exposure time and cluster 19 has the highest background fluctuation within the clusters.


By examining the representative cutout images for each SOM cluster for given images, we find patterns that relate to image quality. For example, in Figure 6, we present representative image cutouts for each SOM cluster (columns) for each of the 36 camera CCDs (rows) for a "good" exposure. In Figure 7, we present an RBT exposure. Different patterns can be seen. For example, cluster 9 is empty in both exposures because both are long-exposure instances, and cluster 9 contains sources measured in exposures with low exposure times. As another example, cluster 3 (which contains low-ellipticity sources and relatively high ISO values) is nearly empty in the RBT exposure (RBT exposures generally have high ellipticity), while for the good exposure, cluster 3 is well populated. These representative images are ∼800 times smaller than the main exposures as an input to a model, allowing rapid model training on low-memory machines. In addition to the pixel information of different exposures, we also have statistical information describing the relative population statistics for each cluster in the 15 × 15 SOM. Plots such as those shown in Figures 6 and 7 (along with the statistical information) act as a "fingerprint," and such patterns can be found in different exposures taken in different conditions, which allows the quality assessment to occur. We will train a combined deep network to explore these patterns and predict the quality of the image, using these two data sets as a combined input.
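One possible way to assemble such an exposure-level "fingerprint" is sketched below, stacking the 36 per-CCD representative images and their hit-map weights; the per-CCD arrays (`ccd_pixels`, `ccd_catalogs`, `ccd_features`) are hypothetical inputs, and the exact arrangement fed to the deep model is an assumption.

```python
# Stack the 36 per-CCD representative images and weights for one exposure.
import numpy as np

stamp_rows, weight_rows = [], []
for ccd in range(36):
    stamps, weights = representative_image(
        ccd_pixels[ccd], ccd_catalogs[ccd], ccd_features[ccd], units, node_labels
    )
    stamp_rows.append(stamps)
    weight_rows.append(weights)

input_1 = np.stack(stamp_rows)   # (36, 20, 25, 25): pixel "fingerprint" (cf. Figures 6 and 7)
input_2 = np.stack(weight_rows)  # (36, 20): cluster populations
```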


Figure 6. The plot shows 36 representative images (CCDs) from one exposure (stacked on top of each other), each with 20 cutout images (25 × 25 pixels) taken from the 20 clusters shown in Figure 3 (with labels 0–19). The plot is related to a "good" exposure (and is ∼800 times smaller than the exposure). For example, cluster 3, which contains relatively large sources with low ellipticity and a relatively high value of ISO0 (see also Figure 5), is completely populated. However, for an RBT image (see Figure 7), this cluster should be empty. As another example, there are no sources in cluster 9 in Figures 6 and 7, because both exposures have long exposure times.


Figure 7. The same as Figure 6 for an RBT image.


3.3. The Combined Model

Here we show that a combination of different ML methods can improve total performance while speeding up the learning and validation steps. We combine an SOM with a deep-learning model to classify images into five quality groups. We find that an SOM provides a method to select representative data as suitable inputs into a deep-learning model. Our goal is to provide a "big picture" view of the data rather than explain fine-grained and technical aspects. The intent of this manuscript is not to present a detailed code that can be broadly distributed; instead, the goal is to show that combining different ML models as well as available and relevant information can dramatically improve the performance and accuracy for the problem under study. The way of combining the information may depend on the nature of the problem under study. Here, we present an example of image classification in astronomy. Deep-learning methods are being widely used in different areas (Goodfellow et al. 2016), and the use of an SOM to organize the selection of classes of input is likely generalizable to those problems.

There are a variety of different deep-learning models publicly available. We use Keras, which is a high-level neural network API that allows users to build complex CNN models quickly. In the previous section, we described the unsupervised part of our model (the SOM); here, we briefly describe the CNN part.

A CNN model can consist of different layers and take a two-dimensional image input. In each layer, an image is convolved with a set of two-dimensional filters (usually the size of a few pixels, e.g., 3 × 3). The weights associated with the filters are randomly initialized. The user can select the number of filters in each layer. So, for Nf selected filters in the first layer, we will have Nf convolved images. These images are called feature maps. A pooling function can downsize the feature maps in the first layer. This step reduces the size of the feature maps by taking the maximum value of four adjacent pixels. So, a 2 × 2 max-pooling function, for example, can reduce the number of pixels in a feature map by a factor of 4. This procedure is essential to avoid using too much system memory. The (reduced) feature maps made in the first layer will become input images for the second layer. In this way, useful information from the image can be extracted using multiple network layers. In this work, we use five CNN layers (see model M1 in the top plot of Figure 9). The output of the last layer will be fed into a fully connected layer. However, before this step, all of the two-dimensional image pixels in the last layer should be converted into a one-dimensional (flattened) array. Only flattened data can be fed into a fully connected model. Then, the model can classify the data. In a fully connected layer, each neuron in one layer is connected to all neurons in the next. In a neural network system, a neuron receives input from M other units (i.e., from the previous layer). The output of a simple neuron (e.g., in a layer of the fully connected part) is

${y}_{i}=F\left(\displaystyle \sum _{j=1}^{M}{w}_{ij}\,{x}_{j}+{b}_{i}\right).$    Equation (3)

In Equation (3), ${x}_{j}$ are the inputs from the previous layer, ${y}_{i}$ is the neuron output, ${w}_{ij}$ and ${b}_{i}$ are trainable parameters, and F is an activation function. The activation function, which has a nonlinear characteristic, helps to capture the nonlinearity of the problem under study. In the CNN and the fully connected layers in this work, we use the "relu" activation function (a rectified linear activation function; e.g., Nwankpa et al. 2018). However, the last layer of the fully connected part uses the softmax activation function. This function converts an n-dimensional vector in the final layer (here, n = 5, i.e., the number of classes) to a probability distribution so that the summation of the probabilities is 1. In other words, the output of the fully connected part is passed to the softmax function to produce the five classification probabilities. Between the layers, to avoid overfitting problems, one can also use different regularization functions. We use a weight-decay regularization (i.e., an L2 regularization; see chapter 3 of Bishop 2006). A more detailed discussion about deep-learning and SOM methods can be found in the references presented in Section 1.
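A hedged Keras sketch of an M1-style network is shown below: five convolutional layers with relu activations and max pooling, a flattening step, a fully connected layer with L2 weight decay, and a five-way softmax output. The filter counts, layer widths, and the arrangement of the 20 stamps into a 25 × 500 input are illustrative assumptions, not the published architecture.

```python
# M1-style sketch: CNN on the representative image only (Input-1).
from tensorflow import keras
from tensorflow.keras import layers, regularizers

inp = keras.Input(shape=(25, 25 * 20, 1))         # 20 stamps of 25 x 25 pixels, side by side
x = inp
for n_filters in (16, 16, 32, 32, 64):            # five CNN layers
    x = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(x)   # L2 weight decay
out = layers.Dense(5, activation="softmax")(x)    # Good, RBT, BT, B-Seeing, BGP
m1 = keras.Model(inp, out)
```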

Figure 8 presents a schematic view of our ML model. First, we use the SE program to extract useful information concerning the "sources" detected in the image (the left side of Figure 8); this provides parametric measurements for the sources. This information is fed to an SOM that has been trained using a large sample of the detected sources drawn from a selection of images of different quality classes: here, 3,000,000 sources. The SOM model places the sources into nc clusters. The number of clusters is an option of the analysis; we found nc = 20 provided an effective end result. We then select one representative source from each cluster in the SOM. Using the original image pixels associated with each of these sources, we construct a representative image (i.e., an image built by connecting together the 20 subimages, each centered on the representative source for that cluster). The representative image (pixel information) is denoted as Input-1 in the figure. We also have available statistical information from the SOM (the number of similar objects in different clusters). This statistical information is additional information that we provide to the deep model (Input-2). The target of the deep model is the five different probabilities described in Section 2. Different models can be explored in the deep-learning part of the process. Here, we examine three different deep models: M1 accepts Input-1 only, M2 takes Input-2 only, and M3 accepts both Input-1 and Input-2.
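The combined model M3 can be sketched with the Keras functional API as below: the flattened CNN features of Input-1 are concatenated with the 20 hit-map counts of Input-2 before the fully connected classifier. Layer sizes are again assumptions; only the two-input structure mirrors Figure 8.

```python
# M3-style sketch: CNN branch (Input-1) merged with SOM statistics (Input-2).
from tensorflow import keras
from tensorflow.keras import layers, regularizers

img_in = keras.Input(shape=(25, 25 * 20, 1), name="input_1")   # representative image
cnt_in = keras.Input(shape=(20,), name="input_2")              # cluster populations

x = img_in
for n_filters in (16, 16, 32, 32, 64):
    x = layers.Conv2D(n_filters, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2), padding="same")(x)
x = layers.Flatten()(x)

merged = layers.concatenate([x, cnt_in])                       # mix pixel and SOM information
h = layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4))(merged)
out = layers.Dense(5, activation="softmax")(h)
m3 = keras.Model([img_in, cnt_in], out)
```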


Figure 8. The plot shows the combined model used in this paper: the selected parameters of the sources detected in an image (the left image) are extracted by SE, and the resulting table is then fed to a trained SOM. The SOM model clusters the table into 20 clusters, and we then pick one object from each cluster. After that, a representative image (i.e., 20 cutout objects using the R.A. and decl. provided by the SOM) is constructed from the main image. The representative image (i.e., pixel information) is the primary input (Input-1). In addition, we obtain statistical information from the SOM (the number of similar objects in different clusters), which is further information we provide to the deep model (Input-2). The five classes are the output of the last model.


4. The Results

In this section, we use the three models discussed in the last section (i.e., M1, M2, and M3) and compare the associated performances. Some examples of the predictions of the image classification are also presented.

In the top plot of Figure 9, we show the three different models that can be used, separately, to classify the images. In the top model (denoted M1), the representative image (Input-1) is provided to a five-layer CNN. Then, after converting to flat data, we use a fully connected layer to obtain the five probabilities. In the middle, M2, a deep model (only the fully connected layers) driven by Input-2, is shown. M2 gives the probabilities for each class directly from the distributions of the measured SE parameters (i.e., the statistical information from the SOM). Using M3, we combine the output of the CNN with Input-2 (which can be represented as flat data), and the merged data set is fed to the fully connected layer to obtain the probabilities. The three different models allow us to examine the influence or importance of the two types of content (parameter distribution and pixel values) on the end result. To select a suitable model (of the three models presented in Figure 9), we train different models with different capacities (more/fewer layers and more/fewer neurons in each layer) to make sure that we obtain maximum accuracy/performance for the validation set. Adding more complexity/layers to model M2, for example, does not change the accuracy and can lead to overfitting.


Figure 9. The top plot shows the detailed picture for the deep models, i.e., M1, M2, and M3. We can use three different inputs for the models. M1 indicates a case in which only the representative image (Input-1) is fed to the model. M2 shows a model in which the input is just statistical information from the SOM (Input-2). M3 shows a combination of the two inputs. The performance of the three models is shown for both training and validation sets. The performance of M3 shows a significant improvement over the other two models which use only one input.


To examine the efficacy of our models, we use separate training and validation data sets. We provide the source lists and subimages for 60,000 preclassified images (selected to span the range of quality classes) as training and validation data sets. We use 70% of the set for training and the remainder for validation. Figure 9 presents the performance (for both training and validation sets) of the three models M1, M2, and M3, described in Figure 8. For M1, the maximum accuracy of the validation set reaches ∼92%; for M2, the peak validation accuracy is similar; while for M3 (the combined model), the peak validation accuracy is significantly higher at 97%. Considering the blue dashed line, M1, we see that there is a significant overfitting effect after ∼11 epochs (the performance on the training set increases; however, there is no improvement for the validation set). The performance characteristics for M2 (the red lines) are different from those of M1, in that there is no significant overfitting for this model. However, the performance is the same as M1 (after ∼13 epochs the accuracy is ∼92%). For M3, performance continues to improve through epoch 14, where the accuracy exceeds 97%. After epoch 14, the performance of the network does not significantly change, and there is no strong evidence of overfitting (the validation and training sets track similar success rates). M3 provides a superior quality classification, which is the goal of our model.
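Training and validation of such a model could look like the sketch below, with a 70/30 split and per-epoch accuracy tracking as in Figure 9; the optimizer, batch size, and array names are assumptions.

```python
# Compile and train the combined model; monitor validation accuracy per epoch.
m3.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = m3.fit(
    [train_images, train_counts],   # Input-1 and Input-2 (hypothetical arrays)
    train_labels,                   # one-hot vectors for the five classes
    validation_split=0.3,           # 30% held out for validation
    epochs=20,
    batch_size=128,
)
# history.history["accuracy"] and ["val_accuracy"] give curves like those in Figure 9.
```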

The two sets of input data (i.e., Input-1 and Input-2 for M1 and M2) are informative and provide reasonable learning performance. As stated earlier, by itself the statistical information distributed on the SOM network has patterns that can be distinguished by a deep model. These patterns provided a classification accuracy of ∼92%. However, a combination of the input information significantly increased performance. In this way, the information is mixed and enables more distinct patterns to be discovered, providing the remarkable improvement seen in M3 over M1 and M2. Besides, M3 provides smooth learning behavior when compared to M1 and M2. We also see significantly better performance in epoch 1, which is a sign of better and more relevant information being provided to the deep model.

It should be noted that the deep model (in M3) does not use any direct information from SE, such as ellipticity or ISO. The SOM presents suitable coordinates (R.A. and decl. in Figure 8) to assemble the 20 small, cutout images, along with associated statistical information as the auxiliary data. Then, we leave it to the CNN part to detect significant elongation or observation conditions, for example.

The stated performance of M3 is related to the validation set; to check this performance more robustly, we randomly selected different test sets and conducted a meticulous examination of the input images. These sets had not already been seen by the network (as training or validation sets), so they provide a more robust test than the 30% validation set. In this respect, more than 1500 exposures (i.e., more than 54,000 subimages) were examined and then compared with the classification probabilities predicted by M3. Here we obtained accuracy even greater than 97%, and we suspect that some of the images within our extensive training and validation sets may have been misclassified during the initial inspection, which involved the visual classification of over 100,000 images, owing to fatigue during the classification effort.

4.1. Example Classifications

Using deep model M3, we classified over 220,000 exposures (i.e., more than ∼8,000,000 images) in less than one day of computation. At no time did the performance of the process decay due to fatigue! Below we present some examples of these quality-classified images. Table 1 lists some example exposures with the five different probabilities (and a column with the number of dead CCDs that might be found in an exposure) predicted by our method. For example, the first item of Table 1 (ID = 1021182) shows that the exposure is a good one with more than 99% probability. To illustrate the detail, only a part of this exposure is shown in the top left of Figure 10. In the top-right plot, we see an RBT exposure (1635753) with probability of more than 94%. The RBT component of this exposure is strong and completely overshadows the other components (good, BT, B-Seeing, BGP). ID = 1850900 shows an exposure with BT, ID = 1143261 is related to a B-Seeing image, and ID = 731965 is an exposure with a problem in the background. The last ID in the table (1853401) shows a mixed case: this exposure is a combination of BT and B-Seeing.


Figure 10. The images that are listed in Table 1, with associated IDs: 1021182==good, 1635753==RBT, 1850900==BT, 1143261==B-Seeing, 731965==BGP, 1853401==a combination of BT and B-Seeing.


Table 1.  A Sample of Probabilities Predicted by the Method Used in This Paper

ID Good RBT BT B-Seeing BGP ${N}_{\mathrm{Dead}\_\mathrm{CCD}}$
1021182 0.999 0.000 0.000 0.000 0.001 0
1635753 0.002 0.941 0.000 0.001 0.057 0
1850900 0.141 0.016 0.843 0.001 0.000 0
1143261 0.003 0.002 0.000 0.972 0.023 0
731965 0.000 0.002 0.000 0.001 0.997 1
1853401 0.065 0.000 0.260 0.673 0.002 0


5. Discussion

To increase the density of useful pixel values being provided to the ML neural network, we select subsets of the image. Here, to make the process easier for our ML algorithm, we provide subsets of specific forms of the image data (i.e., Input-1 and Input-2). The range or selection of these image subsets is determined using an SOM network with a set of selected inputs. Here, we have iterated over a number of possible SOM network inputs and find that an SOM network with five source-characteristic parameters allows our deep model to distinguish between different image classes. Increasing the number of parameters does not improve the results. Our process uses the source-feature SOM to allow our model to be fed specific image subsets, ensuring that enough information is available to the model without overloading the input with repeated examples of similar information sets.

We found that for the MegaCam images being examined, a low exposure time correlated strongly with poor image quality. We set all exposure-time values >30 s to 30 s to group these longer exposures and allow the SOM to highlight the lower exposure times. Using exposure time is an example of using metadata as an SOM feature. Using some expert knowledge derived from a careful examination of the input data can be critical to the success of the ML classifier.
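In code, this clipping is a one-line preprocessing step, sketched here:

```python
# Clip exposure times at 30 s so that short exposures stand out to the SOM.
import numpy as np
exptime_feature = np.minimum(exptime, 30.0)   # exptime: array of EXPTIME header values
```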

Once the model was trained, determining image quality was quite rapid. The time required for preparation of the image subsets to be provided to the trained model is driven by the access and network speeds connected to the storage nodes that house the MegaCam images. A parallel computing system can increase the speed of data preparation. However, for our particular use case, the image assessment speed is driven by access to the stored image set. In other words, when an image and the relevant input catalog, from SE, become available on disk for the combined model, the quality assessment requires less than 1 s of computing time. The number of clusters in an SOM is a tunable option; here, we have classified the SOM into 20 clusters. We could use a 30 cluster SOM, for example; however, that would require increased data access to train and use the model. We found the 20 cluster SOM adequate for the classification and rapid enough to be practical given the computing and storage facilities at hand.

6. The Comparison with Traditional Measurements

As mentioned in Section 1, traditional methods to assess the quality of images may use a variety of feature measurements, such as the mean PSF width and ellipticity of bright point sources in an image. To compare our results with conventional methods, we use 77,000 exposures (each with 36 CCDs) as a new test set. We use SE to measure the average FWHM and ellipticity for all bright point sources in an image. In Figure 11, we plot the distribution of the log of these average FWHM values against the average ellipticity for the 77,000 exposures. In the classical approach, one would select images with ellipticity ≲0.2 and log(FWHM) ≲ 0.2 as those of high quality, and we see from the figure that this is the region with the highest density of images in our ellipticity/FWHM space. This region can be considered as an area that selects out the highest quality images, i.e., those with no tracking issues and good observation conditions. We show below that our ML approach finds images that are in this high-quality region but are, in fact, poor quality data. First, we present five plots to compare our algorithm with this classical method.


Figure 11. Each point represents the average value of two parameters—the FWHM and ellipticity of available point sources—in one image. The distribution of the average points for ∼74,000 images is shown with a high-density region in the lower left of the plot.


In Figure 12, we show five different plots similar to Figure 11, in which the points are color coded by the five different quality probabilities described in this paper. In the top panel, for example, we plot the value of the "good" probability versus ellipticity and log(FWHM). As can be seen, images with high "good" probability are concentrated in the same dense region shown in Figure 11. Generally, for binary classification, a natural decision boundary for separating the two classes is 0.5. Here, we have a multiclass classification (five categories), and the decision boundary for deciding whether an image is good or bad can be found after we obtain all the predictions for the test sets and after further visual inspections. We have found that a decision boundary of 0.20 for the "good" probability is a proper boundary with minimum confusion (i.e., log P ≳ −0.5 in the color-coded bars). The area around the decision boundary is a little fuzzy and contains most of our misclassification cases. As can be seen in the top figure, there is a set of images for which the "good" probability is significantly below the good threshold but which are still classified as high-quality images using the traditional approach.
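Applying this decision boundary is straightforward, as sketched below; `test_images` and `test_counts` are hypothetical input arrays for the test set.

```python
# Accept an image as "good" when its predicted good probability is >= 0.20.
probs = m3.predict([test_images, test_counts])   # shape (n_images, 5)
is_good = probs[:, 0] >= 0.20                    # column 0 holds the "good" probability
```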


Figure 12. The same as Figure 11. The points are color coded by the five different probabilities obtained by our method. The top panel shows the probability based on the first, i.e., "good" probability. The bluer points display the higher probabilities. The four other smaller plots show the other probability plots.


As another example, the plot related to RBT, in Figure 12, shows that the images with significant bad tracking (high RBT probability, blue) are nearly absent from the high-quality region of the classical method. The high-probability RBT cases are related to the area that has high ellipticity and FWHM, but some low-probability RBT cases are also in this zone. These low-probability RBT measures turn out to select for images with poor focus. Thus, the RBT classification criterion provides additional diagnostics about the cause of the poor quality data.

The BT parameter has a high probability for images where the tracking is poor, but not catastrophic as is the case for high-probability RBT images. Looking at Figure 12, one can see that the "ridge" of images in the high-ellipticity, large-FWHM section of the plot transitions from high RBT probability (very large FWHM) to high BT probability (moderately large FWHM). There is a continuous connection between the RBT and BT probabilities. Using both parameters allows selecting between catastrophic and poor-image cases.

The B-Seeing plot has two populated areas. One area, related to high FWHM with low ellipticity, shows images that are poor due to poor seeing, while the other is associated with high FWHM and high ellipticity and selects out those images where the focus was poor. Filtering out images with a high probability of B-Seeing removes both of these cases.

Finally, the plot of BGP probability shows that some images in the high-quality section of FWHM/ellipticity space have high BGP probability, i.e., P(BGP) > 0.8 (very dark points). These images exhibit problems with the sky background, for which ellipticity and FWHM are not diagnostic (see the top-left panel of Figure 13 for an image with P(BGP) > 0.8 but FWHM/ellipticity in the high-quality zone).


Figure 13. In the plot, we show six different examples from a test set of 77,000 exposures (each with 36 CCDs). The IDs denoted in the images, with associated probabilities, are described in Section 6. The images with IDs 1851894, 2120820, and 1110042 have acceptable ellipticity and FWHM; however, our models detect problems with these images, such as BGP and a combination of B-Seeing and BT. The image with ID = 1671968 is an example of an image with an out-of-focus character, which is detected by the method as B-Seeing. The two lower images have been classified as high-quality images, albeit with small detected problems, such as slight bad tracking for ID = 1943767 and a combination of small probabilities of BT, B-Seeing, and BGP for ID = 1324656.


The reader is cautioned against examining these probability plots independently of each other. Each case is treated as exclusive; the sum of the probabilities of the five characteristics is 1. Thus, for images in which one characteristic (such as BGP, for example) is nearly 1, the other probabilities are necessarily near 0. For example, we may have a problem in the background for images in which a high RBT probability is the prominent component, and thus the BGP probability will be low (sources with high ellipticity and low BGP in the BGP plot are generally representative of this situation). A strong effect in one characteristic dilutes the impact of the others. For images dominated by RBT, the weather conditions (which might create B-Seeing or BGP) are not significant to the quality assessment.

Figure 12 reveals some low-probability (i.e., red) points in the high-quality part of the ellipticity–FWHM space. That is, some images in this region do not have a good predicted quality despite their classical parameters. Of the 70,000 images examined in this test, 7400 images (∼10%) can be found with low probabilities of being good images (i.e., PGood < 0.2) but with ellipticity <0.2 and FWHM < 1.5 (classically considered good images). In other words, the traditional ellipticity/FWHM approach misclassifies about 10% of the images.
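The comparison can be sketched as a pair of boolean cuts; the arrays `ellipticity`, `fwhm`, and `p_good` are hypothetical per-image measurements and predictions.

```python
# Fraction of classically "good" images that the combined model flags as not good.
import numpy as np

classical_good = (ellipticity < 0.2) & (fwhm < 1.5)
ml_not_good = p_good < 0.2
fraction_missed = np.sum(classical_good & ml_not_good) / np.sum(classical_good)  # ~0.10
```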

Of the 7400 incorrectly classified images, about 4500 are associated with different problems in the background, based on the BGP probability, such as ID = 1851894 with BGP > 0.92 in Figure 13 and also ID = 731965 in Figure 10. Visual inspections of a random selection of these 4500 images reveal, universally, problems in the sky background and unusual sky patterns. These problems were not detectable using traditional methods. These background issues were not necessarily local fluctuations in the sky background but often low-level patterns in the image background. As another example, there are more than 650 of the 7400 images that exhibit very minor issues with tracking (such as ID = 2120820, BT > 0.80). More than 120 images in the high-quality zone but classified as poor images have a combination of bad seeing and bad tracking (such as ID = 1110042 with probabilities Good = 0.1183, RBT = 0.0790, BT = 0.2514, B-Seeing = 0.3970, BGP = 0.1542), which combine to remove them from the good image list. The examples show that the combined models can detect different issues that might be hidden for traditional methods. In the following, we show other interesting circumstances.

There are also many images of high quality that show minor traces of various "bad" characteristics. For example, there are more than 3,000 images with PGood > 0.2 that show some bad tracking probability (such as ID = 1943767 with probabilities Good = 0.3438, RBT = 0.0213, BT = 0.4233, B-Seeing = 0.0158, BGP = 0.1958), but the "Good" characteristic is the dominant one, and these images are selected as good. There are also examples of high-quality images with a combination of slight bad seeing, bad tracking, and BGP (such as ID = 1324656 with probabilities Good = 0.2360, RBT = 0.0932, BT = 0.2358, B-Seeing = 0.2239, BGP = 0.2111). Visual inspection of these images reveals that they are, indeed, suitable for use in our image stacking system. Such examples can easily be selected from databases for further investigation. In fact, one powerful aspect of this work is that the classifier can find combined problems and present them to users in terms of different probabilities. In other words, one can use a combination of probabilities to search for images of the desired quality.

Many images are detected as bad seeing images; however, when they are inspected, they show out-of-focus characteristics. These images are mostly located in the high-FWHM, high-ellipticity region indicated in the B-Seeing plot (such as ID = 1671968, B-Seeing ∼1). The ability to distinguish between poor-focus images and images taken in poor seeing could be used as feedback into telescope operations if the model were deployed at the observatory.

Finally, there are also many images of very crowded stellar fields (with huge deblending influences), which may be flagged as BGP. These images are quite rare and indicate that the model can detect problems that may not have been considered in our training set.

7. Summary

We present a method in which two groups of input data are fed to a deep neural network to classify complex, ground-based telescopic images. As an example of a complex data set, we use CFHT MegaCam images to explore and demonstrate our approach. The first input contains the pixel information of the images, which we call representative images. They comprise a small set of cutout sources obtained from a list that, in turn, is provided by the SOM method. This method allows us to cluster detected sources in an image and pick suitable representative sources. We show that for astronomical images, we do not need to provide the entire (often large) image to a deep model, nor do we reduce the resolution of the image, which would remove useful information.

We have tested the method using different, independent sets of exposures and have found that a decision boundary of PGood > 0.20 provides an accuracy of more than 97%, where 97% of the images our algorithm classifies as "good" are also classified as such during a visual inspection. We suspect that cases of disagreement between the algorithm and the visual inspection step are driven by fatigue during the visual inspection step. In addition, about 10% of images classified as good using the traditional ellipticity/FWHM criterion are, in fact, not good images when classified using the ML approach described here.

H.T. thanks Patrick Dowler for his support and help in using different services in CADC.
