DETECTING ACTIVE GALACTIC NUCLEI USING MULTI-FILTER IMAGING DATA. II. INCORPORATING ARTIFICIAL NEURAL NETWORKS

Published 2013 September 5 © 2013. The American Astronomical Society. All rights reserved.
Citation: X. Y. Dong and M. M. De Robertis 2013 AJ 146 87. DOI: 10.1088/0004-6256/146/4/87

1538-3881/146/4/87

ABSTRACT

This is the second paper of the series Detecting Active Galactic Nuclei Using Multi-filter Imaging Data. In this paper we review shapelets, an image manipulation algorithm, which we employ to adjust the point-spread function (PSF) of galaxy images. This technique is used to ensure the image in each filter has the same and sharpest PSF, which is the preferred condition for detecting AGNs using multi-filter imaging data as we demonstrated in Paper I of this series. We apply shapelets on Canada–France–Hawaii Telescope Legacy Survey Wide Survey ugriz images. Photometric parameters such as effective radii, integrated fluxes within certain radii, and color gradients are measured on the shapelets-reconstructed images. These parameters are used by artificial neural networks (ANNs) which yield: photometric redshift with an rms of 0.026 and a regression R-value of 0.92; galaxy morphological types with an uncertainty less than 2 T types for z ⩽ 0.1; and identification of galaxies as AGNs with 70% confidence, star-forming/starburst (SF/SB) galaxies with 90% confidence, and passive galaxies with 70% confidence for z ⩽ 0.1. The incorporation of ANNs provides a more reliable technique for identifying AGN or SF/SB candidates, which could be very useful for large-scale multi-filter optical surveys that also include a modest set of spectroscopic data sufficient to train neural networks.

1. INTRODUCTION

It is generally accepted that active galactic nuclei (AGNs) are powered by the accretion of gas onto supermassive black holes (e.g., Rees 1984), which are considered common components in galactic centers. AGN activity can occur in a variety of host galaxies and at different luminosity levels, from high-redshift, high-luminosity quasars in giant elliptical galaxies to nearby low-luminosity AGNs hosted by early or intermediate spiral galaxies. The tight relation between the black hole (BH) mass and the luminosity of the spheroid of the galaxy (Tremaine et al. 2002) indicates strong connections between the formation and growth of the BH and the formation and evolution of the galaxy. There is considerable evidence that an AGN and a starburst (SB) can coexist. The best examples of this coexistence are the ultra-luminous infrared galaxies (Genzel et al. 1998; Schweitzer 2006). Less extreme examples are Seyfert galaxies (Boisson et al. 2000; González Delgado et al. 2001; Sani et al. 2010) and low-ionization nuclear emission-line regions (Tommasin et al. 2010). Higher-redshift luminous AGNs are sometimes triggered by major mergers (Hopkins & Hernquist 2009), while lower-redshift AGNs are usually triggered by mechanisms such as the stochastic accretion of molecular clouds (Hopkins & Hernquist 2006), gas inflow in barred systems (Jogee 2006), and minor mergers (Younger et al. 2008). Studies of the AGN intensity and its host galaxy star formation rate (SFR) also indicate that higher-redshift luminous AGNs follow different evolutionary paths compared with lower-redshift and moderate-luminosity AGNs. For example, Rovilos et al. (2012) found that in high-luminosity AGNs (X-ray luminosity >$10^{43}$ erg s$^{-1}$) at z > 1, AGN intensity is correlated with the SFR of the host galaxies, yet there is no clear link between the two in lower-luminosity AGNs or at lower redshifts (z < 1), although in some local Seyfert 1 galaxies, AGN intensity is correlated with nuclear SFR (Thompson et al. 2009; Diamond-Stanic & Rieke 2012).

Our research interest in this paper lies in the lower-luminosity AGNs in the local universe (z ⩽ 0.1). The reasons for studying lower-luminosity nearby AGNs are many. Low-luminosity AGNs represent the most common state of accretion in the current universe, as most AGNs spend their lives in the low state (Greene & Ho 2007). The high space density of low-luminosity AGNs provides a good sample for statistical analysis. As nearby systems with relatively higher spatial resolution, they offer the opportunity to identify the processes that trigger and fuel accretion, and also the structural properties of the host galaxies.

This paper is the second paper in a series. In Paper I (Dong & De Robertis 2013), we demonstrated that distinguishing AGNs from other types of galaxies using multi-filter imaging data and non-parametric aperture photometric analysis is optimized when each image has the same point-spread function (PSF), with narrower PSFs having better diagnostic utility. This technique can be easily adapted to other multi-wavelength image surveys. It can also be used to select AGN and star-forming (SF)/SB candidates in future spectroscopic surveys. In Section 2, following Paper I, we discuss shapelets, an image manipulation technique that decomposes galaxy images into basis functions. The shapelets coefficients can be used to reconstruct galaxy images with an optimized PSF (e.g., two-dimensional Gaussian), ensuring that each image has the same and the sharpest PSF.¹ We will also discuss the limitations of shapelets and introduce a two-step shapelets routine we adapted to suit this study. In Section 3, we introduce our data set selected from the Canada–France–Hawaii Telescope Legacy Survey (CFHTLS) Wide Survey and the Sloan Digital Sky Survey Data Release 7 (SDSS DR7) main galaxy catalog. We also explain data reductions, which include using shapelets to adjust the PSF and applying the aperture photometric measurement on shapelets-reconstructed images. In Section 4, we discuss how to use an artificial neural network (ANN) to estimate galaxy photometric redshift, assign morphological T types, and identify AGNs. In the last two sections, we summarize our data products and discuss our results.

2. THE shapelets IMAGING TECHNIQUE

In Paper I, we demonstrated that a straightforward aperture photometric analysis requires that each filter have the same PSF. PSFs can be matched to the sharpest one via deconvolution or to the broadest one via convolution. A sharper PSF is preferable because the best angular resolution is essential for detecting moderate-luminosity AGNs through color–color techniques and for estimating their intensity using aperture photometry. In principle, deconvolution is straightforward: the observed image is divided by the PSF image in the Fourier domain. However, even with optimal filtering, direct Fourier deconvolution amplifies the high-frequency noise and is therefore not suitable for relatively low signal-to-noise ratio (S/N) regimes.
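The noise amplification mentioned above can be made explicit with a short worked equation. This is an illustrative sketch rather than part of the original analysis: it assumes additive noise N, a normalized Gaussian PSF of width σ, and uses hats for Fourier transforms and **k** for spatial frequency (all notation introduced here for illustration).

```latex
\begin{align}
H(\mathbf x) &= (F * P)(\mathbf x) + N(\mathbf x)
  \;\Longrightarrow\;
  \hat H(\mathbf k) = \hat F(\mathbf k)\,\hat P(\mathbf k) + \hat N(\mathbf k),\\
\hat F_{\rm est}(\mathbf k) &= \frac{\hat H(\mathbf k)}{\hat P(\mathbf k)}
  = \hat F(\mathbf k) + \frac{\hat N(\mathbf k)}{\hat P(\mathbf k)},\\
\hat P(\mathbf k) &= e^{-\sigma^2 |\mathbf k|^2/2}
  \;\Longrightarrow\;
  \left|\frac{\hat N(\mathbf k)}{\hat P(\mathbf k)}\right|
  = |\hat N(\mathbf k)|\, e^{+\sigma^2 |\mathbf k|^2/2}.
\end{align}
```

Under these assumptions the noise term grows exponentially with spatial frequency, which is why the division is unreliable at low S/N.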

The imaging manipulation technique, shapelets, was first introduced to astronomical imaging analysis in the study of weak gravitational lensing, such as shear, rotation, and de-projection (Réfrégier 2003; Réfrégier & Bacon 2003). It decomposes an image into a series of localized basis functions with different shapes, hence its name. The basis function coefficients of shapelets can be used to calculate photometric properties such as galaxy flux and its ellipticity. These coefficients can also be used to reconstruct images. Because shapelets basis functions are orthonormal, shapelets deconvolution and convolution can be written in simple analytical forms. Therefore, the PSF of an image can be easily adjusted and sharpened by using an optimized PSF (for instance, a two-dimensional Gaussian function) via image reconstruction. More recently, shapelets has been used in galaxy image simulations (Massey et al. 2004) and galaxy morphological classifications (Wijesinghe et al. 2010).

2.1. shapelets Formalism

As described in Réfrégier (2003), the Cartesian shapelets basis functions are Gaussian-weighted Hermite polynomials. The one-dimensional dimensionless basis functions are defined as

Equation (1)

in which Hn is an nth order Hermite polynomial. The dimensional basis functions are defined as

Equation (2)

in which β is a characteristic size parameter that relates to the size of the object to be decomposed. The basis functions of shapelets are orthonormal, i.e.,

Equation (3)

The profile f(x) of an object of interest can be decomposed into a series of shapelets basis functions Bn, and expressed as

Equation (4)

where the shapelets coefficients are given by

Equation (5)

The two-dimensional shapelets basis functions are products of the one-dimensional basis functions. They are defined as

Equation (6)

where x is the position vector. An image can be decomposed into two-dimensional shapelets basis functions by

Equation (7)

where the shapelets coefficients are

Equation (8)
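Equations (1)–(8) are not reproduced in this text. For reference, the standard Cartesian shapelets definitions of Réfrégier (2003), which match the verbal descriptions above, can be written as follows. This is a reconstruction under that assumption; the index and normalization conventions are those of the standard formalism and not a verbatim copy of the original typesetting.

```latex
\begin{align}
\phi_n(x) &\equiv \left[2^n \pi^{1/2} n!\right]^{-1/2} H_n(x)\, e^{-x^2/2}, \tag{1}\\
B_n(x;\beta) &\equiv \beta^{-1/2}\, \phi_n(\beta^{-1} x), \tag{2}\\
\int_{-\infty}^{\infty} B_n(x;\beta)\, B_m(x;\beta)\, dx &= \delta_{nm}, \tag{3}\\
f(x) &= \sum_{n=0}^{\infty} f_n\, B_n(x;\beta), \tag{4}\\
f_n &= \int_{-\infty}^{\infty} f(x)\, B_n(x;\beta)\, dx, \tag{5}\\
B_{\mathbf n}(\mathbf x;\beta) &\equiv B_{n_1}(x_1;\beta)\, B_{n_2}(x_2;\beta), \tag{6}\\
f(\mathbf x) &= \sum_{n_1,n_2=0}^{\infty} f_{\mathbf n}\, B_{\mathbf n}(\mathbf x;\beta), \tag{7}\\
f_{\mathbf n} &= \int f(\mathbf x)\, B_{\mathbf n}(\mathbf x;\beta)\, d^2x. \tag{8}
\end{align}
```

Here **n** = (n1, n2) and **x** = (x1, x2).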

2.2. shapelets Convolution and Deconvolution

Any astronomical image is the raw image (intrinsic image) convolved with the PSF, which primarily reflects the diffraction limit of the telescope's primary mirror and atmospheric turbulence. The PSF of an image can be obtained by studying the shapes and sizes of stellar profiles. If the PSF image (P) and the galaxy image (H) are similarly pixelized, the observed image is the intrinsic image (F) of the galaxy convolved with the PSF, i.e., H = F*P, and F is recovered by deconvolving H with P. The deconvolution can be realized by inverting the PSF matrix, but the process can be very slow and numerically unstable. In the IDL package SHAPELETS, Massey & Réfrégier (2005) suggested convolving the basis functions with the PSF matrix and comparing the result with the observed image using least-squares fitting. Without inverting the PSF matrix, the shapelets coefficients can be calculated by minimizing $\chi^2 = (H - F*P)^{T} V^{-1} (H - F*P)$, where V is the noise map.² The intrinsic image of the galaxy can then be reconstructed using the shapelets coefficients but without the PSF convolution.
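Because the model is linear in the shapelets coefficients, the χ² minimization described above has a closed-form solution. The following is a sketch under stated assumptions, not necessarily how the SHAPELETS package solves the problem: H is the observed image flattened into a vector over pixels, M is a matrix whose jth column is the jth basis function convolved with the PSF and sampled on the pixels, and **f** is the coefficient vector (M and **f** are symbols introduced here for illustration).

```latex
\begin{align}
M_{ij} &= \left(B_{j} * P\right)(\mathbf x_i),\\
\chi^2(\mathbf f) &= (H - M\mathbf f)^{T}\, V^{-1}\, (H - M\mathbf f),\\
\frac{\partial \chi^2}{\partial \mathbf f} = 0
  \;&\Longrightarrow\;
  \hat{\mathbf f} = \left(M^{T} V^{-1} M\right)^{-1} M^{T} V^{-1} H,\\
F(\mathbf x) &\approx \sum_{j} \hat f_{j}\, B_{j}(\mathbf x;\beta).
\end{align}
```

The last line reconstructs the intrinsic image from the unconvolved basis functions, which is the step that removes the PSF without ever inverting it.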

2.3. The Complications and Limitations of shapelets Image Reconstruction

In theory, an image can be reconstructed completely if the shapelets decomposition is completely sampled or oversampled, i.e., if the order of basis functions is infinity. In practice, only limited orders of basis functions can be used for decomposition; the shapelets decomposition is truncated at a maximum order of decomposition, nmax. In principle, the higher the nmax, the more faithful the reconstructed image is. In reality, higher nmax tends to pick up background noise around the object. nmax is also constrained by the degree of pixelization. This is because the sizes of some oscillations of the higher order basis functions are smaller than the pixel size, and therefore cannot be sampled properly.
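The effect of truncating at nmax can be illustrated with a one-dimensional toy example. The sketch below assumes the standard normalized Hermite-function recurrence for the basis, a cuspy exponential profile as a stand-in for a galaxy light profile, and β = 1; the function names and the test profile are hypothetical and not taken from the SHAPELETS package.

```python
import math

def shapelet_basis(n_max, x, beta):
    """Return [B_0(x), ..., B_n_max(x)] for the 1D Cartesian shapelets basis."""
    u = x / beta
    phi = [math.pi ** -0.25 * math.exp(-0.5 * u * u)]
    if n_max >= 1:
        phi.append(math.sqrt(2.0) * u * phi[0])
    for n in range(1, n_max):
        # Stable three-term recurrence for normalized Hermite functions.
        phi.append(math.sqrt(2.0 / (n + 1)) * u * phi[n]
                   - math.sqrt(n / (n + 1)) * phi[n - 1])
    return [p / math.sqrt(beta) for p in phi]

def integrate(values, dx):
    """Trapezoidal rule on a uniform grid."""
    return dx * (sum(values) - 0.5 * (values[0] + values[-1]))

def decompose(profile, grid, n_max, beta):
    """Coefficients f_n = integral of f(x) B_n(x) dx for n = 0..n_max."""
    dx = grid[1] - grid[0]
    basis = [shapelet_basis(n_max, x, beta) for x in grid]
    return [integrate([f * b[n] for f, b in zip(profile, basis)], dx)
            for n in range(n_max + 1)]

def reconstruct(coeffs, grid, beta):
    """Sum the truncated series f(x) = sum_n f_n B_n(x) on the grid."""
    n_max = len(coeffs) - 1
    return [sum(c * b for c, b in zip(coeffs, shapelet_basis(n_max, x, beta)))
            for x in grid]

def rms_residual(profile, model):
    """Root-mean-square difference between a profile and its reconstruction."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(profile, model))
                     / len(profile))

grid = [-15.0 + 0.01 * i for i in range(3001)]
profile = [math.exp(-abs(x)) for x in grid]  # toy cuspy profile (assumption)
beta = 1.0
for n_max in (4, 12, 20):
    model = reconstruct(decompose(profile, grid, n_max, beta), grid, beta)
    print(n_max, rms_residual(profile, model))
```

In this noise-free toy case the residual shrinks as nmax grows, but slowly, because the Gaussian-weighted basis struggles with the central cusp. With real data the higher orders would also start fitting background noise, as described above.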

The finite order of basis functions that can be used for decomposition and the discretization of the basis functions limit the ability of shapelets to reconstruct images. As mentioned in Bosch (2010), shapelets cannot properly reconstruct galaxies with high Sérsic indices because the shapelets basis functions are Gaussian weighted (equivalent to a Sérsic index n = 1/2). This weighting gives a profile that is softer in the center than a typical galaxy profile, especially for earlier type galaxies with n ∼ 4, and that has a sharper cut-off at larger radii than a Sérsic profile. A shapelets decomposition with a higher nmax can better represent a larger radius because the oscillations of higher orders have larger amplitudes at larger radii. However, such decomposition cannot represent the steeper profile of galaxy centers. There are also difficulties decomposing high-inclination or high-ellipticity objects. Because the traditional shapelets basis functions are circular, higher-order basis functions are needed to describe significantly elongated galaxies.

2.4. A Two-step shapelets Routine

Although shapelets is a powerful method for image reconstruction, it cannot reconstruct images faithfully under some conditions because of the finite order of the basis functions. Thus, there have been many efforts to improve the versatility of shapelets. For example, Bosch (2010) proposed a compound shapelets decomposition by combining two shapelets. Melchior et al. (2010) showed that elliptical basis functions are more suitable for decomposing highly elliptical objects. Most of these works focused on maintaining the uniqueness of the shapelets coefficients because they relied on these coefficients to calculate photometric parameters such as flux and ellipticity. Since we use shapelets to optimize the PSF, the coefficients will be used only for image reconstruction. To this end, we propose a two-step process based on the SHAPELETS IDL package. The photometric parameters of a galaxy are then measured on the reconstructed, PSF-optimized image.

The details of the two-step process are as follows.

  1. Apply shapelets decomposition with nmax = 20.³
  2. Generate the galaxy light profile by measuring the flux within a series of radii from the center to the edge of the galaxy, flux(r).
  3. Compare the light profiles between the observed galaxy (oflux) and the reconstructed galaxy (rflux) by calculating the relative flux difference Δflux = 1 − rflux/oflux.
  4. If the difference is larger than 2%, conduct another shapelets decomposition on the residual image (observed image minus the reconstructed image).
  5. Combine both reconstructed images and convolve with the optimized PSF (two-dimensional Gaussian with FWHM = 3 pixels).
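The control flow of the steps above can be sketched as follows. This is a minimal illustration, not the SHAPELETS IDL implementation: `decompose_and_reconstruct` and `light_profile` are hypothetical callables supplied by the caller, images are plain nested lists, the 2% test is assumed to apply at every radius of the light profile, and the final convolution with the optimized Gaussian PSF is left out.

```python
def relative_flux_difference(oflux, rflux):
    """Delta_flux = 1 - rflux/oflux at each radius of the light profile."""
    return [1.0 - r / o for o, r in zip(oflux, rflux)]

def needs_second_pass(oflux, rflux, tolerance=0.02):
    """True if any radius deviates by more than the tolerance (2% in the text)."""
    return any(abs(d) > tolerance
               for d in relative_flux_difference(oflux, rflux))

def two_step_reconstruct(image, decompose_and_reconstruct, light_profile,
                         tolerance=0.02):
    """Steps 1-5 without the final PSF convolution.

    decompose_and_reconstruct(image) -> reconstructed image (hypothetical)
    light_profile(image) -> list of fluxes within increasing radii (hypothetical)
    """
    first = decompose_and_reconstruct(image)                  # step 1
    observed = light_profile(image)                           # step 2
    rebuilt = light_profile(first)
    if not needs_second_pass(observed, rebuilt, tolerance):   # steps 3-4
        return first
    residual = [[o - r for o, r in zip(row_o, row_r)]
                for row_o, row_r in zip(image, first)]
    second = decompose_and_reconstruct(residual)
    return [[a + b for a, b in zip(row_a, row_b)]             # step 5 (sum only)
            for row_a, row_b in zip(first, second)]

# Toy usage: a "decomposition" that recovers only 90% of the flux each pass.
toy_image = [[10.0, 20.0], [30.0, 40.0]]
lossy = lambda img: [[0.9 * v for v in row] for row in img]
total_flux = lambda img: [sum(sum(row) for row in img)]
combined = two_step_reconstruct(toy_image, lossy, total_flux)
```

In the toy usage the first pass misses 10% of the flux, so the residual is decomposed again and the combined image recovers 99% of the input.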

We tested the two-step shapelets routine on galaxies randomly selected from the W4 field of the CFHTLS (see below). In general, the relative flux difference between the observed and the reconstructed images is within 2%. Figure 1 shows the relative flux difference between the observed and the reconstructed images for a single galaxy. The blue line is the result after a single shapelets decomposition; the red line is the result after two shapelets decompositions. The gray area represents the first seeing disk. The two vertical red dash–dotted lines are R50 (the radius containing 50% of the galaxy flux) and R80 (the radius containing 80% of the galaxy flux), respectively. As mentioned before, shapelets might have problems dealing with high-Sérsic-index galaxy profiles. It is clear from Figure 1 that a single shapelets decomposition leaves a core in the galaxy center, which can be recovered by applying a second shapelets decomposition.

Figure 1. Example of two-step shapelets decomposition. The x-axis is the galaxy's radius along the semi-major axis. The y-axis is the relative flux difference Δflux between the observed image and the reconstructed image. The gray area is the FWHM of the PSF. The two red dash–dotted lines are flux radii R50 and R80, which contain 50% and 80% of the galaxy flux, respectively. A single shapelets decomposition (blue line) left a core in the galaxy center. The two-step shapelets (red line) improves the faithfulness of the reconstructed image especially in the galactic center.

A two-step shapelets can improve the shapelets decomposition of high-Sérsic-index galaxies. It is especially useful because we are particularly interested in AGNs that possess a steeper profile in the center. However, a two-step shapelets with circular basis functions cannot improve the decomposition of a high-inclination or high-ellipticity galaxy. As a result, we are only interested in lower-inclination objects (inclination <60°).

3. DATA

The primary data used for this study are from CFHTLS, a survey carried out with the CFHT 3.6 m telescope between 2003 and 2009. The detector, MegaCam, is a 40 CCD mosaic with each CCD having 2048 by 4612 pixels. The field of view is about 1 deg$^2$ with a pixel scale of 0.187 arcsec pixel$^{-1}$. The median seeing of CFHT is 0.66 arcsec for i' and z', 0.77 arcsec for g' and r', and 1 arcsec for u*. The central wavelengths of CFHTLS filters are: u* (3740 Å), g' (4870 Å), r' (6250 Å), i' (7700 Å), and z' (9130 Å). The wide synoptic survey covers 170 deg$^2$ in four patches. The r magnitude limit is +26 for a point source and +24.5 for a galaxy, with an S/N of 10. The data we used are from data release 5, T0005, which contains 19 of the 25 W4 fields, and data release 6, T0006, which contains the remaining 6 fields of W4.⁴

The secondary data are from the SDSS DR7 main galaxy catalog. The SDSS provides both photometric information such as the Petrosian magnitude and effective radius, and spectroscopic information such as the redshift and spectral classification. The SDSS galaxies overlapped with CFHTLS are used as training samples to train ANNs for estimating photometric redshift, classifying galaxy morphological type, and identifying AGNs (see below).

3.1. Sample Selection

The CFHTLS data release includes the observed images, the SExtractor catalog for each image, the merged ugriz catalog, etc. Each SExtractor catalog contains 105 entries. We only selected objects with the SExtractor parameter FLAGS = 0 (FLAGS ≠ 0 indicates possible image imperfection) in each filter. We also rejected objects with parameters ISOAREA_IMAGE = 0 (isophotal area above the analysis threshold is zero), FLUX_AUTO < 0 (flux within a Kron-like elliptical aperture is negative), or MU_MAX = 99 (peak surface brightness above the background is not measured). All remaining objects are cleanly measured. In the merged ugriz catalog they are either stars with FLAG = 1 or galaxies with FLAG = 0.
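The cuts described above amount to a simple per-object filter. The sketch below assumes each catalog entry is a dict keyed by the SExtractor parameter names, that catalogs are grouped by filter in a hypothetical `{filter: {object_id: entry}}` layout, and that all four cuts (not only FLAGS) are applied in every filter; the text states the per-filter requirement explicitly only for FLAGS.

```python
def passes_cuts(entry):
    """Apply the four SExtractor cuts to one catalog entry (a dict)."""
    return (entry["FLAGS"] == 0              # no image imperfection flagged
            and entry["ISOAREA_IMAGE"] != 0  # non-zero isophotal area
            and entry["FLUX_AUTO"] >= 0      # non-negative Kron-like flux
            and entry["MU_MAX"] != 99)       # peak surface brightness measured

def select_clean(catalogs):
    """Return sorted IDs of objects that pass the cuts in every filter.

    `catalogs` maps a filter name to {object_id: entry}; this layout is
    hypothetical and only serves the illustration.
    """
    ids = set.intersection(*(set(cat) for cat in catalogs.values()))
    return sorted(i for i in ids
                  if all(passes_cuts(cat[i]) for cat in catalogs.values()))

# Toy entries with made-up values.
good = {"FLAGS": 0, "ISOAREA_IMAGE": 120, "FLUX_AUTO": 350.0, "MU_MAX": 19.8}
blended = dict(good, FLAGS=2)
unmeasured = dict(good, MU_MAX=99)
catalogs = {
    "u": {1: good, 2: good, 3: unmeasured},
    "r": {1: good, 2: blended, 3: good},
}
print(select_clean(catalogs))  # [1]
```

Only object 1 survives: object 2 is flagged in r and object 3 has no measured peak surface brightness in u.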

Our intention is to use shapelets to adjust the PSF so that each image has the same and the sharpest PSF. Although shapelets can reconstruct a galaxy image faithfully (with a relative flux difference less than 2%), shapelets does have some limitations as discussed in Section 2. Therefore, we limited our objects to those with inclinations smaller than 60° in r'. The inclination constraint is not based solely on shapelets limitations; we also want to avoid high internal extinction in galaxies. As mentioned in Section 2, the IDL package SHAPELETS optimizes the basis function coefficients by minimizing χ2. For a CFHTLS Wide survey u filter image, when u* > 22 mag, the algorithm that searches for the minimum χ2 becomes insensitive, even with S/N ∼ 5. Since the u filter image is essential for this study, and its magnitude is usually the faintest among ugriz magnitudes, we selected only objects with u* < 22 mag.

Following the discussion above, there are about 8300 galaxies within W4's 25 fields, referred to as "W4" throughout the rest of the paper. To enlarge the training sample, we add galaxies from CFHTLS W3, for which T0005 released 49 fields. The W3 galaxies are selected in the same way as those in W4, but only include galaxies that are also in the SDSS DR7, are classified as an AGN or an SF/SB galaxy, and have redshifts between 0.0075 and 0.1. The upper redshift limit ensures that each galaxy image is well resolved (i.e., its 50% flux radius is at least the FWHM of the image). As we discuss in later sections, this study relies on parameters that are measured mostly from the galaxy light profile, which becomes more difficult to measure with increasing distance.

3.2. Data Reduction

As discussed in Paper I, optical aperture photometric measurements require that the PSFs of all filters have the same FWHM, and the sharper the PSF the better. In order to optimize the PSF and ensure that all PSFs have the same FWHM, we applied a two-step shapelets decomposition on each object in W4 and W3. The two-step shapelets can reconstruct galaxies fairly well; in 60% of all galaxies, the relative flux difference is less than 2%, and 90% of all galaxies have a relative flux difference of less than 5%. The problematic decompositions are usually caused by image imperfections (for example, bad pixels) and close-by neighbors. We rejected galaxies whose flux differences are larger than 5%. The final W4 sample contains 8166 galaxies, and the W3 sample contains 440 galaxies. Each galaxy has the same optimized PSF in each of the ugriz filters.

After applying the two-step shapelets for each galaxy, we generated a light profile on the reconstructed galaxy by measuring fluxes within a series of radii, from the center to the edge of the galaxy. The interval between each radius is 1 pixel. Photometric parameters measured from each galaxy light profile are flux radii, concentration index, and color gradients.
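A flux radius such as R50 can be read off a cumulative light profile by interpolation. The sketch below is an illustration under stated assumptions: the total flux is taken to be the flux within the outermost aperture, the flux at zero radius is zero, and linear interpolation is used between the 1 pixel steps. The function name and the exponential-disk test profile are hypothetical.

```python
import math

def flux_radius(radii, cumulative_flux, fraction):
    """Radius enclosing `fraction` of the total flux, by linear interpolation.

    radii           -- increasing aperture radii (pixels)
    cumulative_flux -- flux within each radius; the last value is taken
                       as the total flux (assumption)
    """
    target = fraction * cumulative_flux[-1]
    prev_r, prev_f = 0.0, 0.0  # assume zero enclosed flux at zero radius
    for r, f in zip(radii, cumulative_flux):
        if f >= target:
            if f == prev_f:
                return r
            return prev_r + (target - prev_f) * (r - prev_r) / (f - prev_f)
        prev_r, prev_f = r, f
    return radii[-1]

# Toy profile: exponential disk with scale length h, sampled every pixel.
h = 5.0
radii = [float(r) for r in range(1, 61)]
profile = [1.0 - (1.0 + r / h) * math.exp(-r / h) for r in radii]
r05, r20, r50, r80 = (flux_radius(radii, profile, q)
                      for q in (0.05, 0.20, 0.50, 0.80))
print(r05, r20, r50, r80)
```

For an exponential disk the half-light radius is about 1.68 scale lengths, which gives a quick sanity check on the interpolated R50.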

4. ANALYSIS OF PHOTOMETRIC PARAMETERS USING THE ARTIFICIAL NEURAL NETWORK

The ANN is inspired by biological nervous systems. With an appropriate sample, a neural network can be trained by adjusting the values of connections (weights) between neurons, so that a particular input leads to a specific target output. A well-trained neural network can perform complex tasks such as pattern recognition, speech identification, and classification. ANN is often used in astronomy for galaxy morphological classification (Ball et al. 2004) and photometric redshift estimation (Collister & Lahav 2004; Firth et al. 2003). Because ANN does not need to presume an analytic function form, the photometric redshift (or morphological T type) can be more accurately computed using ANN's function approximation ability than employing a more traditional χ2 minimization. In astronomy, classification is often carried out by applying cuts to various distributions that distinguish the target objects from the background objects. This approach cannot combine the correlations between different parameters or the distribution of each parameter, seriously reducing the classification efficiency. Statistical methods based on multi-parameter analysis are more suitable for this type of problem. ANN is a powerful tool for classification especially when applied to nonlinear relations in a multi-parameter space. As a result, we have chosen to adopt ANN for pattern recognition in order to identify galaxies as either AGNs, SF/SB galaxies, or passive galaxies (early-type, bulge-dominated, but non-active galaxies).

The selection of the training sample is vitally important for the ANN. The more relevant the input parameters are to the ANN output, the more reliable the ANN output will be. The training sample also needs to cover the range of inputs (i.e., the entire parameter space) on which the network will be used because the ANN cannot extrapolate beyond the range over which it was trained. Although the distribution of each input parameter is not important, each parameter needs to be sufficiently sampled over the entire parameter space. We will discuss specific requirements for each ANN in later sections.

For a systematic discussion of ANNs, we recommend Neural Networks for Pattern Recognition (Bishop 1995). The neural nets were created and trained with the MATLAB Neural Network Toolbox.⁵

4.1. Photometric Redshift

In order to calculate a galaxy's physical size and its luminosity, a distance estimate is required. Since CFHTLS is a purely imaging survey, distance estimates must rely on photometric redshifts. The template-matching technique estimates photometric redshifts from galaxy colors measured in three or more filters, which give a rough approximation of the spectral energy distribution (SED). This method requires a set of SED templates covering a range of galaxy types, luminosities, and redshifts that represent the populations of the sample to be studied. The redshift of a target galaxy is chosen by minimizing the χ2 between the target galaxy and the template galaxy. The accuracy of the photometric redshift depends on the template set and how closely it represents the population of the target sample. Another method, the empirical method, is more practical for a large data sample with both photometric parameters and spectroscopic redshifts. It involves fitting the redshifts as a function of photometric parameters such as colors. The coefficients of the function are estimated by minimizing the χ2 between the predicted and measured redshifts. The redshift of a galaxy without a spectrum can then be estimated by applying the function to its photometric parameters.

In our analysis, ANN training replaces the traditional function fitting. The photometric parameters of the training sample, such as color and concentration index, are used as inputs to the network. The network output is the photometric redshift. Instead of adjusting a function's coefficients to minimize the χ2 between the photometric redshift and the spectroscopic redshift, ANN adjusts the weights between each neuron to minimize the cost function between the photometric redshift and the spectroscopic redshift. The cost function is defined as

Equation (9)

where tk is the spectroscopic redshift, ok is the photometric redshift, and N is the number of galaxies in the training sample.
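Equation (9) is not reproduced in this text. A common choice consistent with the symbols defined above is the mean squared error, shown here as an assumption and not as the confirmed form of the original equation (the normalization, e.g., a factor of 1/2 instead of 1/N, may differ):

```latex
\begin{equation}
E = \frac{1}{N} \sum_{k=1}^{N} \left(t_k - o_k\right)^2 .
\end{equation}
```

With this normalization, the square root of E evaluated on a sample is the rms difference between the photometric and spectroscopic redshifts.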

4.1.1. The Training Sample

In order to train the neural net, a training sample has to be selected that optimally spans the parameter space of the data set to which the net will be applied. The CFHTLS photometric catalog (Coupon et al. 2009) provides a valid photometric redshift estimate for 5450 of the 8166 objects in W4. Of these, 96% of galaxies have redshifts <0.4. The training sample used was selected by cross-identifying the W4 sample and the main galaxy catalog of SDSS DR7, referred to hereafter as W4/S7; the remaining W4 galaxies are referred to as $\overline{{\rm W4}/{\rm S7}}$. W4/S7 contains 1428 galaxies, with r ⩽ 17.7 mag, and z < 0.6. The redshift range of the training sample overlaps well with the redshift range of W4.

4.1.2. Training the Network

The training sample is randomly divided into "training," "validation," and "test" sets following the proportion 70%, 15%, and 15%, respectively. The network is trained by minimizing the cost function of the training set. The cost function of the validation set is also calculated during the training to prevent over-fitting. The test set is not used during the training, but can be used for comparison among different nets. The MATLAB Neural Network Toolbox uses the Levenberg–Marquardt algorithm to optimize the cost function, which is the fastest method for function approximation. The training stops when any of the following conditions occurs.

  1. The maximum number of iterations is reached.
  2. The maximum amount of time that is allowed for the iterations to continue is exceeded.
  3. The cost function reaches the pre-set minimum. (The default value is 0, which means there is no error.)
  4. The cost function gradient falls below the specified minimum gradient.
  5. The cost function of the validation set has increased more than a specified maximum number of times since the last time it decreased.
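The random 70/15/15 split and the validation-based stopping rule (condition 5) can be sketched in a few lines. This is an illustration only and not the MATLAB toolbox implementation: the function names, the fixed random seed, and the `max_fail` value are assumptions, and the stopping rule follows the wording above (consecutive increases since the last decrease).

```python
import random

def split_sample(n_objects, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly split object indices into training, validation, and test sets."""
    indices = list(range(n_objects))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_objects)
    n_val = round(fractions[1] * n_objects)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

def early_stop_epoch(validation_costs, max_fail=6):
    """Epoch at which training stops on validation failures, or None.

    Counts how many times in a row the validation cost has failed to
    decrease since its last decrease; max_fail is an illustrative value.
    """
    fails = 0
    for epoch in range(1, len(validation_costs)):
        if validation_costs[epoch] < validation_costs[epoch - 1]:
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:
                return epoch
    return None

train, val, test = split_sample(1428)
print(len(train), len(val), len(test))  # 1000 214 214
```

For the 1428 objects of W4/S7 this split yields 1000 training, 214 validation, and 214 test objects.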

The input to the network directly affects the network's performance. The more relevant the parameter, the more accurate the network result will be. It is natural to use galaxy colors as parameters, as they provide a rough approximation of a galaxy's SED. Ball et al. (2004) showed that additional parameters such as the 50% flux radius can improve a network's performance. Our experiments showed similar results. First we trained the network using only the galaxy colors (i.e., u − g, g − r, r − i, i − z). The regression R-value was 0.87, and the rms was 0.035. When we added parameters that were measured from the shapelets-reconstructed images, such as flux radii R05 (5% flux radius), R20 (20% flux radius), and R50 (50% flux radius), the result improved to R = 0.92 and rms = 0.026.

The default network architecture for the function approximation in MATLAB has one hidden layer with 20 neurons and one output layer with one neuron, usually denoted as 20:1. In theory, a more complicated structure (i.e., increasing layers or neurons) can improve the network's performance, yet the number of neurons or layers cannot be added without limit. This is because the random errors from the input measurements will set a fundamental limit to the rms; the network will be meaningless if the number of weights is larger than the degrees of freedom. In addition, more weights also take a longer time to train. Therefore, in practice, it is always profitable to use the simplest network possible while maintaining optimal results. During our training, we found that a slightly more complicated network that contains two hidden layers with 10 neurons on each layer gives better results. We denote it 10:10:1.
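To make the comparison between the number of weights and the size of the training sample concrete, the free parameters of a fully connected network can be counted directly. The sketch below assumes one bias per neuron and eight inputs (four colors, three flux radii, and one mean surface brightness, as listed in Table 1); the helper name is hypothetical.

```python
def count_weights(n_inputs, layers):
    """Number of weights and biases in a fully connected feed-forward net.

    layers -- neurons per layer after the input, e.g. [10, 10, 1] for 10:10:1
    """
    total, fan_in = 0, n_inputs
    for n_neurons in layers:
        total += (fan_in + 1) * n_neurons  # +1 for the bias of each neuron
        fan_in = n_neurons
    return total

n_inputs = 8  # assumed from the input list in Table 1
print(count_weights(n_inputs, [20, 1]))      # 201
print(count_weights(n_inputs, [10, 10, 1]))  # 211
```

Under these assumptions both architectures have roughly 200 free parameters, well below the number of objects in the training set.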

In order to avoid local minima of the cost function, the net was trained 10 times. Each net was applied to W4. The photometric redshift was given by the median value. We carried out a similar procedure for all ANN trainings.
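Combining the repeated trainings by a per-galaxy median can be sketched as follows. The function names and the toy numbers are hypothetical, and the rms is assumed to be the root-mean-square difference between photometric and spectroscopic redshifts.

```python
import math
import statistics

def ensemble_median(predictions_per_net):
    """Median over nets for each galaxy.

    predictions_per_net[i][k] is the estimate of net i for galaxy k.
    """
    return [statistics.median(column) for column in zip(*predictions_per_net)]

def rms_error(z_phot, z_spec):
    """Root-mean-square difference between estimated and true redshifts."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(z_phot, z_spec))
                     / len(z_spec))

# Toy example: three nets, two galaxies; one net lands in a poor minimum.
nets = [[0.05, 0.10],
        [0.07, 0.08],
        [0.30, 0.09]]
z_phot = ensemble_median(nets)
print(z_phot)  # [0.07, 0.09]
```

The median is insensitive to the single outlying net (0.30 for the first galaxy), which is the practical benefit of training several nets instead of relying on one.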

4.1.3. Results

The neural nets used for the photometric redshift estimate are summarized in Table 1. The regression plot of the spectroscopic redshift versus the ANN estimated photometric redshift of the training set is shown in Figure 2. The photometric redshift is the median value among the 10 trained nets. The regression R-value is 0.92, and the rms is 0.026. Ball et al. (2004) estimated redshifts (up to 0.4) with an rms of 0.020 using ∼$10^5$ objects from SDSS DR1. Collister & Lahav (2004) obtained an rms of 0.023 (for redshifts up to 0.7) using 30,000 objects from SDSS EDR. Our estimation, using only 1428 objects, has an rms of 0.026. In principle, the larger the training sample, the better the net will perform. Both Ball et al. (2004) and Collister & Lahav (2004) used samples more than 10 times larger than our sample, achieving only a slightly better rms. The only different input parameters we used compared with Ball et al. (2004) were parameters measured from the shapelets-reconstructed image. This shows that the PSF optimization improves the accuracy of the estimation. It might also indicate that 0.02 is close to the accuracy limit of the photometric redshift estimation using an ANN. We applied the trained nets on the $\overline{{\rm W4}/{\rm S7}}$ sample to obtain the photometric redshift.

Figure 2. Regression plot of the spectroscopic redshift vs. the estimated photometric redshift. The x-axis is the spectroscopic redshift. The y-axis is the median value of the ten-times trained net. The red dash–dotted line is where the estimated photometric redshifts equal the spectroscopic redshifts.

Table 1. ANN for the Photometric Redshift

Architecture        10:10:1
Input parameters    Galaxy colors: u − g, g − r, r − i, i − z
                    Flux radii: R05, R20, R50 in r, measured from the SHAPELETS-reconstructed image
                    Mean surface brightness within R50 in r, from the SHAPELETS-reconstructed image
Output              The photometric redshift
Training sample     W4/S7, consisting of 1428 objects
Training algorithm  Levenberg–Marquardt
Result              R-value: 0.92, rms: 0.026

4.2. Assigning the Galaxy Morphological T Type

Galaxy morphological classification is traditionally done visually by trained observers. With large surveys, automated classification methods that rely on empirical relations between the galaxy morphology and a variety of its photometric and spectroscopic parameters have become more popular. The most often used parameters include concentration index, Sérsic index, colors, color gradients, bulge-to-disk ratio, effective radius, and disk scale length. Although automated methods are more efficient, visual classification still gives more accurate results. One of the latest visual classifications for a large sample of galaxies was performed by Nair & Abraham (2010). Nair & Abraham's catalog provides a dataset including morphological T types for over 14,000 galaxies. With their results, we built and trained the neural nets to estimate T type based on photometric parameters.

4.2.1. A Secondary Morphological Classification

Nair & Abraham (2010) selected galaxies from the SDSS DR4 spectroscopic main sample (Strauss et al. 2002). The sample is limited to extinction-corrected g magnitudes brighter than 16 and redshift range 0.01–0.1. Table 2 shows Nair's classification scheme. Nair's classification is consistent with the Third Reference Catalog of Bright Galaxies (de Vaucouleurs et al. 1991), RC3. The mean deviation between these two classifications is about 1.2 T types, with Nair's classification slightly later. Nair's classification is also fairly consistent with that of Fukugita et al. (2007). The mean deviation between the two classifications is about 0.8 of Fukugita's class bins, with Nair's classification slightly earlier. There are 14,034 galaxies in Nair's sample, but only 20 galaxies appear in either the W4 or W3 samples, making it too small a sample to train a neural net. Therefore, we developed a secondary classification. First we used Nair's sample to train a neural net, MorphNet1, and applied it to samples W4/S7 and W3. Then we used samples W4/S7 and W3 to train another neural net, MorphNet2, and applied it to the remaining samples W4 and $\overline{{\rm W4}/{\rm S7}}$.

Table 2. Nair & Abraham's (2010) Classification Scheme

T Class
−5 c0, E0, E+
−3 S0-
−2 S0, S0+
0 S0/a
1 Sa
2 Sab
3 Sb
4 Sbc
5 Sc
6 Scd
7 Sd
8 Sdm
9 Sm
10 Im
99 ?

Notes. "?" and "T = 99" denote an unknown class (see Nair & Abraham 2010 for details). Our classification only used galaxies from T = −5 to T = 6, so the unknown class has no effect on it.


Before using Nair's sample, we removed galaxies whose classification is doubtful, highly uncertain, or "peculiar," or those with features such as shells or tidal tails. We also removed galaxies that are in pairs or flagged "interaction" and galaxies whose inclinations are >60°. The number of remaining galaxies was 8013. The input parameters used for MorphNet1 training were the following.

  1. Petrosian magnitudes in the u, g, r, i, z bands.
  2. Galaxy magnitudes in the u, g, r, i, z bands.
  3. Petrosian radius in r; radii of 50% and 90% Petrosian flux in r.
  4. Effective radii for de Vaucouleurs and exponential profiles in the r band.
  5. Concentration index in the r band.
  6. Mean surface brightness within 50% Petrosian flux radius in r.
  7. Spectroscopic redshift.

The architecture of the neural net contains three hidden layers with 10 neurons in each layer and a single output neuron, denoted as 10:10:10:1. We followed the training procedures described in the previous section; Figure 3 shows the regression plot. The x-axis is the T type given by Nair & Abraham (2010). The y-axis is the mean value and the standard deviation of each T type estimated by the ANN. The regression R-value is 0.93, and the rms is 1.1 in Nair's T parameter. MorphNet1 was then applied to the W4/S7 and W3 samples.
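As an illustration of the 10:10:10:1 notation, the following minimal sketch builds a fully connected network with 18 inputs (the seven parameter groups listed above, counted per band) and propagates one input vector through it. The tanh hidden activation, the linear output, the random weights, and the function names are assumptions made for illustration only; the trained MorphNet1 weights and its actual activation functions are not given in the text.

```python
import math
import random

def init_network(layer_sizes, seed=0):
    """Create random weights and zero biases for a fully connected net."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, inputs):
    """Propagate one input vector: tanh on hidden layers, linear output."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations

# 18 inputs: 5 Petrosian magnitudes, 5 galaxy magnitudes, 3 Petrosian radii,
# 2 effective radii, concentration index, surface brightness, redshift.
N_INPUTS = 18
net = init_network([N_INPUTS, 10, 10, 10, 1])

# Placeholder input vector (not real photometry); the single output
# plays the role of the estimated T type.
t_type_estimate = forward(net, [0.1] * N_INPUTS)
```

With untrained random weights the output is meaningless; the sketch only shows how the layer sizes in the 10:10:10:1 notation map onto the weight matrices.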


Figure 3. Regression plot of the T type. The x-axis is the T type given by Nair & Abraham (2010). The y-axis is the mean value and the standard deviation of each T type estimated by MorphNet1. The red dash–dotted line is where the estimated T types equal Nair & Abraham's T types. The regression R-value is 0.93, and the rms is 1.1 in Nair's T parameter. The quality of the automated classification is comparable to the visual classification.
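The quantities quoted for the regression plot can be computed as in the sketch below. It assumes that the regression R-value is the Pearson correlation coefficient between the target and estimated T types, and that the rms is the root-mean-square difference between them; the function names are illustrative and not taken from the paper.

```python
import math
from collections import defaultdict

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rms_error(target, estimate):
    """Root-mean-square difference between target and estimated values."""
    squared = [(e - t) ** 2 for t, e in zip(target, estimate)]
    return math.sqrt(sum(squared) / len(squared))

def per_class_mean_std(target, estimate):
    """Mean and standard deviation of the estimates for each target T type."""
    groups = defaultdict(list)
    for t, e in zip(target, estimate):
        groups[t].append(e)
    summary = {}
    for t, values in sorted(groups.items()):
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        summary[t] = (mean, std)
    return summary
```

The per-class means and standard deviations correspond to the points and error bars plotted on the y-axis against each visual T type.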


W4/S7 was used as the training sample for MorphNet2. The input parameters were very similar to those for MorphNet1 but with some replacements. Radii of 50% and 90% Petrosian flux were replaced by radii of 50% and 80% galaxy flux, as given by the CFHTLS SExtractor catalogs. Effective radii for de Vaucouleurs and exponential profiles were replaced by flux radii measured from shapelets-reconstructed images. The redshift was also removed from the input because the $\overline{{\rm W4}/{\rm S7}}$ sample lacks spectroscopic redshifts. The architecture was 10:10:10:1. The regression plot is shown in Figure 4. The x-axis is the MorphNet1 T type and the y-axis is the median value of the MorphNet2 estimated T type. While training MorphNet2, we also renumbered Nair's T types (the 10 classes from −5 to 6) with the integers 1–10. The R-value is 0.91, and the rms is 1.2 in Nair's T parameter.
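The renumbering can be sketched as a simple lookup. The text only states that the T types from −5 to 6 were renumbered 1–10; the specific one-to-one ordering below (ascending T type), the rounding of the network output, and the names are assumptions for illustration.

```python
# The ten Nair & Abraham T types used here (Table 2, T = -5 to T = 6).
NAIR_T_TYPES = [-5, -3, -2, 0, 1, 2, 3, 4, 5, 6]

# Assumed mapping: ascending T type -> consecutive integers 1..10.
T_TO_INDEX = {t: i for i, t in enumerate(NAIR_T_TYPES, start=1)}
INDEX_TO_T = {i: t for t, i in T_TO_INDEX.items()}

def decode_output(network_output):
    """Round a continuous network output to the nearest valid index
    (clipped to 1..10) and map it back to a Nair T type."""
    index = min(max(round(network_output), 1), len(NAIR_T_TYPES))
    return INDEX_TO_T[index]
```

Renumbering removes the gaps at T = −4 and T = −1, so that neighboring classes are equally spaced in the network's target variable.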


Figure 4. Regression plot of the T type. The x-axis is the T type estimated by MorphNet1. The y-axis is the median value of the T type estimated by MorphNet2. The black dots represent each object in the training sample. The blue bars are means and standard deviations of each T type. The red dash–dotted line is where the MorphNet2 estimated T types equal the MorphNet1 T types. The regression R-value is 0.91, and the rms is 1.2 in Nair's T parameter.


The formal rms of the estimated T type is about 1.6 in Nair's T parameter when both MorphNet1 and MorphNet2 are taken into account, which is just less than 2 T types. Using an ANN, Ball et al. (2004) predicted galaxy morphological types with an uncertainty of about 1.5 T types. Lahav et al. (1996) used an ANN on ESO-LV galaxies and obtained an rms of around 1.8 T types. Given our two-step method, these uncertainties are not unreasonable. We applied MorphNet2 to sample $\overline{{\rm W4}/{\rm S7}}$ to assign T types.
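The combined value is consistent with adding the two rms values in quadrature, under the assumption (not stated explicitly in the text) that the MorphNet1 and MorphNet2 errors are independent:

$$\sigma_{T} \approx \sqrt{\sigma_{\rm MorphNet1}^{2} + \sigma_{\rm MorphNet2}^{2}} = \sqrt{1.1^{2} + 1.2^{2}} = \sqrt{2.65} \approx 1.6 .$$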

4.3. Galaxy Spectral Classification: Selecting AGNs Using an ANN

AGNs can be detected or confirmed via spectroscopic surveys by measuring emission-line flux ratios. AGNs can also be detected via image analysis, which usually involves decomposing either the two-dimensional galaxy image or the one-dimensional galaxy light profile into a bulge, a disk, and a possible central component (point source). The identity of the point source is determined by its colors. In Paper I, we showed that, in principle, AGNs, SF/SBs, and passive galaxies may be distinguished in a color–color diagram. We also demonstrated how an AGN can be separated from a passive galaxy by aperture photometric analysis. AGNs can also be distinguished from passive galaxies using parameters that reflect the concentration of light near the nucleus (e.g., the inverse concentration index versus color or the color gradient versus color), as shown in Choi et al. (2009). All these analyses were carried out on two-dimensional parameter planes, yet an analysis in a three-dimensional or multi-dimensional space should yield a clearer separation for each class.

In this study, we employ a non-parametric aperture photometric analysis using parameters such as flux radii, colors within each flux radius, and color gradients as input. An ANN for pattern recognition is used to test whether AGNs, passive galaxies, and SF/SB galaxies can be distinguished from one another.

4.3.1. Training Sample Selection

The training sample is selected by cross-identifying the MPA/JHU value-added galaxy catalog (Tremonti et al. 2004) DR7 and sample W4. There are 1420 galaxies, referred to hereafter as W4/M7; the remaining galaxies in W4 are referred to as $\overline{{\rm W4}/{\rm M7}}$. Following SDSS's spectroscopic classification criteria,6 an object is:

  1. An AGN, if Equation (10) is satisfied.
  2. An SF galaxy, if Equation (11) is satisfied.
  3. An SB galaxy, if (11) is satisfied and the EW(Hα) >500 Å.

Of 1420 objects, MPA/JHU lists 74 AGNs, 278 SBs, and 443 SFs. The remaining 625 objects are classified only as "GALAXY." We expect that some of these are passive galaxies. Generally, passive galaxies do not have significant nuclear emission lines, such as [O ii] λ3727 Å, [O iii] λ5007 Å, Hα, and Hβ, which are produced by either AGN activity or star formation. Balogh et al. (2004) show that the EW of Hα is close to 0 for galaxies without ongoing star formation; EW(Hα) varies from 4 Å to >40 Å for ongoing star formation. Therefore, we consider an object a passive galaxy when its emission-line flux ratio satisfies (11) and EW(Hα) <3 Å. Out of 625 galaxies, 297 can be considered passive galaxies. Among these, only 5% show EW([O iii] 5007 Å) >1.5 Å, and 3% show EW([O ii] 3727 Å) >1.5 Å. Thirty-two galaxies that satisfy (10) are added to the AGN group. For the remaining 296 objects, we plotted the emission-line flux ratio of [N ii] to Hα versus EW(Hα) in Figure 5. It is clear that these galaxies (black circles) are closer to SF galaxies (blue dots) than to passive galaxies (red dots). Since we already have sufficient SF galaxies in the training sample, we removed these galaxies from consideration. We also added galaxies from sample W3, which included 220 AGNs and 140 passive galaxies. The remaining galaxies in sample W3 are SF and SB galaxies, which were not used. The final training sample contains 326 AGNs, 437 passive galaxies, 278 SB galaxies, and 433 SF galaxies.
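The treatment of the 625 objects labeled only "GALAXY" can be summarized by the sketch below. Because Equations (10) and (11) are not reproduced here, the function takes their outcomes as boolean inputs; the function name and the return convention are hypothetical.

```python
PASSIVE_EW_HALPHA_MAX = 3.0  # angstroms; threshold quoted in the text

def reclassify_galaxy(satisfies_eq10, satisfies_eq11, ew_halpha):
    """Reclassify an object that MPA/JHU lists only as "GALAXY".

    satisfies_eq10 / satisfies_eq11: whether the emission-line flux
    ratios meet Equation (10) or Equation (11) (evaluated elsewhere).
    ew_halpha: equivalent width of H-alpha in angstroms.
    Returns "passive", "AGN", or None when the object is dropped
    from the training sample.
    """
    if satisfies_eq11 and ew_halpha < PASSIVE_EW_HALPHA_MAX:
        return "passive"
    if satisfies_eq10:
        return "AGN"
    # Remaining objects resemble SF galaxies and are not used.
    return None
```

For example, `reclassify_galaxy(False, True, 1.2)` returns `"passive"`, while an object satisfying (11) with EW(Hα) of 10 Å returns `None` and is excluded.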


Figure 5. Comparison of star-forming and passive galaxies. The x-axis is the logarithm of the flux ratio of [N ii] λ6584 and Hα emission lines. The y-axis is the equivalent width of Hα. It is clear that the remaining 296 galaxies (black circles) are closer to star-forming galaxies (blue dots) than to passive galaxies (red dots).


4.3.2. The Network Training

As mentioned earlier, the galaxy color, color gradient, and light profile of a galaxy hosting an AGN are different compared with a passive galaxy or an SF/SB galaxy. Therefore, we selected the following photometric parameters as input.

  1. Galaxy magnitudes in the u, g, r, i, z bands.
  2. Galaxy colors: u − g, g − r, r − i, and i − z.
  3. Radii of 50% and 80% flux in r, i.e., R50 and R80.
  4. 5%, 20%, and 50% flux radii in r measured from shapelets-reconstructed images.
  5. Color gradients Δ(u − r) and Δ(g − r) calculated from shapelets results.
  6. Concentration index in the r band, i.e., the ratio of R50 and R80.
  7. Mean surface brightness within a 50% flux radius in r.
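Several of these inputs are simple functions of the catalog magnitudes and radii, as the following sketch shows. The dictionary keys and the function name are hypothetical, and the surface brightness formula is an assumption (half of the total r-band flux spread uniformly over a circle of radius R50); the text does not give the exact definition used.

```python
import math

def build_features(mags, r50, r80):
    """Derive colors, concentration index, and mean surface brightness.

    mags: dict of galaxy magnitudes with keys "u", "g", "r", "i", "z".
    r50, r80: radii (arcsec) enclosing 50% and 80% of the r-band flux.
    """
    features = {
        "u-g": mags["u"] - mags["g"],
        "g-r": mags["g"] - mags["r"],
        "r-i": mags["r"] - mags["i"],
        "i-z": mags["i"] - mags["z"],
        # Concentration index as defined in the list: ratio of R50 and R80.
        "concentration": r50 / r80,
    }
    # Assumed definition: half the flux (m + 2.5 log10 2) over area pi * R50^2,
    # giving mag arcsec^-2.
    features["mean_sb_r50"] = mags["r"] + 2.5 * math.log10(
        2.0 * math.pi * r50 ** 2)
    return features
```

A more concentrated light profile gives a smaller R50/R80, which is the kind of signature a nuclear point source is expected to leave.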

The architecture of the network is 15:15:1. The output identifies each object as an AGN, SF, SB, or passive galaxy. We followed the same procedures used to train the other networks. Our first attempt was not successful: overall, only 60% of objects were correctly classified. Our second attempt combined SF and SB into a single class, since the only difference between SFs and SBs is the EW of the Hα emission. The resulting neural net, SpecNet1, correctly classified 67% of AGNs, 63% of passive galaxies, and 94% of SF/SB galaxies. An SF/SB galaxy can be clearly distinguished from AGNs or passive galaxies, as only 6% of galaxies are misclassified as SF/SB.

Distinguishing AGNs from passive galaxies needs further investigation. It is possible that contamination from the host galaxy reduces the color difference between an AGN host and a passive galaxy; with high spatial resolution imaging data, however, a point source may still be detected via details contained in the galaxy's light profile. Therefore, we constructed another neural net, SpecNet2, to separate AGNs from passive galaxies by only including input parameters that are most relevant to the galaxy light profile, such as concentration index, flux radius measured from the shapelets-reconstructed image, and mean surface brightness within 5%, 20%, and 50% flux radii. The training sample only includes AGNs and passive galaxies from the W4/M7 sample. The AGN identification improved from 67% accuracy to 85%; the identification of passive galaxies improved only slightly, from 63% to 67%. We applied SpecNet1 to sample $\overline{{\rm W4}/{\rm M7}}$, then applied SpecNet2 to galaxies classified by SpecNet1 as AGNs or passive galaxies. By combining SpecNet1 and SpecNet2, we can identify AGNs and passive galaxies with about 70% confidence and SF/SB galaxies with about 90% confidence.
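The two-stage application of SpecNet1 and SpecNet2 can be sketched as a small cascade. The classifier arguments are hypothetical callables standing in for the trained networks, and the toy thresholds below are arbitrary values chosen only so that the example runs; they are not results from this work.

```python
def classify_two_stage(features, specnet1, specnet2):
    """Apply SpecNet1 first; re-examine AGN/passive candidates with SpecNet2.

    specnet1(features) -> "AGN", "passive", or "SF/SB"
    specnet2(features) -> "AGN" or "passive"
    """
    first_label = specnet1(features)
    if first_label == "SF/SB":
        # SF/SB galaxies are well separated by SpecNet1 alone.
        return first_label
    # Objects flagged as AGN or passive are passed to the second network,
    # which relies on light-profile parameters only.
    return specnet2(features)

# Hypothetical stand-ins for the trained networks (arbitrary thresholds).
def toy_specnet1(features):
    return "SF/SB" if features["g-r"] < 0.6 else "AGN"

def toy_specnet2(features):
    return "AGN" if features["concentration"] < 0.4 else "passive"

label = classify_two_stage({"g-r": 0.8, "concentration": 0.5},
                           toy_specnet1, toy_specnet2)
```

Keeping the second stage restricted to AGN and passive candidates means its training sample and inputs can be tuned to the one distinction that SpecNet1 handles poorly.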

5. DATA PRODUCTS AND DISCUSSION

The first part of the data products are coefficients of the shapelets basis functions and the maximum value of order nmax, which can be used for image reconstruction. The second part of the data products are galaxy parameters. For the W4/S7 sample, the spectroscopic redshift is provided by SDSS DR7, and the morphological T type is given by Nair & Abraham (2010) or neural net MorphNet1. For sample $\overline{{\rm W4}/{\rm S7}}$, the photometric redshift is estimated by the ANN, and its morphological T type is given by neural net MorphNet2. Sample W4/M7 spectral classification is provided by MPA/JHU DR7. Sample $\overline{{\rm W4}/{\rm M7}}$ spectral classification is given by either SpecNet1 or SpecNet2. The K-correction is calculated using an empirical relation between the K-correction, the redshift, and the color using an algorithm by Chilingarian et al. (2010). The absolute magnitude is corrected for Galactic extinction and the K-correction. Table 3 is an example of part of the final catalog of results. As mentioned earlier, in order to guarantee a relatively well resolved galaxy image, we only selected galaxies from the W3 sample with z ⩽ 0.1. Since analysis using an ANN is done using parameters sensitive to the spatial resolution, we limit most of our discussion in this section to galaxies with redshifts <0.1, and denote subsamples including only z ⩽ 0.1 galaxies with a subscript "0.1." These are samples (W4)0.1, (W4/S7)0.1, $(\overline{{\rm W4}/{\rm S7}})_{0.1}$, (W4/M7)0.1, and $(\overline{{\rm W4}/{\rm M7}})_{0.1}$.

Table 3. The Catalog of Results

Field    Id      R.A. (2000.0)  Decl. (2000.0)  z      z_code  T   T_code  SpecClass  SpecClass_code  Mu     Mg     Mr     Mi     Mz
(1)      (2)     (3)            (4)             (5)    (6)     (7) (8)     (9)        (10)            (11)   (12)   (13)   (14)   (15)
W4+0+0   148     333.5401001    0.8257565       0.043  1       5   2       SF/SB      1               −18.0  −18.8  −19.2  −19.4  −19.5
W4+0−1   35527   333.0122986    0.0539394       0.057  1       1   2       AGN        4               −19.3  −20.8  −21.5  −21.9  −22.2
W4+2+0   163040  335.4468079    1.5384290       0.065  2       2   2       AGN        4               −20.4  −21.6  −22.1  −22.5  −22.7
W4+2−1   131733  335.7080994    0.5651003       0.086  2       −5  2       passive    4               −17.0  −18.4  −19.2  −19.6  −19.8
W4−3+1   223939  330.1088867    2.6589079       0.094  2       1   2       SF/SB      3               −17.1  −18.0  −18.3  −18.5  −18.5
⋅⋅⋅      ⋅⋅⋅     ⋅⋅⋅            ⋅⋅⋅             ⋅⋅⋅    ⋅⋅⋅     ⋅⋅⋅ ⋅⋅⋅     ⋅⋅⋅        ⋅⋅⋅             ⋅⋅⋅    ⋅⋅⋅    ⋅⋅⋅    ⋅⋅⋅    ⋅⋅⋅

Notes. (1) CFHTLS field name. (2) Object ID in each CFHTLS field. (3) and (4) R.A. and decl. (5) and (6) Redshift and its code. z_code=1: spectroscopic redshift from SDSS DR7; z_code=2: photometric redshift estimated by ANN. (7) and (8) Morphological T type and its code. T_code=0: T type is from Nair & Abraham (2010); T_code=1: estimated by neural net MorphNet1; T_code=2: estimated by neural net MorphNet2. (9) and (10) Galaxy spectral class and its code. SpecClass_code=1: classified by MPA/JHU DR7; SpecClass_code=2: classified by MPA/JHU DR7 as "GALAXY," reclassified by emission-line ratios; SpecClass_code=3: classified by neural net SpecNet1 as starburst or star-forming galaxies; SpecClass_code=4: classified by neural nets SpecNet1 and SpecNet2 as AGN or passive galaxies. (11–15) Absolute magnitudes in u, g, r, i, z, in units of magnitude, after K-correction and Galactic extinction correction.

Only a portion of this table is shown here to demonstrate its form and content. Machine-readable and Virtual Observatory (VO) versions of the full table are available.
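When reading the catalog, the three code columns can be decoded with simple lookups, as in this sketch. The dictionary contents follow the table notes; the names of the dictionaries and of the helper function are hypothetical.

```python
Z_CODE = {
    1: "spectroscopic redshift from SDSS DR7",
    2: "photometric redshift estimated by ANN",
}
T_CODE = {
    0: "T type from Nair & Abraham (2010)",
    1: "T type estimated by MorphNet1",
    2: "T type estimated by MorphNet2",
}
SPEC_CODE = {
    1: "classified by MPA/JHU DR7",
    2: "MPA/JHU 'GALAXY', reclassified by emission-line ratios",
    3: "classified by SpecNet1 as SF/SB",
    4: "classified by SpecNet1 and SpecNet2 as AGN or passive",
}

def describe_provenance(z_code, t_code, spec_code):
    """Return human-readable provenance for one catalog row."""
    return {
        "redshift": Z_CODE[z_code],
        "morphology": T_CODE[t_code],
        "spectral_class": SPEC_CODE[spec_code],
    }

# Codes of the second example row of Table 3 (z_code=1, T_code=2, code=4).
row_provenance = describe_provenance(1, 2, 4)
```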


5.1. Redshift

Sample W4 is separated into two subsamples: W4/S7, which includes SDSS spectroscopic redshifts, and $\overline{{\rm W4}/{\rm S7}}$, which only has ANN-estimated photometric redshifts with an rms of 0.026. Although the W4/S7 and $\overline{{\rm W4}/{\rm S7}}$ subsamples are taken from the same W4 sample, they are not necessarily drawn from the same underlying distribution. Indeed, the two-sample Kolmogorov–Smirnov test (K-S2 test) rejects the null hypothesis that the redshifts of the two subsamples share the same distribution at the 5% significance level (p < 0.05).
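A two-sample K-S test of this kind can be sketched in a few lines. The implementation below computes the maximum distance between the two empirical cumulative distributions and an asymptotic p-value from the Kolmogorov series with the commonly used small-sample correction; it is a generic illustration, not the code used in this work, and the two redshift lists are made-up placeholder values.

```python
import math

def ks_two_sample(sample_a, sample_b):
    """Return (D, p) for the two-sample Kolmogorov-Smirnov test."""
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= x.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    effective_n = math.sqrt(n * m / (n + m))
    lam = (effective_n + 0.12 + 0.11 / effective_n) * d
    if lam < 0.3:
        # The series converges poorly here; the probability is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Placeholder redshift lists (not the W4 data).
z_spec = [0.03, 0.04, 0.05, 0.05, 0.06, 0.07, 0.08, 0.09]
z_phot = [0.08, 0.10, 0.11, 0.12, 0.12, 0.13, 0.15, 0.18]
d_stat, p_value = ks_two_sample(z_spec, z_phot)
reject_null = p_value < 0.05
```

A p-value below 0.05 rejects the hypothesis that both samples are drawn from the same distribution; a larger value means the test cannot distinguish them.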

We compare the normalized redshift distributions of the W4/S7 sample (the red bars) and $\overline{{\rm W4}/{\rm S7}}$ (the blue bars) in Figure 6. The bin size is 0.025. The number within the parentheses is the size of each sample. The gray background is the normalized redshift distribution of the entire W4 sample, which mostly consists of photometric redshifts. There are two "spikes," one around z = 0.05 in sample W4/S7 and another around z = 0.12 in $\overline{{\rm W4}/{\rm S7}}$. These significant differences between subsamples W4/S7 and $\overline{{\rm W4}/{\rm S7}}$ might indicate a possible systematic error.


Figure 6. Normalized distributions of the spectroscopic redshift of the W4/S7 sample (the red bars) and the ANN-estimated photometric redshift for $\overline{{\rm W4}/{\rm S7}}$ (the blue bars). The number within parentheses is the size of each sample. The gray background is the normalized redshift distribution of the entire sample W4.


To rule out the possibility that some galaxies with z = 0.05 were misidentified as z = 0.12, we took a closer look at the W4/S7 sample. We selected galaxies whose spectroscopic redshifts are in the z ∼ 0.05 bin and plotted the distribution of their ANN redshifts in Figure 7(a). It is clear that z ∼ 0.05 galaxies are sometimes misidentified as z ∼ 0.075 galaxies, but the few galaxies that are misidentified in the z ∼ 0.12 bin cannot account for the z ∼ 0.12 "spike." In order to gain some insight into what might contribute to the z ∼ 0.12 "spike," we selected another group of galaxies from the W4/S7 sample, whose ANN redshift is within the z ∼ 0.12 bin. We plot the spectroscopic redshift distribution of these galaxies in Figure 7(b). It is clear that galaxies with larger redshifts can be misidentified as z ∼ 0.12. This might be because some parameters (such as flux radii) used to train the neural net rely on the angular resolution, and their usefulness diminishes as the distance increases. Since we are only interested in galaxies within z ⩽ 0.1, this will not affect our analysis.


Figure 7. Detailed study of the W4/S7 sample. The x-axis is the redshift. The y-axis is the number within each bin normalized by the size of the W4/S7 sample. Panel (a) shows how the ANN may misidentify a galaxy whose spectroscopic redshift is z ∼ 0.05. Such galaxies are sometimes placed in the z ∼ 0.075 bin, but their contribution to other bins is negligible. Panel (b) shows which spectroscopic redshift bins contribute to the ANN redshift bin z ∼ 0.12. Most of the misidentified galaxies are from larger redshifts.


The "spike" or the lack of z ∼ 0.05 galaxies in the $\overline{{\rm W4}/{\rm S7}}$ sample appears to be real. Blanton et al. (2003) calculated that the characteristic i magnitude of the SDSS main galaxy sample is −21 mag. If a galaxy's redshift is 0.05, its apparent magnitude is about +16 mag. In sample (W4/S7)0.1, there are 86 out of 289 galaxies whose i magnitude is less than +16. In the $(\overline{{\rm W4}/{\rm S7}})_{0.1}$ sample, there are only 74 out of 1388 galaxies whose i magnitude is less than +16. It seems that the "spike" may be explained by a relatively larger fraction of brighter galaxies in the (W4/S7)0.1 sample than in $(\overline{{\rm W4}/{\rm S7}})_{0.1}$.
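The magnitude estimate and the two bright-galaxy fractions can be checked with a short calculation. This sketch assumes a Hubble constant of 70 km s⁻¹ Mpc⁻¹ and the low-redshift approximation d ≈ cz/H0, and it ignores the K-correction and extinction; none of these choices is specified in the text.

```python
import math

C_KM_S = 299792.458   # speed of light in km/s
H0 = 70.0             # assumed Hubble constant in km/s/Mpc

def apparent_magnitude(absolute_mag, z, h0=H0):
    """Apparent magnitude from the distance modulus at low redshift."""
    distance_mpc = C_KM_S * z / h0            # Hubble-law approximation
    distance_modulus = 5.0 * math.log10(distance_mpc * 1.0e6 / 10.0)
    return absolute_mag + distance_modulus

# Characteristic i magnitude of -21 placed at z = 0.05.
m_characteristic = apparent_magnitude(-21.0, 0.05)

# Fractions of galaxies brighter than i = +16 quoted in the text.
bright_fraction_spec = 86 / 289     # (W4/S7)0.1
bright_fraction_phot = 74 / 1388    # complement of (W4/S7)0.1
```

Under these assumptions the characteristic galaxy at z = 0.05 has an apparent magnitude between 15.5 and 16, and bright galaxies make up roughly 30% of the spectroscopic subsample compared with about 5% of the photometric one.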

5.2. Galaxy Luminosity

Galaxy apparent magnitudes are adopted from the CFHTLS. The absolute magnitudes are calculated and corrected for Galactic extinction and the K-correction. The mean ugriz magnitudes and the standard deviations are calculated for the W4/S7 and $\overline{{\rm W4}/{\rm S7}}$ samples. The results are shown in Table 4. The mean absolute magnitude of galaxies in sample (W4/S7)0.1 is about 0.5 mag brighter than that of sample $(\overline{{\rm W4}/{\rm S7}})_{0.1}$. Not unexpectedly, because of its superior image quality and longer exposure times, the CFHTLS can probe deeper into the galaxy luminosity function than the SDSS.
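For reference, the standard relation between apparent and absolute magnitude with both corrections applied can be written as follows. The symbols are introduced here for illustration and are not defined in the text: $m_X$ and $M_X$ are the apparent and absolute magnitudes in band $X$, $d_L$ is the luminosity distance, $A_X$ is the Galactic extinction, and $K_X$ is the K-correction.

$$M_X = m_X - 5\log_{10}\left(\frac{d_L}{10\ {\rm pc}}\right) - A_X - K_X(z, {\rm color}) .$$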

Table 4. Mean ugriz Absolute Magnitudes and the Standard Deviations of W4

Sample Mu Mg Mr Mi Mz
(W4/S7)0.1 −18.4 ± 1.1 −19.5 ± 1.2 −19.9 ± 1.3 −20.3 ± 1.4 −20.4 ± 1.5
W4/S7 −19.2 ± 1.2 −20.3 ± 1.3 −20.7 ± 1.5 −21.1 ± 1.4 −21.3 ± 1.4
$(\overline{{\rm W4}/{\rm S7}})_{0.1}$ −17.2 ± 1.4 −18.2 ± 2.7 −18.2 ± 2.2 −17.8 ± 1.9 −19.1 ± 1.8
$\overline{{\rm W4}/{\rm S7}}$ −18.1 ± 1.3 −19.2 ± 1.9 −19.5 ± 2.2 −19.5 ± 1.9 −20.2 ± 1.6


5.3. Galaxy Morphology

The galaxy morphological classification is estimated using the MorphNet1 and MorphNet2 ANNs based on Nair & Abraham's classification. Although we applied the neural nets to the entire W4 sample, only galaxies with z ⩽ 0.1 should be considered meaningful because Nair & Abraham's sample is limited to z ⩽ 0.1. Figure 8 shows the normalized T type distribution of sample (W4)0.1 (1677 galaxies) as gray bars; of its two subsamples, (W4/S7)0.1 (in red) includes 289 galaxies and $(\overline{{\rm W4}/{\rm S7}})_{0.1}$ (in blue) includes 1388 galaxies. The bottom histogram is the T type distribution of the training sample selected from Nair & Abraham (2010), which includes about 8000 galaxies. There is a large fraction of galaxies with T = −5 in Nair's sample, which is unusual as elliptical galaxies typically constitute fewer than 10% of field galaxies (Mahtessian 2011). This is likely in part because Nair & Abraham (2010) classified elliptical galaxies as T = −5, including those that are generally classified as T = −6 and −4, and those only classified as E but without an explicit subclass. It also might explain the extra T = −3 galaxies in the (W4/S7)0.1 sample, which includes some misidentified galaxies from T = −5. The T type distributions of (W4/S7)0.1 and $(\overline{{\rm W4}/{\rm S7}})_{0.1}$ are very similar except at T = −3. The K-S2 test does not reject the hypothesis that they share the same underlying distribution (p > 0.05). Overall, within z ⩽ 0.1 and u apparent magnitude ⩽ +22, the galaxy morphology is mostly evenly distributed from lenticulars (T = −2) to late-type spirals (T = 5).


Figure 8. Morphological types of sample (W4)0.1 (top) and the training sample from Nair & Abraham (2010) (bottom). The y-axis is normalized by the size of each sample. The number within the parentheses is the size of each sample.


5.4. AGN Galaxies

We classified W4 galaxies as AGNs, SF/SBs, and passive galaxies using neural nets SpecNet1 and SpecNet2 based on the classifications from MPA/JHU DR7. As identifying AGNs relies largely on parameters deduced from the galaxy's light profile, our interpretations are also limited to galaxies with z ⩽ 0.1. The (W4)0.1 sample includes 1677 galaxies; among these there are 186 AGNs, 186 passive galaxies, and 1305 SF/SB galaxies. Table 5 summarizes the spectral classification of the (W4)0.1 sample.

Table 5. Spectral Classifications for the (W4)0.1 Sample

Type (W4/M7)0.1 $(\overline{{\rm W4}/{\rm M7}})_{0.1}$ Total
AGN 11% 11% 11%
Passive 19% 8% 11%
SF/SB 70% 81% 78%

Notes. In the (W4/M7)0.1 sample AGNs and SF/SB galaxies are classified by MPA/JHU; passive galaxies are constrained by emission-line ratios and equivalent widths. Galaxies in the $(\overline{{\rm W4}/{\rm M7}})_{0.1}$ sample are classified by neural nets SpecNet1 and SpecNet2.


We compared the absolute magnitudes of AGNs, SF/SBs, and passive galaxies in Table 6. The mean magnitudes of AGNs and passive galaxies are very similar, with AGNs slightly brighter. SF/SB galaxies are always ∼1–2 mag fainter. The reason for this is that SF/SB galaxies are more likely to be later-type galaxies, which are generally fainter than early-type galaxies. This is clearly demonstrated in Figure 9 where the T types of AGN, SF/SB, and passive galaxies are compared. AGNs are typically lenticulars (T = −3, −2) or early-type spirals (T = 0–3). SF/SB galaxies are generally later-type galaxies (T > 3). The passive galaxies also tend to be earlier-type galaxies as indicated by their definition: early-type, bulge-dominated, and non-active.


Figure 9. Normalized T type distribution of the (W4)0.1 sample. The number in parentheses is the number of galaxies classified as each type. The gray background is the T type distribution of the (W4)0.1 sample. AGNs are typically early-type galaxies while SF/SBs are usually later-type galaxies. Passive galaxies are also generally early-type galaxies as indicated by their definition.


Table 6. Mean ugriz Absolute Magnitudes and the Standard Deviations of (W4)0.1

Type Mu Mg Mr Mi Mz
AGN −18.9 ± 1.0 −20.3 ± 1.0 −21.0 ± 0.98 −21.3 ± 1.2 −21.6 ± 0.99
Passive −18.5 ± 1.1 −20.0 ± 1.1 −20.6 ± 1.1 −20.9 ± 1.4 −21.3 ± 1.1
SF/SB −17.6 ± 1.1 −18.5 ± 1.1 −18.9 ± 1.1 −19.0 ± 1.4 −19.2 ± 1.1


6. SUMMARY

This paper is the second paper of the series Detecting Active Galactic Nuclei Using Multi-filter Imaging Data. The purpose of this series is to develop and test techniques for identifying AGNs using photometric information alone, although a modest number of spectroscopically confirmed AGNs are necessary for the training sample.

Because a straightforward aperture photometric measurement requires that each galaxy image has the same PSF, and the sharper the better, we devoted the first part of this paper to the introduction of shapelets, an image manipulation algorithm that can adjust an image's PSF via image reconstruction. We discussed the limitations of shapelets and how we adapted it to suit our science by developing a two-step shapelets routine.

After applying shapelets to a sample of CFHTLS Wide Survey ugriz images and measuring photometric parameters on the shapelets-reconstructed images, we incorporated an artificial neural network (ANN) algorithm. This ANN was used to estimate photometric redshifts and to assign galaxy morphological types based on existing empirical relations between galaxy SEDs and redshifts or morphological types. Instead of the traditional fitting of analytic functions, an ANN adopts a more sophisticated approach that provides greater flexibility and, therefore, a more secure result. The ANN's pattern recognition ability was used to identify galaxies as AGNs, SF/SBs, or purely passive galaxies. As demonstrated earlier, galaxies with different spectral classes are located at different places on a color–color diagram or a concentration index versus color gradient diagram, albeit with some overlap. AGNs can be more clearly separated from other galaxies in a multi-parameter space by optimally training a neural net. By combining shapelets, which sharpen the PSF and adjust it to the same size in every filter, with an ANN, our technique is able to identify AGNs or passive galaxies with about 70% confidence and SF/SB galaxies with about 90% confidence at z ⩽ 0.1. The number of AGNs identified by the ANN using photometry is comparable to the number of AGNs identified by MPA/JHU by spectroscopic means.

This technique provides an alternative, and possibly more powerful, method of identifying AGNs or SF/SBs than using photometric information alone. It can easily be adapted to other multi-wavelength imaging surveys, so long as a modest number of galaxies in the survey are confirmed AGNs or SF/SBs. This technique can also be used to select AGN or SF/SB candidates for future spectroscopic surveys.

M.M.D.R. is grateful for financial support from the Natural Sciences and Engineering Research Council of Canada. This work was based on research carried out by X.Y.D. in fulfillment of a PhD dissertation. The authors thank P. Hall for providing the AGN composite spectra and P. Nair for helpful discussions.

The authors thank Drs. Richard Massey and Alexandre Refregier for making SHAPELETS codes publicly available. They also acknowledge the use of public data from MPA/JHU DR7.

This work is based in part on data products produced at TERAPIX and the Canadian Astronomy Data Centre as part of the CFHTLS, a collaborative project of the National Research Council of Canada and the Institut National des Sciences de l'Univers of the Centre National de la Recherche Scientifique.

This work was also based in part on data products from the SDSS DR7. Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

Footnotes

  • Based on the Nyquist criterion, the minimum sampling size is 2 pixels. In this work, we adopted a two-dimensional Gaussian PSF as the optimized PSF. The FWHM for a Gaussian profile is 2.36 pixels across a single axis, or 3.33 pixels across the diagonal of the square pixel. We selected a value of 3 pixels.

  • An image used to store the standard flux error assigned to each pixel.

  • We set the largest allowed nmax value as 20. As mentioned earlier, nmax is limited by the noise of the image and the degree of pixelization. For the data we used, the background noise starts to become evident in the reconstructed image when the order of the shapelets basis functions is higher than 20.
