Machine-learning Approach to Identification of Coronal Holes in Solar Disk Images and Synoptic Maps

Egor Illarionov; Alexander Kosovichev; Andrey Tlatov

doi:10.3847/1538-4357/abb94d

1. Introduction

Solar magnetic fields play a key role in the formation of solar activity tracers that are observed in solar disk images (Solanki et al. 2006). Regions where magnetic field lines are open to outer space and that appear darker in EUV images are called coronal holes (CHs). Direct observation of such structures is challenging and requires special conditions (Lin et al. 2004). Another option, based on a reconstruction of magnetic field lines from solar magnetograms, requires additional modeling (see, e.g., Stenflo 2013 for details of observations). There have been long-running and intense debates about proper magnetic field reconstruction, and there is currently no single accepted method (see Wiegelmann et al. 2014, 2017, for a review of models and their limitations).

A search for a robust detection procedure for CHs is motivated by at least two aspects. First, due to the open magnetic field line configuration, high-energy particles can easily flow into outer space and form the solar wind (Nolte et al. 1976; Abramenko et al. 2009; Cranmer 2009; Obridko et al. 2009). The solar wind from CHs can reach the Earth and manifest itself in geomagnetic storms (Robbins et al. 2006; Vršnak et al. 2007). Thus, the detection of CHs is essential for space weather forecasting. Second, in the framework of solar dynamo theory, periods of solar activity minima are associated with a strong poloidal magnetic field (Parker 1955). Thus, observations of polar CHs may provide information about the poloidal field strength and also the upcoming solar cycle (Harvey & Recely 2002). Identification of CHs as open field regions in reconstructed solar magnetic field lines is feasible but carries significant uncertainties (see, e.g., Linker et al. 2017).

Fortunately, CHs have an easily accessible tracer. They appear as massive dark regions when the solar disk is observed in the EUV or X-ray spectrum. The reason for their darker appearance is the lower density and temperature of the solar corona due to the special magnetic field configuration (Priest 2014). Detection of such specific dark regions is convenient for CH identification. We review some common approaches to this problem below.

Detection of CHs is performed both in solar disk images and in solar synoptic (Carrington) maps that are a compilation of successive disk images during a solar rotation period. Methods for CH identification in the disk images are remarkably diverse. They range from fully manual procedures to fully automatic ones and use observations in various wavelengths (Table 1). In addition, source data providers often apply a custom data preprocessing that contributes to disagreements among various identification attempts. A detailed and unbiased analysis of the various approaches and their uncertainties is outside the scope of this research.

Table 1. Input Data Used for CHs Segmentation in Previous Studies

Author	Reference Name	Input Wavelength
Henney & Harvey (2005)	⋯	10830 Å and magnetogram
Scholl & Habbal (2008)	⋯	171, 195, 304 Å and magnetogram
Krista & Gallagher (2009)	⋯	195 Å
Reiss et al. (2014)	⋯	193 Å
Verbeeck et al. (2014)	SPoCA	193 Å or 195 Å or (171 and 195 Å)
Lowder et al. (2017)	⋯	(193 or 195 Å) and magnetogram
Garton et al. (2018)	CHIMERA	171, 193 and 211 Å
Heinemann et al. (2019)	CATCH	193 Å and magnetogram


Further progress in methods for CH identification in disk images will help to reduce uncertainties in the determination of CH boundaries. However, CHs are typically large structures, and a single disk image may reveal only the part of a CH that is on the visible side of the Sun. This means that we need to compile a series of disk images to capture the whole CH region. Solar synoptic maps are a convenient way to create such a representation. A straightforward approach to account for the CHs boundaries in a synoptic map is to include a compilation of the CHs boundaries detected in disk images. This approach was implemented, e.g., by Caplan et al. (2016). We note that this approach may unambiguously work only if all disk images are taken at the same time and cover the whole solar surface. However, CHs evolve and change their shape with time. Even long-living CHs may appear substantially different in the disk images after a single solar rotation. The instantaneous coverage of the whole solar surface was only available during the STEREO observations of the far-side of the Sun.

An alternative approach is to first merge the solar disk images into full-surface synoptic maps, and then identify CHs in the synoptic map directly. Of course, we still have uncertainties in pixel intensities; however, it is more convenient to resolve them for continuous values (pixel intensities) than for binary values (CH boundaries). Quite surprisingly, we find far fewer recent publications on CH identification in the synoptic maps. Toma & Arge (2005) and Toma (2011) developed a CH identification procedure using synoptic maps in the 171, 195, 284, 304, and 10830 Å spectral lines along with the Hα and magnetic synoptic maps. The data set and analysis cover a period from 2006 to 2009. Hess Webber et al. (2014) investigated polar coronal holes from 1996 through 2010 and compared the identification of CHs in the disk images with two techniques that identify CHs in the synoptic maps. One method is based on a combination of synoptic maps in the 171, 195, and 304 Å wavelengths, while the second one works with the magnetic synoptic maps. The authors concluded that these methods produced comparable results. An extended time period from 1996 to 2016 was considered by Hamada et al. (2018), who used the multiwavelength synoptic maps together with magnetograms. An important contribution of that paper is the development of a homogenization procedure for data from different observational instruments, which allowed them to perform a joint analysis of two solar cycles (23 and 24).

The previous methods were developed to analyze specifically either disk images or synoptic maps. We did not find any method that has been validated both in solar disk images and synoptic maps. This motivates us to develop a unified procedure that can be applied to various representations of solar observations.

In this paper, we suggest an idea that for a unified detection algorithm there should be no dramatic difference between CHs captured in solar disk images and synoptic maps. Of course, we note that the CHs in the disk images are physical objects while in the synoptic maps they are synthetic objects to some extent. Nevertheless, visual interpretation works similarly in both cases. One can say that the concept of CHs is the same in both representations.

The suggested idea implies some desired properties of the unified algorithm. First, it should be local in the sense that it should be independent of the global image scale and context. For example, it should produce the same output whether we feed it a whole solar disk or just a cropped patch with no information about its location in the original disk image. Second, reasonable geometrical transformations should not affect the CH identification, e.g., it should make no difference onto which plane the solar sphere is projected, because the concept of a CH remains the same.

Analyzing the desired properties we note that if the algorithm acts as a convolution with some local kernel, it can be a proper candidate. Of course, the kernel should be sophisticated enough to provide binary masks of CHs from input images. This is very close to what convolutional neural networks (CNN) do.

CNNs are a special type of neural network commonly used in image analysis. They can be viewed as a set of successive convolutional operations with kernels that are adjusted during a model training phase to minimize some loss function, e.g., a segmentation error. Once the model is trained, the kernels are fixed and inference on new images can be done. A useful property of such models is that, due to their architecture, they do not depend on the input image size (the situation is similar to the well-known Gaussian or Sobel filters that can be applied to images of arbitrary shape).
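The two properties mentioned above (independence of the input size and locality) can be illustrated with a single fixed kernel. The following is a minimal sketch, not the CNN model itself: it applies a Sobel kernel to a full image and to a cropped patch and checks that the patch result coincides with the matching window of the full result. The helper name `conv2d_valid` and the array sizes are chosen here for illustration only.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: no padding, output shrinks by (kernel size - 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    # Accumulate shifted copies of the image weighted by the kernel entries.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + out_h, j:j + out_w]
    return out


# Horizontal Sobel kernel, a fixed (non-trainable) local filter.
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
full = rng.random((180, 360))     # any shape works, e.g. a map-like array
patch = full[40:100, 120:200]     # a crop with no knowledge of its location

full_out = conv2d_valid(full, SOBEL_X)
patch_out = conv2d_valid(patch, SOBEL_X)

# The same kernel handles both shapes, and away from the borders the patch
# result is identical to the corresponding window of the full-image result.
same = np.allclose(patch_out, full_out[40:98, 120:198])
```

A deep CNN stacks many such operations, so the same argument holds as long as the patch is larger than the receptive field of the network.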

In our research, we apply a CNN trained on segmentation of CHs in solar disk images to solar synoptic maps. We present an algorithm of solar synoptic map construction and demonstrate that the CNN model provides an accurate segmentation output. As a special case, we consider synoptic maps projected onto the northern and southern solar hemispheres (pole-centric projections) and demonstrate that the output of the CNN model is also in agreement with the original synoptic map. The obtained statistics are analyzed with respect to solar activity variations.

2. Data

We analyze a data set of the Solar Dynamics Observatory (SDO) Atmospheric Imaging Assembly (AIA) 193 Å solar disk images with a cadence of one image per day (Lemen et al. 2012). The start date is 2010 June 16, and the end date is 2020 March 1. This period covers 130 full solar rotation periods starting from Carrington rotation (CR) number 2098 to 2227 inclusively. The data set was obtained from the SunInTime⁷ website in JPEG quality and 1K resolution. There are two reasons for this choice. First, this is the same data set that was used by Illarionov & Tlatov (2018) for the CNN model training. In the context of neural network models, the data set uniformity is essential. Second, this data set is already calibrated with respect to any known instrument issues by the instrument team (Lemen et al. 2012). This allows a direct assessment of the input data quality and prevents possible misinterpretation in data preprocessing steps. Based on this data we construct solar synoptic maps as described in the next section.

In the data analysis section, we also use Carrington rotation synoptic charts of the radial magnetic field component⁸ from the Helioseismic and Magnetic Imager (HMI; Scherrer et al. 2012).

Additionally, we use a catalog of filaments⁹ provided by the Kislovodsk Mountain Astronomical Station¹⁰ to make a comparison with CHs identified by the CNN model.

3. Construction of Synoptic Maps

A standard method of synoptic map construction consists of two steps. First, we project the solar disk images onto the Carrington coordinate system. Second, we select latitudinal strips centered at the central meridian and concatenate them within a single solar rotation period. Other catalogs of the SDO/AIA synoptic maps were prepared similarly (e.g., Karna et al. 2014; Caplan et al. 2016; Hamada et al. 2020).

For the construction of the synoptic maps, we use the data set of solar disk images described in Section 2. The disk images have a resolution of 1024 × 1024 pixels; the synoptic maps are calculated with a resolution of 720 × 360 (however, this is a free parameter). First, we map each disk image into the Carrington coordinate system. A technical problem here is how to map pixels of disk images onto synoptic maps. On the one hand, for each pixel in a disk image, one can find a corresponding pixel in the synoptic map using basic trigonometric formulas (forward mapping). The advantage is that we use information from all pixels that cover the solar disk; the disadvantage is that the corresponding pixels of the synoptic map are sparse. The higher the resolution of the synoptic map, the greater its sparsity. On the other hand, one can construct a reverse mapping, in which for each pixel of the synoptic map one finds a corresponding pixel in the disk image. The advantage here is that the pixels of the synoptic map are dense; however, some pixels of the disk image will be ignored and will not contribute to the synoptic map. In this case, the higher the resolution of the disk image, the greater the number of pixels ignored in this image. Since we want to keep the resolution of the synoptic maps as a free parameter, we suggest using mappings of both types and averaging the pixel values that correspond to the same pixel of a synoptic map.
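The "basic trigonometric formulas" for both mapping directions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from the authors' repository: it assumes an orthographic projection (observer at infinity), images with solar north up and west to the right (zero P angle), and a known disk center `(xc, yc)` and radius in pixels. The function names and arguments are hypothetical.

```python
import numpy as np


def carrington_to_disk(lon_deg, lat_deg, l0_deg, b0_deg, xc, yc, radius):
    """Reverse mapping: Carrington (lon, lat) -> disk pixel (col, row).

    l0_deg and b0_deg are the Carrington longitude and heliographic latitude
    of the disk center. Returns also a flag for points on the visible side.
    """
    lon = np.radians(np.asarray(lon_deg, dtype=float) - l0_deg)
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    b0 = np.radians(b0_deg)
    x = np.cos(lat) * np.sin(lon)                                    # towards west
    y = np.sin(lat) * np.cos(b0) - np.cos(lat) * np.cos(lon) * np.sin(b0)
    z = np.sin(lat) * np.sin(b0) + np.cos(lat) * np.cos(lon) * np.cos(b0)
    col = xc + radius * x
    row = yc - radius * y          # image rows increase downward
    return col, row, z > 0


def disk_to_carrington(col, row, l0_deg, b0_deg, xc, yc, radius):
    """Forward mapping: disk pixel (col, row) -> Carrington (lon, lat)."""
    x = (np.asarray(col, dtype=float) - xc) / radius
    y = (yc - np.asarray(row, dtype=float)) / radius
    r2 = x**2 + y**2
    on_disk = r2 < 1.0
    z = np.sqrt(np.clip(1.0 - r2, 0.0, None))
    b0 = np.radians(b0_deg)
    sin_lat = np.clip(y * np.cos(b0) + z * np.sin(b0), -1.0, 1.0)
    lat = np.degrees(np.arcsin(sin_lat))
    dlon = np.degrees(np.arctan2(x, z * np.cos(b0) - y * np.sin(b0)))
    lon = (l0_deg + dlon) % 360.0
    return lon, lat, on_disk
```

With these two functions, the forward mapping scatters disk pixel values into synoptic map bins, while the reverse mapping samples the disk image at the location of every synoptic map pixel; the values that fall into the same map pixel are then averaged, as described in the text.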

The next step is to select a strip around the central meridian of the projected disk image. It is convenient to consider this step as a part of an averaging procedure, in which we take into account the distance between the pixel longitudes and the central meridian longitude in the contributing disk image. The greater the distance, the smaller the pixel weighting factor. The proposed weighting function is defined as sigmoid((−d + a)/b), where sigmoid(x) = 1/(1 + exp(−x)) is a standard sigmoid function, d is the distance in degrees, and a and b are the shift and scale parameters that control the desired blending. Indeed, by varying these parameters we obtain wider or narrower rectangular domains and can adjust the softness of their borders. As a particular choice in this work, we use the weighting function sigmoid((−d + 13.2)/2). It approximately specifies that each pixel in the synoptic map is mostly a result of the blending of the two nearest disk images. In Section 5 we will demonstrate that the CH detection is stable with respect to the choice of these parameters. This particular choice was motivated mostly by the visual appearance of the produced synoptic maps. Larger values of the shift parameter result in losing fine structures in the synoptic maps, while smaller values make the transition zones between the successive disk images visible.
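The weighting scheme can be sketched in a few lines. The sketch below assumes that each projected disk image is stored as an array on the synoptic map grid with NaN in unobserved pixels; the function names `strip_weight` and `blend` and this storage layout are assumptions made for illustration. Note that 13.2° is close to the Carrington longitude swept in one day (360°/27.2753 days), which is consistent with the daily cadence of the data set.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def strip_weight(d_deg, shift=13.2, scale=2.0):
    """Weight sigmoid((-d + shift) / scale) for a longitude distance d (degrees)."""
    return sigmoid((-np.asarray(d_deg, dtype=float) + shift) / scale)


def blend(projections, cm_longitudes, map_longitudes, shift=13.2, scale=2.0):
    """Weighted average of projected disk images.

    projections    : (n_images, n_lat, n_lon) array, NaN where unobserved
    cm_longitudes  : (n_images,) central meridian longitude of each image, degrees
    map_longitudes : (n_lon,) longitude of each synoptic map column, degrees
    """
    # Angular distance on the circle between map columns and central meridians.
    diff = map_longitudes[None, :] - cm_longitudes[:, None]
    d = np.abs((diff + 180.0) % 360.0 - 180.0)
    w = strip_weight(d, shift, scale)[:, None, :]      # (n_images, 1, n_lon)
    valid = ~np.isnan(projections)
    w = np.where(valid, w, 0.0)                        # ignore unobserved pixels
    num = np.sum(w * np.nan_to_num(projections), axis=0)
    den = np.sum(w, axis=0)
    return np.divide(num, den, out=np.full(num.shape, np.nan), where=den > 0)
```

For the default parameters the weight is about 0.999 at the central meridian, exactly 0.5 at d = 13.2°, and about 0.001 at d = 26.4°.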

The final step is a histogram matching that corrects the brightness and contrast variations in the disk area due to the limb-brightening effect. Because of this effect, a synoptic map constructed from the central meridian strips appears darker than the original disk images. As a result, the pixel intensity distribution is biased. A variety of physics- and data-driven models have been proposed for correction of this effect (see, e.g., Caplan et al. 2016). We apply the most straightforward approach of direct histogram matching. First, we construct a cumulative distribution function (CDF) $F_1$ for pixel intensities from all contributing projected disk images. Then we construct a CDF $F_2$ for the synoptic map. We note that if we replace each pixel intensity level $p$ of a synoptic map with $F_1^{-1}(F_2(p))$, then we obtain a new distribution with CDF equal to $F_1$ (see, e.g., Gonzalez & Woods 2006, for implementation details). Figure 1 shows the pixel intensity distributions for a sample of synoptic maps before the histogram matching and the distribution of disk projections.
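A minimal sketch of the histogram matching step with empirical CDFs is given below. It is one possible implementation of the mapping p → F₁⁻¹(F₂(p)), not necessarily the one used by the authors; the function name is hypothetical, and both inputs are assumed to be free of NaN values (unobserved pixels should be removed beforehand).

```python
import numpy as np


def match_histogram(synoptic, reference):
    """Replace each intensity p of `synoptic` with F1^{-1}(F2(p)).

    F2 is the empirical CDF of the synoptic map and F1 is the empirical CDF
    of the reference pixels (e.g., all contributing projected disk images).
    """
    src = synoptic.ravel()
    ref = np.sort(reference.ravel())

    # F2 evaluated at every distinct intensity level of the synoptic map.
    _, src_idx, src_counts = np.unique(src, return_inverse=True, return_counts=True)
    f2 = np.cumsum(src_counts) / src.size

    # F1 evaluated at the sorted reference values; inverting it amounts to
    # interpolating the sorted values as a function of the CDF level.
    f1 = np.arange(1, ref.size + 1) / ref.size
    matched_levels = np.interp(f2, f1, ref)

    return matched_levels[src_idx].reshape(synoptic.shape)
```

The transformation is monotonic, so the ordering of pixel intensities in the synoptic map is preserved while its distribution is adjusted to the reference.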

**Figure 1.** Pixel intensity distribution of synoptic maps before histogram matching in comparison to the distribution of contributing disk projections. The histogram matching procedure adjusts the synoptic map to make it similar to disk projections. Carrington rotations: (a) CR 2098, (b) CR 2145, and (c) CR 2219 are shown.

In Figure 2 we demonstrate examples of the constructed synoptic maps for the solar activity maximum and minimum. Specifically, for the demonstration in this and in the following figures, we choose three Carrington rotations: CR 2098 (during the solar minimum between Cycles 23 and 24), CR 2145 (during the Cycle 24 maximum), and CR 2219 (during the minimum between Cycles 24 and 25). Note that the synoptic maps during solar minimum tend to be darker. This is even clearer if we average the synoptic maps over longitude and concatenate them in chronological order (Figure 3). Apart from the long-term intensity variations associated with the solar cycle, we also find annual variations associated with the solar B0 angle, best seen along a fixed latitude. In our opinion, the latter effect can be related to the emitting plasma that reduces the visibility of polar CHs observed close to the limb. This effect has been discussed in Kirk et al. (2009). The nature of the cyclic variations is a matter for a separate investigation. Another point that should be mentioned is the instrument degradation issue. In our research, we employ images provided by the SDO team, which are corrected for the degradation effects, and do not apply additional processing. However, the calibration process is not unique, and one could consider alternative data sets, e.g., the one prepared by Galvez et al. (2019).

**Figure 3.** Concatenation of synoptic maps averaged over longitudes. Green vertical lines mark timestamps corresponding to CR 2098, CR 2145, and CR 2219 shown in Figure 2.

To conclude this section we would like to mention that the source code for synoptic maps construction is open-sourced in the GitHub repository https://github.com/observethesun/synoptic_maps, while the synoptic maps produced for each Carrington rotation are available in a catalog https://sun.njit.edu/coronal_holes/.

4. Segmentation Model

We start with a brief description of the neural network model proposed by Illarionov & Tlatov (2018) and discuss how to apply it to the synoptic maps or, generally speaking, to input images of arbitrary shape.

The model is a typical U-Net convolutional model (Ronneberger et al. 2015). Figure 4 schematically shows the model architecture. It consists of two branches. The first branch compresses an input image via a set of convolutional and downsampling operations into a tensor with reduced spatial dimensions but an increased channel dimension. Each downsampling operation reduces the spatial dimensions by a factor of 2, while each convolutional operation increases the number of channels by the same factor of 2. The number of the channels after the first convolutional operation (denoted K in Figure 4) is a parameter of the model. The model we use has K = 24. In total, the compression branch consists of four convolutional-downsampling steps. For example, for an input image of (256, 256) pixels and K = 24 the compression branch will result in a (16, 16, 384) tensor.
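The shape arithmetic of the compression branch can be checked with a short helper. This is a sketch based only on the description above (four steps, each halving the spatial dimensions and doubling the number of channels, starting from K channels); the function name is hypothetical and the exact ordering of layers in the real model is not reproduced.

```python
def compression_output_shape(height, width, k=24, n_steps=4):
    """Shape (H, W, C) of the tensor at the end of the compression branch.

    Each of the n_steps convolutional-downsampling steps halves both spatial
    dimensions and doubles the number of channels, starting from k channels.
    """
    factor = 2 ** n_steps
    if height % factor or width % factor:
        raise ValueError(f"input dimensions must be divisible by {factor}")
    return height // factor, width // factor, k * factor


# Example from the text: a (256, 256) input with K = 24.
disk_shape = compression_output_shape(256, 256)        # (16, 16, 384)

# A non-square input is handled in the same way.
padded_map_shape = compression_output_shape(256, 512)  # (16, 32, 384)
```

The divisibility check reflects a practical constraint of this kind of architecture: input dimensions that are multiples of 16 pass through four downsampling steps without rounding.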

The second branch of the model is a decompression branch. It consists of a set of convolutional-upsampling operations that, simply speaking, act as an operation inverse to the compression branch. The output image tensor will have the same dimensions as the input image. Because the localization information in the compression branch becomes more and more limited with each downsampling, the U-Net architecture includes skip-connections between the corresponding tensors in the compression and decompression branches. This operation stacks a copy of the tensors in the compression branch onto the tensors in the decompression branch. Thus, layers in the decompression branch obtain information from earlier layers in which the localization information is still present. Additional technical details of the implemented model can be found in the original paper (Illarionov & Tlatov 2018). The source code for the model application to synoptic maps is available in the repository https://github.com/observethesun/synoptic_maps.

An important feature of the proposed model architecture is that it is independent of the input image shape. This means that the model can be trained on patches extracted from original images, and then used for the analysis of full-size images. In this work, we apply the model trained on a set of disk images to the synoptic maps and pole-centric projections constructed from these maps.

For the model training, Illarionov & Tlatov (2018) used the binary masks of CHs obtained at the Kislovodsk Mountain Astronomical Station. These binary masks separate CH and non-CH regions, including filaments. The binary masks along with other products are contained in daily reports of the station. An archive of solar activity maps, including CH boundaries, is available at https://observethesun.com. Thus, the model training represents a semiautomated and manually controlled process of the CH identification applied at the station.

We use the same convolutional kernels and other trainable parameters that were obtained by Illarionov & Tlatov (2018). This means that the presented results can be directly correlated with the previous work.

There are some technical issues that we would like to mention. First, the synoptic maps presented in Section 3 have a spatial resolution of 720 × 360 pixels. The model was trained on the 256 × 256 pixel disk images. Thus, it makes sense to downscale the synoptic maps to better match the pixel sizes. Second, it is recommended to apply a maximal intensity padding to the synoptic maps to avoid some artifacts near the boundaries. The point is that due to the convolutional nature of the model, each pixel of the next layer is connected only with a local group of pixels in the previous layer and thus has a bounded receptive field. It follows that the neurons in the deepest layer classify pixels based on their local surroundings. Pixels near image boundaries have fewer pixels around them in contrast to, e.g., pixels in the image center. In practice, we can see border artifacts in the segmentation output. Image padding is a common way to overcome this problem. It can be shown that neurons at the end of the compression branch have a receptive field of 140 × 140 pixels in the input image. Thus, additional padding of about 70 pixels around the synoptic map will provide a full receptive field for pixels near synoptic map boundaries. Note that this action is not required for the solar disk images since the space around the solar disk acts as natural padding. We have tested various approaches, e.g., constant, mean, and reflection padding, and find that the most straightforward maximal intensity constant padding works well. In more detail, we downsample the synoptic maps to 360 × 180 pixels and apply the spatial padding to obtain the target size of 512 × 256 pixels. The CNN model applied to the 512 × 256 input images produces segmentation masks of the same size, from which we extract the 360 × 180 region that contains the desired segmentation map for the synoptic map and is the final output.
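The downsample, pad, segment, and crop sequence can be sketched as follows. Several details are assumptions made for illustration: the downsampling is done by 2 × 2 block averaging, the padding value is the maximum intensity of the map itself, the padding is split symmetrically, and `model` is a placeholder for any callable that returns a per-pixel score array of the same shape as its input. The 0.5 threshold is the one used in this paper to binarize the scores.

```python
import numpy as np


def downsample2(img):
    """Downsample by a factor of 2 via 2 x 2 block averaging (an assumption)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def pad_to(img, target_shape, value):
    """Symmetric constant padding up to target_shape; returns the crop offsets too."""
    ph = target_shape[0] - img.shape[0]
    pw = target_shape[1] - img.shape[1]
    top, left = ph // 2, pw // 2
    padded = np.pad(img, ((top, ph - top), (left, pw - left)),
                    mode="constant", constant_values=value)
    return padded, (top, left)


def segment_synoptic(synoptic, model, target_shape=(256, 512), threshold=0.5):
    """Apply a segmentation model to a synoptic map stored as (n_lat, n_lon)."""
    small = downsample2(synoptic)                        # (360, 720) -> (180, 360)
    padded, (top, left) = pad_to(small, target_shape, small.max())
    scores = model(padded)                               # same shape as the input
    crop = scores[top:top + small.shape[0], left:left + small.shape[1]]
    return crop > threshold                              # binary CH mask
```

Padding with a bright value is a natural choice here because CHs are dark structures, so the padded border is unlikely to be classified as a CH or to extend a real one.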

Figure 5 shows a sample segmentation map obtained using the CNN model. The model outputs a score for each pixel to be a part of a CH. The score ranges from 0 to 1. We apply a threshold of 0.5 to convert the heatmaps into binary masks. For example, Figure 6 shows that the identified CH boundaries correspond to visual expectation and accurately outline the CH regions. Moreover, we do not find misclassification examples with respect to the catalog of filaments provided by the Kislovodsk Mountain Astronomical Station and shown in blue in the same plot. In the next section, we provide a detailed analysis.

**Figure 6.** Overlaid synoptic maps and reconstructed CH boundaries (green lines) for (a) CR 2098, (b) CR 2145, and (c) CR 2219. These are the same CRs as in Figure 2. For comparison, filaments from the catalog of the Kislovodsk Mountain Astronomical Station are shown in blue color.

To demonstrate an additional application of the CNN model, we apply it to the pole-centric projections of the synoptic maps. The model inference in this case is the same as for the solar disk images. Figure 7 shows sample segmentation maps obtained for the polar projection inputs. For comparison, we show in the same figure the pole-centric projections of CHs obtained in synoptic maps. We note that both methods are in good agreement, as should be expected.

**Figure 7.** CH boundaries (green lines) identified in the pole-centric input images (color background), in comparison with the pole-centric projections of the CH boundaries deduced from the synoptic maps (blue lines). Columns correspond to the same CRs as in Figure 2. Top and bottom rows show the north and south pole projections.

In the Appendix, we discuss a possible interpretation of the segmentation procedure within the CNN model from a physical point of view.

5. Analysis

In this section, we demonstrate that the CH detection method is stable against parameters of the construction of the synoptic maps, and investigate general physical properties of CHs.

The most essential parameter in the synoptic map construction is the strip width (in our notation it is represented by the shift and scale parameters). Indeed, the wider strips result in smoother maps without finer details, while narrower strips preserve details but provide noisier maps. Another point is that due to the limb-brightening effect the strip width also affects the pixel intensity distribution. To avoid this effect we apply the histogram matching procedure as described in Section 3.

For the uncertainty estimation we consider all combinations of values of the shift parameter: {6.6°, 13.2°, 19.8°, 26.4°, 33.0°, 39.6°} and the scale parameter: {0.5, 1, 2, 4}. Note that the extreme cases correspond approximately to the narrowest possible strip (about ±6.6° around the central meridian with a thin blending zone), and a case where each pixel of the synoptic map results from averaging of six of the nearest disk images. In Figure 8 we show intervals between the smallest and largest total areas obtained for all parameter combinations. Note that the uncertainties are rather negligible. This important point allows us to conclude that the CH regions detected in the synoptic maps do not depend on a particular map compilation, but represent stable and physical structures.
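The uncertainty interval described above is the range of total CH areas over the 6 × 4 = 24 parameter combinations. A minimal sketch is given below; `total_area` is a hypothetical callable that builds a synoptic map for the given shift and scale, segments it, and returns the total CH area.

```python
from itertools import product

# Parameter values listed in the text (shift in degrees, scale dimensionless).
SHIFTS_DEG = (6.6, 13.2, 19.8, 26.4, 33.0, 39.6)
SCALES = (0.5, 1.0, 2.0, 4.0)


def area_interval(total_area, shifts=SHIFTS_DEG, scales=SCALES):
    """Smallest and largest total CH area over all (shift, scale) combinations."""
    areas = [total_area(shift, scale) for shift, scale in product(shifts, scales)]
    return min(areas), max(areas)
```

Applying this function to every Carrington rotation gives one interval per rotation, which is the kind of band that can be drawn around the curves obtained for the default parameters.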

Figure 8 shows the CH areas as a function of time separately for the northern and southern hemispheres as well as for the polar ( $| \theta | \gt 50^\circ$ ) and low-latitude ( $| \theta | \leqslant 50^\circ$ ) zones. Our choice of separating boundary θ = ±50° is consistent with the work of Hess Webber et al. (2014). We take into account the contribution of individual pixels into each of these groups rather than attributing a whole CH based on the location of its center. Thus, pixels from the same CH may contribute to different groups. We make several observations from the figure. First, large annual variations seen in the middle panel have a clear connection to the variations of the solar B0 angle shown in the upper panel (due to its variations, the north and south poles of the Sun are alternately hidden from the observations). Peaks of both lines in the middle panel correspond to the maximal absolute values of B0 when the north or south poles are best seen. Second, there is an asymmetry between the north and south. We observe the hemispheric asymmetry both in time (the area of the southern polar CHs decreases later and starts to increase earlier than the area of the northern CHs) and in amplitude (the southern polar CHs demonstrate an increasing trend during the solar minimum between Cycles 24 and 25, while the northern CHs do not show this trend). Hess Webber et al. (2014) also demonstrated asymmetries in the polar CHs during the solar minimum between Cycles 23 and 24. Third, from the bottom panel, we find that the solar minimum manifests itself in an increase of both the polar and low-latitude areas of CHs. Moreover, while the areas of the polar CHs continue to increase, the low-latitude CH areas fluctuate near a constant value. This may be consistent with ideas of the solar flux-transport theory that magnetic fields migrate from low latitudes to the poles and accumulate there during solar minimums (see Babcock 1961; Leighton 1969).
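The pixel-wise attribution of CH area to latitudinal zones can be sketched as follows. The sketch assumes a synoptic map with rows equally spaced in latitude from north (top) to south (bottom), so that the area of a pixel is proportional to the cosine of its latitude; if the maps use a sine-latitude grid instead, all pixels have equal area and the weighting should be dropped. The function name and the returned dictionary keys are hypothetical.

```python
import numpy as np


def zone_areas(mask, boundary_deg=50.0):
    """CH area per zone, as a fraction of the full solar surface.

    mask : (n_lat, n_lon) binary CH mask on an equal-angle latitude-longitude grid.
    """
    n_lat, n_lon = mask.shape
    # Latitudes of pixel centers, from +90 (top row) to -90 (bottom row).
    lat = 90.0 - (np.arange(n_lat) + 0.5) * 180.0 / n_lat
    # Pixel area on the unit sphere, cos(lat) * dlat * dlon, divided by 4*pi.
    dlat = np.pi / n_lat
    dlon = 2.0 * np.pi / n_lon
    pixel_area = np.cos(np.radians(lat)) * dlat * dlon / (4.0 * np.pi)
    per_row = mask.sum(axis=1) * pixel_area
    return {
        "north_polar": per_row[lat > boundary_deg].sum(),
        "low_latitude": per_row[np.abs(lat) <= boundary_deg].sum(),
        "south_polar": per_row[lat < -boundary_deg].sum(),
    }
```

Because each pixel is counted according to its own latitude, a single CH that crosses the ±50° boundary contributes to both the polar and the low-latitude totals.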

As we noted before, synoptic maps are not directly observable data, in contrast to solar disk images. Thus it is interesting to compare CHs identified in disk images and synoptic maps. To make this comparison feasible, we apply the CNN model to solar disk images as described in Illarionov & Tlatov (2018) and stack the obtained binary masks of CHs into synoptic maps. To construct the synoptic maps from binary CH masks we apply the same procedure as for solar disk images, excluding the histogram matching step. Figure 9 shows the total area of CHs identified in solar disk images and stacked into a synoptic map in comparison to CHs identified directly in synoptic maps. It should be noted that the production of synoptic maps from binary masks is much more sensitive to the shift and scale parameters than the production of synoptic maps from disk images. The point is that the narrower the strips are, the noisier the resulting map is. We set the shift parameter to 39.6° and the scale to 4 to ensure that only stable structures identified in disk images contribute to synoptic maps. We find that this choice of parameters provides the best correlation with CHs identified directly in synoptic maps. Thus we conclude from Figure 9 that CH identification in solar disk images and synoptic maps is in agreement.

**Figure 9.** Total area of CHs identified in synoptic maps (red line) in comparison to CHs stacked into synoptic maps from CHs identified in solar disk images (blue line).

Now we consider synoptic maps of CHs with respect to magnetic synoptic maps and construct time-latitude and time-longitude diagrams. We start with the time-latitude diagram that shows the ratio of the total unsigned magnetic flux in CHs to the total unsigned magnetic flux integrated over all longitudes (Figure 10). We conclude from this plot that while the solar minimum is accompanied by an increase of the low-latitude CH areas (see Figure 8, lower panel), their contribution to the total unsigned flux is not dominant. In contrast, polar CHs account for almost the whole unsigned magnetic flux. Note that for the construction of this plot we thresholded the unsigned magnetic synoptic maps at 10 Gauss to avoid a noise contribution.
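For one Carrington rotation, the ratio shown in the time-latitude diagram can be sketched as below. The sketch assumes that the magnetic synoptic map and the CH mask are on the same grid and interprets the 10 G threshold as setting weaker unsigned values to zero; both points, as well as the function name, are assumptions. Pixels within one latitude row have equal areas, so a ratio of sums of field values equals a ratio of fluxes.

```python
import numpy as np


def ch_flux_ratio(b_radial, ch_mask, noise_gauss=10.0):
    """Ratio of unsigned flux inside CHs to total unsigned flux, per latitude row.

    b_radial : (n_lat, n_lon) radial magnetic field synoptic map, Gauss
    ch_mask  : (n_lat, n_lon) binary CH mask on the same grid
    """
    unsigned = np.abs(b_radial)
    # Suppress values below the noise threshold.
    unsigned = np.where(unsigned >= noise_gauss, unsigned, 0.0)
    total = unsigned.sum(axis=1)
    in_ch = (unsigned * ch_mask).sum(axis=1)
    # Rows with no flux above the threshold are left undefined (NaN).
    return np.divide(in_ch, total, out=np.full(total.shape, np.nan), where=total > 0)
```

Stacking the resulting columns for successive Carrington rotations gives a time-latitude array of the kind displayed in Figure 10.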

**Figure 10.** Ratio of total unsigned magnetic flux in CHs to the total unsigned magnetic flux integrated over all longitudes.

For a more detailed investigation, we take into account the sign of the magnetic field. In Figure 11, the grayscale background is the magnetic field averaged over all longitudes, while blue and red colors show the magnetic field averaged over longitudes only in CH regions. Note that when averaging the CH magnetic field, we filter out latitudes where CHs cover less than 20° of longitude in total to prevent plotting of statistically insignificant values. We find from this plot that polar latitudes have a prevalent sign of the magnetic field that is opposite in the north and south and changes between solar cycles. Also, in agreement with Figure 10, we find that CHs at lower latitudes in the minimum between Cycles 24 and 25 have significantly weaker magnetic fields than polar CHs.
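The longitude averaging with the 20° coverage filter can be sketched as follows for a single Carrington rotation. The function name is hypothetical, and the magnetic map and the CH mask are assumed to share the same latitude-longitude grid covering 360° of longitude.

```python
import numpy as np


def ch_mean_field(b_radial, ch_mask, min_coverage_deg=20.0):
    """Signed magnetic field averaged over CH pixels only, per latitude row.

    Rows where CHs cover less than min_coverage_deg of longitude in total
    are returned as NaN.
    """
    ch_mask = ch_mask.astype(bool)
    n_lon = b_radial.shape[1]
    count = ch_mask.sum(axis=1)
    coverage_deg = count * 360.0 / n_lon
    total = np.where(ch_mask, b_radial, 0.0).sum(axis=1)
    mean = np.divide(total, count, out=np.full(total.shape, np.nan), where=count > 0)
    mean[coverage_deg < min_coverage_deg] = np.nan
    return mean
```

The grayscale background of such a diagram corresponds to a plain `b_radial.mean(axis=1)` over all longitudes.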

A detailed investigation of the results presented in Figure 11 can give insights into the origin of the CH open magnetic flux and its relation to the flux-transport mechanism. For example, Golubeva & Mordvinov (2017) associated CHs with decaying complexes of magnetic activity, while the studies of Tlatov et al. (2014) and Huang et al. (2017) revealed pole-to-pole open flux migration. Hamada et al. (2018) presented a similar plot showing dominant polarity and relative areas of CHs for Cycles 23 and 24. To facilitate such studies, we have constructed the CH catalog and made it publicly available.

Finally, we demonstrate time-longitude diagrams of the CH magnetic fields. Panels in Figure 12 correspond to three regions located at northern polar latitudes, low latitudes, and southern polar latitudes. The separating boundaries are θ = ±50° as in Figure 8. We observe that CH patterns are substantially different in the high- and low-latitude regions. At high latitudes, we find large-scale structures that exist for about a year. This indicates that CHs form stable sector structures in the magnetic field distribution. In the low-latitude region, we find a mixture of two populations. Before 2015 (during the solar maximum) one can observe small-scale structures that exist for several months. After 2015 (during the solar minimum) we find characteristic strip structures that can be traced for several years. A final remark on Figure 12 concerns the inclination of the structures across all three panels. The elongation from the bottom right to the top left (which we see in the high-latitude zones) means that the region rotates slower than the Carrington coordinate system. In contrast, the opposite elongation at the low latitudes means faster rotation. This is consistent with the general picture of the differential rotation of the Sun. However, a detailed analysis and rotation speed estimation are beyond the scope of this paper.

**Figure 12.** Time-longitude diagrams of CH magnetic fields (shown in blue and red colors) in three latitudinal zones. Panel (a) is for high latitudes in the northern hemisphere (θ > 50°), panel (b) is for lower latitudes ( $| \theta | \leqslant 50^\circ$ ), and panel (c) is for high latitudes in the southern hemisphere (θ < −50°). The grayscale background shows the magnetic field averaged over latitudes for each latitudinal zone. Note that neutral color in the red–blue color bar is not white but transparent so that weak CH magnetic fields are not visible in the plot.

6. Conclusions

We have demonstrated that a convolutional neural network (CNN) model trained to identify CHs in solar disk images is capable of detecting CHs in solar synoptic maps without any additional adjustments. Being composed of only convolutional operations, the CNN processes images of any shape in the same way. This also implies that the local image content dominates over the global content (i.e., the segmentation result will be the same for portions of the image and the whole image). For these reasons, one can expect that it makes no difference to the CNN whether it sees the whole disk image, a partial disk image, or a synoptic map (we suppose that human interpretation acts similarly).
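The locality argument can be illustrated with a single convolutional layer. The sketch below is a toy example rather than the segmentation model itself: the helper names `conv2d_valid` and `crop`, the 3×3 kernel, and the test image are hypothetical. It checks that filtering a cropped patch gives the same values as cropping the filtered full image. For a deep CNN the same holds only for pixels whose receptive field lies entirely inside the patch, so the statement in the text should be read as approximate near image borders.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation (as in deep-learning 'convolution'), no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def crop(image, top, left, height, width):
    """Return a rectangular sub-image as a new list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]

# Toy 6 x 8 "image" and a 3 x 3 edge-like kernel (both arbitrary).
image = [[(3 * i + 5 * j) % 7 for j in range(8)] for i in range(6)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

full_output = conv2d_valid(image, kernel)                    # 4 x 6 output
patch_output = conv2d_valid(crop(image, 1, 2, 4, 5), kernel)  # 2 x 3 output

# The patch result coincides with the matching part of the full result.
assert patch_output == crop(full_output, 1, 2, 2, 3)
```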

To illustrate this idea, we constructed a data set of synoptic maps from daily solar disk images used for model training. The process of synoptic map construction is not unique and contains free parameters. We have shown that the segmentation procedure is stable for a wide range of parameter values (Figure 8).

It is not trivial to compare properties of CHs identified in the disk images and synoptic maps, because it requires a construction of the binary synoptic maps from binary segmentation masks of the disk images. However, there is a more feasible option. One can build pole-centric projections of the synoptic maps, make a segmentation using the CNN model, and compare the output with the pole-centric projections of the binary synoptic maps. For a proper segmentation model, the results should be in agreement. Indeed, in Figure 7 we find that the CH boundaries obtained by both methods are very close. Thus, we conclude that the CNN model recognizes CHs regardless of the way we project them. In other words, it learns what a CH is itself rather than how a CH looks in the solar disk context.

For our initial investigation of the physical properties of the CHs, we separated them into polar and low-latitudinal, and also into northern and southern. In Figure 8 we find that the CH areas are minimal during the solar maximum and start to increase during the declining phase of the solar cycle. There are visible asymmetries between the north and south both in the temporal behavior and in the magnitude of CH areas.

Finally, in Figures 10–12 we demonstrated magnetic field patterns associated with CHs in the time-latitude and time-longitude domains. In Figure 11, we compare the CH patterns with the longitudinally averaged magnetic synoptic maps (the so-called magnetic "butterfly" diagram). The magnetic butterfly diagram reveals the transport of magnetic flux of decaying active regions from the low- and mid-latitudes to the polar regions. As we mentioned above, it was previously suggested that the CHs are formed at high latitudes from the magnetic field associated with the flux-transport events. Figure 11 shows that this association is not common. In some cases, e.g., in the southern hemisphere around 2015, 2016, and 2017 we can see the association of CHs of negative polarity with the negative flux transport. In particular, in the southern hemisphere the most prominent zone of CH formation, which was around 2015, partially overlaps with a major flux-transport event (Figure 11). In the northern hemisphere, the CH activity was a year later and lasted longer, in 2016–17. It is unclear whether the CH activity is associated with the flux-transport events. This zone was also compact in the Carrington longitude, located around 240–300 degrees (Figure 12b). A major complex of activity was in the zone, but a year earlier. On the other hand, CHs of the southern (negative) polarity were more scattered in longitude. Perhaps, as shown in a case study by Benevolenskaya (2012), CHs can also be associated with magnetic fields emerging at high latitudes from the subphotospheric layers. The relationship between the CHs and magnetic flux emergence and transport requires further detailed investigation.

Thus, our research demonstrates that a CNN is a powerful and flexible tool for the investigation of solar activity. In particular, it enables a unified approach to the identification and characterization of CHs in various geometrical representations of solar image data. To make this approach more readily available, we open-sourced the code for synoptic map construction and CH segmentation in the repository https://github.com/observethesun/synoptic_maps and opened free access to CH synoptic maps in the catalog https://sun.njit.edu/coronal_holes/ available in FITS and JPEG formats.

We thank the reviewer for valuable comments and suggestions.

The work was partially supported by RSF grant 20-72-00106, RFBR grant 18-02-00098-a, NSF grants 1639683, 1743321, 1927578, and NASA grants 80NSSC19K0630, 80NSSC20K0302, and 80NSSC20K0602.

Appendix:

Here we provide some insights about how the proposed CNN model works. We stress that this discussion is only an interpretation rather than an explanation. Nevertheless, it helps to reveal a physical basis for the produced segmentation maps.

A typical alternative to the CNN segmentation is a threshold-based segmentation. The most straightforward approach is to select some threshold level for pixel intensities and declare everything darker than this level to be a CH. It is interesting to investigate to what extent the CNN model is more advanced. In our experiments, we consider several synoptic maps (same as in Figure 2) and determine the threshold levels that result in the same number of pixels corresponding to CHs as in the segmentation masks from the CNN model. We stress that we only match pixel counts while finding the thresholds. In Figure 13 we demonstrate the input synoptic maps, the CH segmentation maps produced by the CNN model, and equivalent (in the sense of the CH pixel counts) segmentation maps produced by the thresholding. The histograms show pixel intensity distributions in the synoptic maps and the threshold levels. We make several observations from these plots. First, the CNN segmentation maps look less noisy compared to the threshold-based segmentation. This means that the CNN does not act as a simple thresholding procedure but includes some high-level processing. The second and more important observation is that the equivalent threshold in the CNN segmentation varies from image to image. This means that it depends on the image context. However, from a physical point of view, the most interesting note is that the equivalent threshold corresponds to the first minimum in the intensity distribution. Most of the CH segmentation algorithms proposed earlier rely on this idea more or less explicitly. In this respect, the CNN model arrives automatically at a strategy that is both reasonable and intuitive.
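The pixel-count matching used to find the equivalent threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the map is a list of rows, and CH pixels are assumed to be those with intensity at or below the threshold (CHs being dark in EUV).

```python
def equivalent_threshold(image, n_ch_pixels):
    """Return the intensity of the n-th darkest pixel.

    Thresholding at this level (intensity <= threshold) selects n_ch_pixels
    pixels, or slightly more if several pixels share the threshold value.
    """
    values = sorted(v for row in image for v in row)
    if not 0 < n_ch_pixels <= len(values):
        raise ValueError("n_ch_pixels must be between 1 and the image size")
    return values[n_ch_pixels - 1]

def threshold_mask(image, threshold):
    """Binary mask of pixels at or below the threshold (assumed CH side)."""
    return [[1 if v <= threshold else 0 for v in row] for row in image]

def count_pixels(mask):
    """Number of pixels labeled as CH in a binary mask."""
    return sum(sum(row) for row in mask)

# Toy example: a stand-in for a CNN mask with two CH pixels.
image = [[5, 1, 9], [3, 7, 2]]
cnn_mask = [[0, 1, 0], [0, 0, 1]]
threshold = equivalent_threshold(image, count_pixels(cnn_mask))
mask = threshold_mask(image, threshold)
```

Only the pixel count of the CNN mask enters the calculation, so the two masks are free to differ in shape, which is what the comparison in Figure 13 examines.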

Now we want to go a step deeper and consider some synthetic cases. We noted in Figure 13 that, while being equivalent in terms of the CH pixel counts to the thresholding procedure, the CNN segmentation masks are not as noisy as the threshold-based ones. To investigate this fact in more detail, we generate a set of synthetic synoptic maps as Gaussian random structures with the radial exponential correlation function $K(r)=\exp (-r/{r}_{0})$ . Here r is the distance between pixels in pixel units, and r₀ is a correlation radius. Varying r₀ we obtain a set of synthetic maps ranging from maps with almost uncorrelated noise for small r₀ to maps with large-scale correlated random structures for large r₀. To each map we apply the histogram matching procedure, which makes its intensity distribution similar to that of the solar synoptic map corresponding to CR 2219 (see the right column in Figure 13). Thus, for the threshold-based approach, each synthetic map contains the same number of pixels assigned to CHs (the threshold is also the same as for the synoptic map corresponding to CR 2219). Our goal is to compare this against the CNN model. Specifically, we vary r₀ from 0.01 to 20 and use a sample of 10 synthetic maps for each r₀. Figure 14 shows sample synthetic maps for various r₀ and the corresponding segmentation maps.
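One possible way to produce such synthetic maps is sketched below. The text does not specify the sampling method, so this is an assumption: the field is drawn by Cholesky factorization of the covariance matrix $K(r)=\exp (-r/{r}_{0})$ , which is practical only for small grids (a spectral method would be needed for full-size maps). Histogram matching is done by rank mapping and assumes that the reference map has the same number of pixels. All function names are hypothetical.

```python
import math
import random

def exponential_covariance(height, width, r0):
    """Covariance K(r) = exp(-r / r0) between all pixel pairs of a grid."""
    points = [(i, j) for i in range(height) for j in range(width)]
    return [[math.exp(-math.hypot(pi - qi, pj - qj) / r0)
             for (qi, qj) in points] for (pi, pj) in points]

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a positive-definite matrix a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower

def gaussian_field(height, width, r0, rng):
    """Sample a zero-mean Gaussian field with covariance exp(-r / r0)."""
    lower = cholesky(exponential_covariance(height, width, r0))
    noise = [rng.gauss(0.0, 1.0) for _ in range(height * width)]
    flat = [sum(l * z for l, z in zip(row, noise)) for row in lower]
    return [flat[i:i + width] for i in range(0, len(flat), width)]

def match_histogram(field, reference):
    """Replace each field value by the reference value of the same rank."""
    flat = [v for row in field for v in row]
    ref_sorted = sorted(v for row in reference for v in row)
    if len(flat) != len(ref_sorted):
        raise ValueError("field and reference must have equal pixel counts")
    order = sorted(range(len(flat)), key=flat.__getitem__)
    matched = [0.0] * len(flat)
    for rank, index in enumerate(order):
        matched[index] = ref_sorted[rank]
    width = len(field[0])
    return [matched[i:i + width] for i in range(0, len(matched), width)]

# Usage: a small synthetic map with correlation radius r0 = 2 pixels.
synthetic = gaussian_field(5, 6, 2.0, random.Random(0))
```

After `match_histogram`, the synthetic map has exactly the intensity distribution of the reference map, so any fixed threshold selects the same number of pixels regardless of r₀.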

We note in Figure 14 that both segmentation methods give mostly similar results for large-scale structures, but differ substantially for small-scale structures. This is also a reasonable feature of the CNN model trained for the CH segmentation. Indeed, CHs are typically large-scale structures, so a proper model should take into account the size factor. While for a typical CH segmentation method a region filtering procedure is an explicit part of the algorithm, for the CNN model this step works automatically. Figure 15 demonstrates the number of pixels labeled as CHs against the scale factor (or the correlation radius r₀ in our notation). Note that for the threshold-based segmentation the pixel count is constant because each synthetic synoptic map has the same intensity distribution.
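For comparison, the explicit region filtering step used by typical threshold-based methods can be sketched as the removal of small connected regions from a binary mask. The sketch below is a generic illustration and is not taken from any specific CH detection algorithm: the function name, the 4-connectivity, and the `min_size` parameter are assumptions.

```python
from collections import deque

def remove_small_regions(mask, min_size):
    """Zero out 4-connected regions of 1-pixels smaller than min_size."""
    height, width = len(mask), len(mask[0])
    cleaned = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            if not mask[i][j] or seen[i][j]:
                continue
            # Breadth-first search collects one connected region.
            region = []
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) < min_size:
                for y, x in region:
                    cleaned[y][x] = 0
    return cleaned
```

Applied to a threshold-based mask of a small-r₀ synthetic map, such a filter would remove most isolated pixels, which is qualitatively the behavior that the CNN model shows without an explicit filtering step.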

**Figure 15.** CH pixel counts for the CNN model (blue line) and the thresholding method (orange line). Horizontal axis shows the correlation radius r₀ used for synthetic synoptic map sampling. Gray color shows a min–max range within 10 samples.
