Detection of Solar Filaments Using Suncharts from Kodaikanal Solar Observatory Archive Employing a Clustering Approach

Aditya Priyadarshi; Manjunath Hegde; Bibhuti Kumar Jha; Subhamoy Chatterjee; Sudip Mandal; Mayukh Chowdhury; Dipankar Banerjee

doi:10.3847/1538-4357/acaefb

1. Introduction

Solar features such as sunspots, filaments, and plages play a crucial role in our understanding of solar magnetism and its associated variability. Soon after the invention of the telescope in the early 1600s, various observatories, as well as a few individual observers across the world, started documenting some of these solar features, primarily through drawings (Arlt 2008; Usoskin et al. 2009; Senthamizh Pavai et al. 2015; Arlt & Vaquero 2020; Carrasco et al. 2020). These drawings capture the past behavior of the Sun and, in conjunction with more recent observations recorded on photographic plates and CCD sensors, help us extend the observational baseline. The hand-drawn solar charts come from a variety of sources across the globe, such as the Royal Observatory of Belgium (ROB), the Mount Wilson Observatory (MWO), the Specola Solare Ticinese (SST) in Switzerland, and the National Astronomical Observatory of Japan, to name a few. Among these, ROB hosts the most comprehensive sunspot drawing series, spanning 1940 to 2011. Other archives, such as those from Meudon, the McIntosh archive, and the Kislovodsk solar station, provide full-disk or synoptic maps with multiple hand-drawn solar features for several solar cycles and are available in digital format. Table 1 lists these archives with their respective observation periods. Unique among all of these databases, the Kodaikanal Solar Observatory (KoSO) has been systematically observing the Sun since 1904 and, most importantly, in three different wavelengths: white light, Ca ii K (393.37 nm), and Hα (656.3 nm). These observations are recorded on photographic plates/films and, simultaneously, in suncharts. These suncharts are one-of-a-kind data, as they combine full-disk sketches of various solar features such as sunspots, plages, and filaments/prominences for each day of observation beginning in 1904.

Table 1. Solar Drawings Available from Different Observatories across the Globe

Observatory	Solar Features on the Drawing	Duration	Projection	Condition	Digitization Status
Gustav Spörer	Sunspot	1861–1894	⋯	⋯	1861–1894
MWO	Sunspot	1913–2017	Partial disk	Good	1913–2017
ROB	Sunspot	1940–2011	Full disk	Excellent	1940–2011
SST	Sunspot	1981–2017	Full disk	Fair	1981–2017
KoSO	Sunspot, filaments, plages, prominences	1904–2020	Full disk	Excellent	1954–1976
Meudon	Sunspot, filaments, polarity inversion lines (PILs), plages, prominences	1919–	Synoptic maps	Excellent	1919–
McIntosh archive	Sunspot, PILs, filaments, plages, prominences	1967–	Synoptic maps	Excellent	1967–
Kislovodsk solar station	Sunspot, PILs, filaments, plages, prominences	1979–2021	Full disk and synoptic maps	Excellent	1979–2021


In this paper, we present the detection of solar filaments from KoSO suncharts. Filaments are elongated dark structures seen against the bright solar disk, best observed in the Hα and He ii wavelengths. It is well established that the locations of filaments correlate well with regions of large-scale magnetic fields on the Sun (Tlatov et al. 2016). Furthermore, filaments are aligned with magnetic neutral lines and hence are ideal candidates for studying large-scale concentrations of weaker magnetic fields on the solar disk (McIntosh 1972; Low 1982; Makarov & Sivaraman 1983). Because full-disk magnetograms are available only after the 1970s, several attempts have been made to reconstruct the magnetic field for earlier cycles (before 1970), yielding so-called pseudomagnetograms, from indirect proxies of the magnetic field such as Ca ii K observations (Pevtsov et al. 2016; Mordvinov et al. 2020; Shin et al. 2020; Chatzistergos et al. 2019). Since Ca ii K intensity provides only the strength of the magnetic field, combining it with observations of solar filaments, which trace magnetic neutral lines, allows the polarities of the magnetic field to be identified while reconstructing the pseudomagnetogram (Mordvinov et al. 2020).

Previous attempts at detecting filaments fall into two broad categories: (i) image processing techniques applied in an automatic or semiautomatic way (Gao et al. 2002; Fuller et al. 2005; Yuan et al. 2011; Hao et al. 2013; Chatterjee et al. 2017; Tlatova et al. 2017; Mazumder et al. 2021) and (ii) machine-learning techniques (Zhu et al. 2019). Because solar images in most historical archives suffer from inherent artifacts (such as dust marks and scratches), feature detection via traditional image processing techniques tends to produce erroneous results. In this work, we implement the K-means clustering algorithm to automatically extract solar filaments from KoSO suncharts between 1954 and 1976 (23 yr). This paper also describes the novel features of KoSO suncharts and the process of digitizing them.

2. Data

In this work, we use KoSO suncharts covering the 23 yr between 1954 and 1976. Below we outline some of the novel features of these suncharts. Each sunchart (see Figure 1(a)) contains several solar features: sunspots (from white-light observations), filaments (from Hα observations), and plages and prominences (from Ca ii K observations). Moreover, every sunchart has a Stonyhurst latitude and longitude grid with 5° spacing marked on it. To draw these features, the observers used the KoSO photographic plate/film image(s) of that day as a reference. As seen in Figure 1(a), each solar feature is drawn in a specific color: sunspots in black, filaments in red, and plages and prominences in blue. Furthermore, the position angle of solar north (P-angle), the heliographic latitude of the center of the solar disk (B-angle), and the time of observation are written at the top of every chart. Lastly, if no KoSO observations were available on a particular day, the observers used data from the Meudon and Mt. Wilson observatories to populate that sunchart. In that case, the features were drawn in a different color (e.g., filaments in green instead of their usual red) so that such observations are easy to spot. Further information regarding these suncharts and their features can be found in Ravindra et al. (2020).

**Figure 1.** (a) A representative hand-drawn sunchart made at Kodaikanal Solar Observatory on 1957 January 2. Sunspots are drawn in black, with different shades denoting the umbra and penumbra. Plages are outlined in sky blue, and filaments are drawn in red. (b) RGB histograms of the digitized sunchart. (c) Yearly count of digitized KoSO suncharts.

2.1. Digitization of Suncharts

Even though KoSO suncharts span more than 100 yr, as a first step we digitize only the 23 yr of data between 1954 and 1976. We chose this period because it covers cycles 19 and 20, which are, respectively, the strongest and one of the weakest activity cycles of the last century. The digitization procedure is briefly as follows. The suncharts are photographed with a Canon EOS 800D camera with a 22.3 mm × 14.9 mm CMOS sensor, and the digitized images are stored in the Tagged Image File Format (TIFF). These images have a bit depth of 24 bits (8 bits × three channels) and sizes of approximately 3500 × 3500 pixels. Figure 1(b) shows the intensity distributions of a digitized image in all three channels (red, green, and blue), which gives an idea of the sensitivity of the camera sensor to each color. The RGB histogram confirms that the camera can distinguish between the different colors used in the drawings.
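
Per-channel histograms like those in Figure 1(b) can be computed directly from the 8-bit channels of a digitized chart. The following is a minimal NumPy sketch, assuming the TIFF has already been loaded (with any TIFF-capable library) into an (H, W, 3) uint8 array in R, G, B order; the function name `rgb_histograms` is ours and is not part of the authors' pipeline.

```python
import numpy as np


def rgb_histograms(image: np.ndarray) -> dict:
    """Return one 256-bin count histogram per colour channel.

    Assumption: an (H, W, 3) uint8 array in R, G, B channel order, i.e. the
    24-bit (8 bits x 3 channels) layout of a digitized sunchart.
    """
    if image.dtype != np.uint8 or image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) uint8 RGB array")
    hists = {}
    for idx, name in enumerate("RGB"):
        # bincount over the 8-bit values gives exactly one bin per level 0..255
        hists[name] = np.bincount(image[..., idx].ravel(), minlength=256)
    return hists


# Example: a white chart with a small red-drawn patch
chart = np.full((4, 4, 3), 255, dtype=np.uint8)
chart[:2, :2] = (190, 60, 115)
h = rgb_histograms(chart)
```

If the image is read with OpenCV instead, the channels arrive in B, G, R order and should be reversed first.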

In Figure 1(c), we present the statistics of the suncharts digitized so far. The figure also highlights data gaps, which arise primarily from inclement weather at the site that prevented observations. Lastly, in preparing Figure 1(c) and in the analysis that follows, we exclude suncharts in which the Hα observations were taken from observatories other than KoSO (as mentioned earlier, filaments in those suncharts are sketched in a different color instead of the usual red).

3. Method

3.1. Identification of Solar Disk

The first step toward filament detection is to identify the solar limb in each image. To this end, we detect the nearly vertical line (the polar line) in the grid using the linear Hough transform (Hough 1962). The radius and center of the disk are then taken as half the length of the detected line and its midpoint, respectively. The presence of multiple latitude and longitude grid lines, in addition to the various overlying markings and features shown in Figure 1(a), makes limb detection nontrivial. Therefore, before applying the linear Hough transform, we "clean" the image to remove undesired features and artifacts, using the following steps:

1.
First, we process the original image (Figure 2(a)) with the Canny edge detection algorithm, using lower and upper thresholds of 0.8 and 0.9, respectively. Next, to remove the small fragmented structures that the Canny operator often returns, we apply a morphological closing operation with a square kernel of 10 pixels on a side. This step produces the image shown in Figure 2(b).
2.
Since we are interested in extracting the grid, which contains the vertical line, we select the largest connected region in Figure 2(b), which corresponds to the grid. The inverted image after this step is shown in Figure 2(c).
3.
Before the line identification step, we must eliminate the thin extended portions at both ends of the vertical line, which would otherwise lead to an overestimated length. We use a horizontal erosion with a kernel size of 30 pixels to remove these tentacles and then a horizontal morphological closing with a kernel size of 250 pixels to reconnect the regions. The linear Hough transform is then applied to this image, yielding the line's length, midpoint, and slope, which correspond to the disk's diameter, center, and orientation in the sunchart, respectively (Figure 2(d)).

**Figure 2.** (a)–(d) Steps to detect the solar disk. (a) A sample drawing from 1957 January 2 (same as Figure 1). (b) Output image after applying Canny edge detection and the morphological closing operation. (c) The largest selected contour, representing the grid of the sunchart. (d) The polar line detected by the Hough transform and the resulting circle overplotted on the sunchart grid.

Lastly, although the technique described above works for most suncharts, in a few cases (4%) the artifacts are so dominant that the automated limb detection fails; we process those images manually.
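
Once the center and radius are known, each on-disk pixel can be assigned heliographic coordinates, which the length measurements in Section 4 rely on. Below is a minimal sketch of the standard orthographic inversion, assuming the chart has already been rotated so that solar north is up and west is to the right, and using the B-angle written on the chart. The function name `pixel_to_stonyhurst` and this orientation convention are our assumptions.

```python
import numpy as np


def pixel_to_stonyhurst(x, y, cx, cy, radius, b0_deg):
    """Map image pixels to Stonyhurst latitude/longitude in degrees.

    Assumption: north is up and west is to the right on the (rotated) chart.
    Pixels outside the disk are returned as NaN.
    """
    # Offsets in units of the disk radius; image rows grow downward, so flip y.
    u = (np.asarray(x, dtype=float) - cx) / radius
    v = -(np.asarray(y, dtype=float) - cy) / radius
    rho2 = u**2 + v**2
    off_disk = rho2 > 1.0
    # Component along the line of sight (toward the observer).
    w = np.sqrt(np.clip(1.0 - rho2, 0.0, None))
    b0 = np.radians(b0_deg)
    lat = np.degrees(np.arcsin(np.clip(v * np.cos(b0) + w * np.sin(b0), -1.0, 1.0)))
    lon = np.degrees(np.arctan2(u, w * np.cos(b0) - v * np.sin(b0)))
    return np.where(off_disk, np.nan, lat), np.where(off_disk, np.nan, lon)
```

As a quick check, the disk center maps to latitude B0 and longitude 0, as expected for the sub-observer point.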

3.2. Identification of Solar Filaments

We first isolate the disk using the limb information obtained in Section 3.1. At this stage, the input image looks similar to the one shown in Figure 2(c). To detect the filaments automatically, we employ K-means clustering (MacQueen 1967; Lloyd 1982). The success of this approach depends on identifying the optimal value of K, the number of clusters in the data. We obtain this value as follows:

1.
We first considered all K values between 5 and 30 and visually inspected the output of each run (Figures 3(a)–(e)). Based on these visual inspections of 100 randomly selected suncharts, we find that K = 20 provides the best results. To cross-check this conclusion, we calculate the compactness (the sum of squared distances between each point and its assigned cluster center) and plot it against K, as shown in Figure 3(f). The compactness initially decreases monotonically with increasing K and then starts to flatten beyond K = 20, which confirms our initial finding that K = 20 is optimal. Although the compactness decreases slightly further for higher K values (e.g., K = 30), the computation time increases significantly without much improvement in the final output. Hence, for efficient and effective filament detection, we set K = 20.
2.
Next, for the same set of images (as selected in step 1), we calculate the mean RGB values (μ_R, μ_G, μ_B) of the cluster that best represents the filaments. In addition to the mean, we calculate the mean absolute deviation (σ_R, σ_G, σ_B) in each of the three channels, which defines a range of values per channel. The resulting ranges are R = 190 ± 20 and B = 115 ± 10. We find that the G value barely changes from image to image, which is understandable given that filaments are drawn in red; we therefore do not use that channel.
3.
We apply these ranges to the R (μ_R ± σ_R) and B (μ_B ± σ_B) channels to create a binary mask with the same resolution as the cropped sunchart, in which filament pixels are set to one and all other pixels to zero. An inverted binary image (for better visualization), along with contours overplotted on the sunchart, is shown in Appendix A.
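
To make the K selection concrete, the sketch below runs plain Lloyd's K-means on a set of RGB pixel values and reports the compactness as defined above: the sum of squared distances of every pixel to its assigned center, which is also the quantity OpenCV's `cv2.kmeans` returns as its first output. It is a minimal NumPy illustration with random initialization and a few restarts, not the authors' implementation; the names `kmeans` and `compactness_curve` are hypothetical. For a full 3500 × 3500 chart one would subsample pixels or use an optimized library routine, because the distance array grows as N × K.

```python
import numpy as np


def kmeans(pixels, k, n_iter=20, attempts=3, seed=0):
    """Lloyd's K-means on an (N, 3) array of RGB values.

    Returns (labels, centers, compactness) for the best of `attempts` runs,
    where compactness = sum of squared distances to the assigned centers.
    """
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(attempts):
        # Random initialization: k pixels drawn without replacement.
        centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()
        for _ in range(n_iter):
            d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            for j in range(k):
                members = pixels[labels == j]
                if len(members):  # keep the old center if a cluster empties
                    centers[j] = members.mean(axis=0)
        # Final assignment and compactness for this attempt.
        d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        compactness = float(d2[np.arange(len(pixels)), labels].sum())
        if best is None or compactness < best[2]:
            best = (labels, centers, compactness)
    return best


def compactness_curve(pixels, ks, **kwargs):
    """Compactness for each K, e.g. ks = range(5, 31) as in step 1."""
    return {k: kmeans(pixels, k, **kwargs)[2] for k in ks}
```

Plotting the returned dictionary against K gives a curve of the kind shown in Figure 3(f); the flattening point is then read off by eye, as in the text.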

**Figure 3.** Effect of the number of clusters (K) in K-means clustering on filament detection. (a) and (b) show that filaments are detected together with other solar features for K = 5 and K = 10, respectively. (c) shows that K = 15 underestimates the full extent of the filaments. (d) and (e) show accurate filament detection for K = 20 and K = 25, with hardly any visual difference between them. (f) shows the compactness of the clustering as a function of K.

Running the above procedure on the KoSO data between 1954 and 1976 (6594 suncharts), we detected a total of 66,722 filaments.
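
Steps 2 and 3 amount to a per-pixel range test on the R and B channels, and the filament count above requires grouping mask pixels into separate objects. The sketch below applies the quoted ranges (R = 190 ± 20, B = 115 ± 10) and counts 8-connected regions with a simple breadth-first search. Treating each 8-connected region as one filament is our assumption, since the grouping rule is not stated here, and the function names are hypothetical; for full-size charts, `cv2.connectedComponents` or `scipy.ndimage.label` would be much faster.

```python
from collections import deque

import numpy as np


def filament_mask(rgb, mu_r=190, sig_r=20, mu_b=115, sig_b=10):
    """Binary mask (1 = filament) from the R and B ranges; G is ignored."""
    r = rgb[..., 0].astype(int)  # assumes R, G, B channel order
    b = rgb[..., 2].astype(int)
    inside = (np.abs(r - mu_r) <= sig_r) & (np.abs(b - mu_b) <= sig_b)
    return inside.astype(np.uint8)


def count_regions(mask):
    """Number of 8-connected groups of ones, found by breadth-first search."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    count = 0
    for i, j in zip(*np.nonzero(mask)):
        if seen[i, j]:
            continue
        count += 1  # a new, previously unvisited region starts here
        seen[i, j] = True
        queue = deque([(i, j)])
        while queue:
            y, x = queue.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
    return count
```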

4. Results

4.1. Statistical Properties: Time–Latitude Diagram

The panels of Figure 4 depict the temporal evolution of filament latitudes over the two solar cycles studied here (cycles 19 and 20). Figure 4(a) shows the well-known butterfly diagram as a 2D histogram over time (bin size = 0.25 yr, i.e., 3 months) and latitude (bin size = 3°). The color intensity represents the number of filaments identified in each time–latitude bin. Although filaments appear across the entire latitude band, their distribution shows a strong cycle dependence. For example, clear signatures of equatorward and poleward migration of filaments are seen at the beginning of the cycles. Furthermore, the equatorward branches are spread over a wider latitude band than those of sunspots (Mandal et al. 2017; Jha et al. 2022). These findings are consistent with those obtained previously from digitized Hα and Ca ii K photographic plates of KoSO (Chatterjee et al. 2017, 2020).
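
The binning behind Figure 4(a) can be reproduced with a 2D histogram. A minimal sketch, assuming each detected filament has already been reduced to an observation time in decimal years and a single representative latitude in degrees (how that latitude is defined is our assumption); the function name is hypothetical.

```python
import numpy as np


def butterfly_histogram(t_years, lat_deg, t_start=1954.0, t_end=1977.0, dt=0.25, dlat=3.0):
    """Count filaments in time-latitude bins (0.25 yr by 3 deg by default).

    Assumption: t_years are decimal years and lat_deg are per-filament
    latitudes; t_end = 1977.0 closes the last bin at the end of 1976.
    """
    t_edges = np.arange(t_start, t_end + dt / 2, dt)        # 3-month bins
    lat_edges = np.arange(-90.0, 90.0 + dlat / 2, dlat)     # 3-degree bins
    counts, _, _ = np.histogram2d(t_years, lat_deg, bins=[t_edges, lat_edges])
    return counts, t_edges, lat_edges
```

The returned `counts` array can be displayed with any image plotting routine, using the edge arrays for the axes.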

Figure 4(b) adds further information to the time–latitude diagram by grouping the filaments by their deprojected areas (in μHem) and depicting each group in a different color. The area ranges considered are: Area > 1000 μHem, 500 μHem < Area < 1000 μHem, 100 μHem < Area < 500 μHem, and Area < 100 μHem. We find that filaments with smaller areas are restricted to lower latitudes (within ±35°), whereas the bigger ones appear at higher latitudes. In fact, the poleward branches are dominated by such large filaments. These findings are further confirmed by the right panel of Figure 4(b), in which we plot the latitudinal distributions of filaments in the various area groups, collapsing the temporal information.
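
The area classes rely on deprojected areas in millionths of a solar hemisphere (μHem). A common way to deproject, assumed here because the exact procedure is not spelled out in this section, is to divide each pixel's projected area by μ = cos θ, the cosine of its heliocentric angle, and normalize by the hemisphere area 2πR². The sketch also assigns the four area classes; the μ cutoff near the limb and all names are our assumptions.

```python
import numpy as np

AREA_EDGES = [100.0, 500.0, 1000.0]  # class boundaries in muHem
AREA_LABELS = ["< 100", "100-500", "500-1000", "> 1000"]


def deprojected_area_muhem(mask, cx, cy, radius_px, mu_min=0.1):
    """Deprojected area (muHem) of the nonzero pixels of a filament mask."""
    yy, xx = np.nonzero(mask)
    rho2 = ((xx - cx) ** 2 + (yy - cy) ** 2) / float(radius_px) ** 2
    # mu = cos(heliocentric angle); clipped (assumed cutoff) to avoid
    # blowing up right at the limb.
    mu = np.maximum(np.sqrt(np.clip(1.0 - rho2, 0.0, None)), mu_min)
    area_r2 = np.sum(1.0 / mu) / float(radius_px) ** 2  # in units of R_sun^2
    return float(area_r2 / (2.0 * np.pi) * 1e6)         # hemisphere = 2*pi*R^2


def area_class(area_muhem):
    """Map an area to one of the four classes used in Figure 4(b)."""
    return AREA_LABELS[int(np.digitize(area_muhem, AREA_EDGES))]
```

At the disk center (μ = 1) a single pixel of a 1000-pixel-radius disk covers 1/(2π) ≈ 0.16 μHem, which gives a feel for the resolution of the area measurement.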

Lastly, Figure 4(c) presents a time–latitude distribution in which the filaments are grouped according to their lengths: length > 500 Mm, 100 Mm < length < 500 Mm, 10 Mm < length < 100 Mm, and length < 10 Mm. To determine the length of a filament, we first retrieve the coordinates of its border pixels. The pixel-to-pixel distances (in megameters) are then calculated on the spherical surface from the pixels' latitudes and longitudes, and their sum yields the filament perimeter. The filament length is taken as half of this perimeter (Mazumder et al. 2018). As in Figure 4(b), the poleward branches in Figure 4(c) are dominated by the longer filaments, whereas the shorter ones are restricted to low latitudes.
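
The perimeter-based length can be sketched as follows, assuming the border pixels are already ordered along the contour and converted to Stonyhurst latitude and longitude (for example, from the disk center, radius, and B-angle). Pixel-to-pixel distances are computed as great-circle distances with the haversine formula on a sphere of radius 695.7 Mm; the ordering step, the haversine choice, and the radius value are our assumptions, and the function names are hypothetical.

```python
import numpy as np

R_SUN_MM = 695.7  # assumed solar radius in megameters (IAU nominal value)


def great_circle_mm(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance on the solar sphere, in Mm."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * R_SUN_MM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def filament_length_mm(lat_deg, lon_deg):
    """Half the closed perimeter of an ordered border-pixel contour."""
    lat = np.asarray(lat_deg, dtype=float)
    lon = np.asarray(lon_deg, dtype=float)
    # Pair each border point with the next one, closing the loop at the end.
    d = great_circle_mm(lat, lon, np.roll(lat, -1), np.roll(lon, -1))
    return 0.5 * float(d.sum())
```

For a thin filament, the border runs out along one side and back along the other, so half the perimeter approximates the length of the spine.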

4.2. Tilts of Filaments

Filaments typically lie along magnetic polarity inversion lines; hence, their tilts can be used as a proxy for active region (AR) tilts. Quantifying AR tilt is essential from the point of view of solar dynamo theory, in which this tilt plays a significant role in converting the toroidal field into a poloidal field (Babcock 1961; Leighton 1964; Choudhuri 2003; Charbonneau 2020). We determine the tilt of each filament by fitting a straight line to its spine using least squares (chi-square minimization) and then measuring the angle of this line with respect to the solar equator. Figure 4(d) shows the time–latitude distribution of the detected filaments color coded by tilt. Filaments whose spines are oriented counterclockwise with respect to the equator are assigned positive tilts (blue), and those oriented clockwise are assigned negative tilts (orange). The figure shows that negative tilts dominate the northern hemisphere, whereas positive tilts dominate the southern hemisphere, consistent with the findings of Mazumder et al. (2021) and Tlatov et al. (2016). This behavior is also seen quantitatively in the right panel of Figure 4(d).
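
A minimal version of the tilt measurement is sketched below, assuming the spine is available as (longitude, latitude) points in degrees with longitude increasing to the west (to the right on the chart), so that a positive slope corresponds to a counterclockwise tilt. An ordinary least-squares line, equivalent to chi-square minimization with equal weights, is fitted with `numpy.polyfit`; nearly north–south spines would call for a different parametrization. Both function names are hypothetical.

```python
import numpy as np


def filament_tilt_deg(spine_lon_deg, spine_lat_deg):
    """Tilt of a spine relative to the equator; positive = counterclockwise.

    Assumption: longitude increases to the right (west) on the chart.
    """
    slope, _intercept = np.polyfit(spine_lon_deg, spine_lat_deg, 1)
    return float(np.degrees(np.arctan(slope)))


def tilt_sign_fractions(centroid_lat_deg, tilts_deg):
    """Fraction of positive and negative tilts in each hemisphere."""
    lat = np.asarray(centroid_lat_deg, dtype=float)
    tilt = np.asarray(tilts_deg, dtype=float)
    out = {}
    for name, sel in (("north", lat > 0), ("south", lat < 0)):
        t = tilt[sel]
        if t.size:  # skip a hemisphere with no filaments
            out[name] = {"positive": float(np.mean(t > 0)),
                         "negative": float(np.mean(t < 0))}
    return out
```

The per-hemisphere fractions summarize the same asymmetry that the right panel of Figure 4(d) shows as distributions.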

4.3. Polar Rush

All the butterfly diagrams in Figure 4 share a common feature: during the early phase of a cycle, filaments at higher latitudes in each hemisphere migrate toward the pole, a phenomenon known as the "polar rush" (Ananthakrishnan 1952). In Figure 4(a), we highlight these polar branches with ellipses (B1, B2, B3, and B4). We retrieved the data points within these ellipses and applied a linear fit to estimate the drift rate. Table 2 lists the drift rates for each cycle. Our results indicate that the migration of the northern polar filaments starts before that of the southern ones. We also see a clear north–south asymmetry in the migration rates. Furthermore, the drift rates obtained from the suncharts match closely those presented by Xu et al. (2021) for cycles 19 and 20 (Table 2).

Table 2. Drift Rate Obtained by Polar Rush Fittings from the Butterfly Diagram

Branch	Drift Rate (°/yr)	Drift Rate (°/CR)	Drift Rate, Xu et al. (°/CR) ^a
Cycle 19 N (B1)	7.49	0.55	0.51
Cycle 19 S (B3)	10.18	0.76	0.88
Cycle 20 N (B2)	5.99	0.44	0.29
Cycle 20 S (B4)	6.31	0.47	0.39

Note.

^a Xu et al. (2021). The branch labels (B1–B4) are marked in Figure 4(a).
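
The drift-rate fit and the unit conversion in Table 2 can be sketched as below, assuming a straight-line least-squares fit of unsigned latitude against time for the points inside each ellipse, and a conversion that uses the synodic Carrington period of 27.2753 days and a 365.25-day year (these conventions are our assumptions). With these constants, the °/yr column of Table 2 converts to the tabulated °/CR values to within 0.01°/CR. Function names are hypothetical.

```python
import numpy as np

CR_DAYS = 27.2753   # synodic Carrington rotation period in days (assumed)
YEAR_DAYS = 365.25  # Julian year in days (assumed)


def drift_rate_deg_per_yr(t_years, lat_deg):
    """Poleward drift rate from a straight-line least-squares fit.

    Unsigned latitude is used so that poleward motion is positive in both
    hemispheres (our assumption about the sign convention).
    """
    slope, _intercept = np.polyfit(t_years, np.abs(lat_deg), 1)
    return float(slope)


def deg_per_yr_to_deg_per_cr(rate):
    """Convert a drift rate from deg/yr to deg per Carrington rotation."""
    return rate * CR_DAYS / YEAR_DAYS


# Example: points drifting poleward by 7.49 deg/yr in the southern hemisphere
rate = drift_rate_deg_per_yr([1955.0, 1956.0, 1957.0], [-60.0, -67.49, -74.98])
```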


4.4. Comparisons with Hα Carrington Maps

To compare our results directly with those of Chatterjee et al. (2017), who identified filaments in digitized photographic plates of Hα observations from Kodaikanal, we also generate Carrington maps from the sunchart data. Figure 5 presents one such comparison. These maps show that many more filaments are detected in the suncharts than in the Hα maps. However, we also note instances of oversegmentation in Figure 5(c). Overall, we find a good match between the detections made using the suncharts and the Hα images. Lastly, the figure in Appendix B illustrates another aspect of KoSO suncharts: they can serve as a resource to fill data gaps in existing catalogs around the globe.
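
One way to place sunchart detections on a Carrington map is sketched below, assuming each filament pixel already has Stonyhurst coordinates and that the Carrington longitude L0 of the disk center is known for each chart; the Carrington longitude is then L0 plus the Stonyhurst longitude. The 1° grid, the ±60° central-meridian cut, and the simple overlap fraction are illustrative assumptions, not the procedure or metric used for Figure 5, and the function names are hypothetical.

```python
import numpy as np


def add_to_carrington_map(cmap, lat_deg, stony_lon_deg, l0_deg, max_cmd=60.0):
    """Mark filament pixels on a (180, 360) boolean 1-degree Carrington grid.

    Only pixels within max_cmd degrees of the central meridian are used
    (an assumed cut to limit foreshortening near the limb).
    """
    lat = np.asarray(lat_deg, dtype=float)
    lon = np.asarray(stony_lon_deg, dtype=float)
    keep = np.abs(lon) <= max_cmd
    carr = np.mod(l0_deg + lon[keep], 360.0)  # Carrington longitude
    i = np.clip(np.floor(lat[keep] + 90.0).astype(int), 0, 179)
    j = np.floor(carr).astype(int) % 360
    cmap[i, j] = True
    return cmap


def overlap_fraction(a, b):
    """Share of cells flagged in map `a` that are also flagged in map `b`."""
    return float(np.logical_and(a, b).sum() / max(int(a.sum()), 1))
```

Filling one map from the suncharts and another from the Hα plates, and calling `overlap_fraction` in both directions, gives a rough numerical counterpart to the visual overlay in Figure 5(c).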

**Figure 5.** (a) and (b) show the Carrington map generated from the Hα plates for CR 1371. In (a), the superimposed red contours represent the filaments from the Carrington binary mask generated using suncharts, whereas in (b), the blue contours represent the filaments from the Carrington binary mask obtained from Hα plates. (c) shows the overlap of the binary masks derived from both sources (suncharts in red and Hα plates in blue).

**Figure 6.** Filament detection with the optimal K value. (a) Binary map of the detected filaments for a representative sunchart. (b) Red contours of the detected filaments overlaid on the original sunchart.

**Figure 7.** Same as Figure 5 but for Carrington rotation 1373. In panel (b), black patches indicate missing data. This figure highlights the usefulness of the drawings for filling data gaps.

5. Conclusions

In this article, we present, for the first time, the digitized version of the KoSO suncharts, which contain multiple solar features such as filaments, plages, sunspots, and prominences in a single drawing. Our main findings are the following.

1.
We devise a novel automatic method to calibrate the sunchart data through disk detection and P-angle correction.
2.
We implement K-means clustering to obtain the optimal color thresholds with which solar filaments are automatically detected in each sunchart over two solar cycles.
3.
We find a clear "rush to the poles" signature in the time–latitude distribution of filaments. The poleward drift rate, an important parameter for understanding the polar field buildup process, closely matches the rates from available studies of KoSO digitized Hα plates for the overlapping cycles.
4.
The latitudinal distributions of filament length, area, and tilt angle from our study agree closely with those from studies utilizing other hand-drawn synoptic map archives, such as the McIntosh and Meudon archives.

Although our automated filament detection procedure works well for the most part, we have also identified some drawbacks. For example, some false detections arise from red patches (likely accidental pen marks) on the images. Moreover, discoloration due to fading sometimes results in fragmented filaments. To mitigate these problems, in future work we plan to use the filament masks produced in this study as ground truth to build a training set for supervised machine learning (e.g., convolutional neural networks). This approach may also handle the cases in which filaments are drawn in green (observations from other observatories).

Our present study, covering two solar cycles, highlights the importance of this independent long-term archive of hand-drawn suncharts. These suncharts are unique in that they bridge the gaps left by damage (fungus, broken plates, etc.) to the Kodaikanal plate data, allowing us to generate a uniform data series. In the near future, we expect to provide the full digitized version of the suncharts, covering almost one century, to the solar community.

We thank all the observers who have been involved in observations and making sketches at Kodaikanal for their contributions to building this enormous resource over the last 100 yr. We also thank the Department of Science and Technology (DST) for the project grant (DST/ICPS/CLUSTER/Data Science/2018/General/Sl. No.18), which made this digitization possible.

Software: IDL, Python, OpenCV, NumPy, Pandas.

Appendix A: Filament Detection

Figure 6 shows the binary map of the detected filaments and the filament contours overplotted on the drawing. This validates both our disk detection and our filament detection methods.

Appendix B: Additional Carrington Map

Figure 7 highlights one of the benefits of hand-drawn charts over Hα plates: data gaps in the Hα Carrington maps from plates can be filled using the suncharts.
