A Machine Learning Dataset Prepared From the NASA Solar Dynamics Observatory Mission

In this paper we present a curated dataset from the NASA Solar Dynamics Observatory (SDO) mission in a format suitable for machine learning research. Beginning from level 1 scientific products we have processed various instrumental corrections, downsampled to manageable spatial and temporal resolutions, and synchronized observations spatially and temporally. We illustrate the use of this dataset with two example applications: forecasting future EVE irradiance from present EVE irradiance and translating HMI observations into AIA observations. For each application we provide metrics and baselines for future model comparison. We anticipate this curated dataset will facilitate machine learning research in heliophysics and the physical sciences generally, increasing the scientific return of the SDO mission. This work is a direct result of the 2018 NASA Frontier Development Laboratory Program. Please see the appendix for access to the dataset.

1. INTRODUCTION
Launched in 2010, NASA's Solar Dynamics Observatory (SDO; Pesnell et al. 2012) has been continuously monitoring the Sun's activity and delivering valuable scientific data for heliophysics researchers with the use of three instruments:
• The Atmospheric Imaging Assembly (AIA; Lemen et al. 2012), which captures 4096 × 4096 resolution images (with 0.6 arcsec pixel size) of the full Sun in two ultraviolet (UV) wavelength bands (centered at 1600 and 1700Å), seven extreme ultraviolet (EUV) wavelength bands (centered at 94, 131, 171, 193, 211, 304, and 335Å), and one visible wavelength band (centered at 4500Å).
• The Helioseismic and Magnetic Imager (HMI; Schou et al. 2012), which captures visible wavelength filtergrams of the full Sun at 4096 × 4096 resolution (pixel size of 0.5 arcsec). These are then processed into a number of products, including photospheric dopplergrams, line-of-sight magnetograms, and vector magnetograms (Hoeksema et al. 2014).
• The EUV Variability Experiment (EVE; Woods et al. 2012) monitors the solar EUV spectral irradiance from 1 to 1050Å. This is done by utilizing multiple EUV Grating Spectrographs (MEGS) which disperse EUV light from the full disk of the Sun and its corona onto a 1024 x 2048 CCD.
Calibrated level 1 scientific data from the AIA and HMI instruments are accessible from the Joint Science Operations Center 1 (JSOC) at Stanford University, Lockheed Martin Solar & Astrophysics Laboratory, and affiliate science data centers, while science data from the EVE instrument are accessible from the EVE Science Operations Center 2 at the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado, Boulder.
The SDO mission has been scientifically prolific. In the eight years after launch, over 3000 refereed scientific publications 3 have made use of SDO data. This success can be attributed to the reliability of the spacecraft and its instruments, the consistency and quality of the observations, the mission's open data policy, and the ease of online data access from the affiliate science data centers. The large volume of structured, calibrated scientific data (over 12 Petabytes and counting) is poised for exploratory analysis with machine learning methods as well as more traditional approaches. In early pioneering works, supervised learning techniques have been applied to the prediction of solar flares using HMI vector magnetograms (e.g., Bobra & Couvidat 2015), as well as HMI and AIA imagery (Jonas et al. 2018). Deep learning applications have begun to emerge from the heliophysics community as well, with exemplary cases illustrated in Colak & Qahwaji (2013) and Huang et al. (2018), and with Wright et al. (2019) presenting a more recent treatment using the dataset presented here.
While level 1 data are easily accessible, pre-processing these data for scientific analysis often requires specialized heliophysics knowledge. This requirement may act as an unnecessary hurdle for non-heliophysics machine learning researchers who may wish to experiment with datasets from the physical sciences but are unaware of domain-specific nuances (e.g., that images must be spatially and temporally adjusted).
The first contribution of this paper is a curated SDO dataset that is mission-ready for machine learning applications. Our aim is to supply this standardized dataset for heliophysicists who wish to use machine learning in their own research, as well as machine learning researchers who wish to develop models specialized for the physical sciences. In Section 2, we examine currently available data products, the pitfalls of their direct use in machine learning tasks, and the corrections and adjustments they warrant. These corrections are incorporated into our data preparation procedures, which are discussed in Section 3.
The second contribution of this paper is a set of protocols, metrics, and baseline models. We introduce evaluation protocols and metrics in Section 4, and baseline models in Section 5, where we tackle the tasks of predicting future EVE irradiance from present EVE data and of translating 3 HMI channels into 9 AIA channels. We believe these models contain components generic enough to provide useful benchmarks, and to highlight the most dangerous pitfalls, for most subsequent SDO machine learning applications.

2. EXAMINATION OF RAW DATA PRODUCTS
We first examine existing raw data products available from SDO for each of the three instruments (level 1 science data products for AIA; hmi.B_720s for HMI; and the EVE version 5 IDL saveset).
While heliophysics researchers are likely aware of the corrections that must be applied to these data, and of the fact that AIA measurements have heterogeneous exposure times, it is unrealistic to expect the same from researchers in other fields (e.g., the dataset of Kucuk et al. 2017 was compiled from quicklook JPEG2000 images that have compressed dynamic range and do not account for instrumental degradation). We therefore apply these corrections by identifying and removing corrupt observations (e.g., images taken during instrument anomalies), adjusting detected intensities for heterogeneous exposure times, and fixing instrument artifacts that introduce spurious trends.
If such corrupt observations or sources of heterogeneity are not removed, any subsequent machine learning model will likely learn to emulate the incorrect observations and spurious trends, including non-physical aberrations and instrumental noise, and will not be able to isolate the physical dynamics. Exposing a model to such corrupt data during training may also compromise its predictive quality. See Figure 2 for an example of one such unwanted AIA observation.
To identify each instrument's possible issues, we visualize its data by taking the average channel values (i.e., AIA wavelength band data counts, HMI vector field components, and EVE irradiance values) and plotting them over time. We then identify the underlying causes of nonphysical aberrations and the corrections needed to standardize the data. Below we report our analysis for AIA and outline where HMI required similar adjustments; EVE level-3 data products already address all instrumental issues, so we only adjusted for time synchronicity. We describe these corrections in Section 3.
The average channel values for the AIA level 1 data products, as plotted in Figure 1, show the data heterogeneity as well as the presence of corrupt observations. The latter are visible in this figure as isolated downward spikes, while the secular downward trend is indicative of degradation over the lifetime of the instrument.
Corrupt observations arise for a variety of reasons, such as data reported during calibration maneuvers, eclipse periods, or the occasional instrument anomaly. Such data, flagged by a non-zero value of the QUALITY keyword for both the AIA and HMI instruments, are not intended for scientific analysis and are removed from our dataset. One of the main sources of heterogeneity in AIA data stems from its instrument design: AIA is not designed to directly measure irradiance, but rather data numbers (DNs) as tabulated by the activation of the CCD. While it is intended that DN values are proportional to the flux of photons at a specific wavelength (Boerner et al. 2012), the factor of proportionality is not constant in time. For instance, the camera exposure time $t_{exp}$ is not constant due to the instrument's automatic exposure control (AEC); e.g., in times of flares, when certain regions on the Sun become especially bright, AEC reduces the nominal exposure time from a few seconds to tens of milliseconds. As a result, when the AEC is activated, the mean registered DNs are drastically reduced, an effect easily compensated for by normalizing by the exposure time.
The visible downward trend in Figure 1 is caused by the gradual in-orbit degradation of the AIA instrument. This degradation is purely due to CCD corrosion over time. Because AIA is calibrated against EVE, which is itself bootstrapped to a program of regular EUV spectral irradiances as measured from sounding rocket launches (Boerner et al. 2014), the time-dependent profile of the AIA instrument is well understood independently of the solar cycle. This instrumental understanding allows us to correct for AIA instrument degradation by simply applying the aia_get_response routine in the SolarSoft software package (Freeland & Handy 1998).
Lastly, there is a more subtle non-monotonic heterogeneity caused by SDO's orbit around the Sun. SDO is in a geosynchronous orbit around the Earth, which itself is in a slightly elliptical orbit around the solar system barycenter. The elliptical orbit causes the size of the Sun (in DNs registered on the CCD) to vary by about 10% over the course of a year. This is not an intrinsic feature of solar evolution. We compensate for this effect by resizing AIA and HMI images such that the size of the solar disk is fixed to some size $R_s$. In particular, we can scale the AIA and HMI images by a factor of $R_s / R_{obs}$, where $R_{obs}$ can be obtained from the RSUN_OBS keyword in the level 1 FITS header.
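As a rough illustration of the rescaling described above, the following sketch computes the factor $R_s / R_{obs}$ and shows that the rescaled disk radius no longer depends on the observed radius. The function names and the two example RSUN_OBS values are hypothetical; the target radius of 976 arcsec is the value quoted in Section 3. The image resampling itself is omitted and would require an interpolation routine of the reader's choice.

```python
TARGET_RSUN_ARCSEC = 976.0  # fixed disk radius R_s quoted in Section 3


def disk_scale_factor(rsun_obs_arcsec, target_arcsec=TARGET_RSUN_ARCSEC):
    """Zoom factor R_s / R_obs that maps the observed disk onto the fixed size."""
    if rsun_obs_arcsec <= 0:
        raise ValueError("RSUN_OBS must be positive")
    return target_arcsec / rsun_obs_arcsec


def scaled_disk_radius_pixels(rsun_obs_arcsec, pixel_arcsec):
    """Disk radius in pixels after zooming an image by the scale factor."""
    observed_radius_pixels = rsun_obs_arcsec / pixel_arcsec
    return observed_radius_pixels * disk_scale_factor(rsun_obs_arcsec)


# Two hypothetical RSUN_OBS values (arcsec) at different times of the year.
for rsun_obs in (945.0, 975.0):
    factor = disk_scale_factor(rsun_obs)
    radius = scaled_disk_radius_pixels(rsun_obs, pixel_arcsec=0.6)
    print(f"RSUN_OBS={rsun_obs}: scale={factor:.4f}, disk radius={radius:.1f} px")
```

Both example observations end up with the same disk radius in pixels, which is the property the correction is meant to guarantee.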

3. PROCESSED DATA PREPARATION
We now describe in detail how our processed dataset is produced. We describe how the corrections outlined in Section 2 are applied to each instrument and how temporal synchronization is computed. We begin by removing observations with non-zero QUALITY from both AIA and HMI. We then spatially downsample to produce a more manageable dataset, while being careful to emulate what a lower-resolution instrument would observe.

AIA
We begin with the 4096 × 4096 pixel level 1 (dark subtracted, flat fielded, and despiked) data products and process them as described below:
1. The raw images are rotated and resized onto a common grid (still 4096 × 4096 pixels) such that the pixel size is 0.600 arcsec, and $\hat{x}$ and $\hat{y}$ (the first and second image dimensions) are aligned with the solar west and north directions, respectively.
2. Images are re-binned by averaging neighboring 4 × 4 pixel blocks such that the resultant image has size 1024 × 1024 pixels (with a final pixel size of 2.400 arcsec). Resultant images are processed at a 2 min cadence, producing the so-called Synoptic series 4.
3. The AIA images are then normalized by exposure time and corrected for instrument degradation, while corrections for elliptical orbital variation are applied with a fixed disk size $R_s$ of 976 arcsec.
4. Finally, the images are downsampled again by summing in local blocks, which emulates the expected observation of a lower resolution instrument. The final interpolated images have 512 × 512 pixels with pixel size of ∼4.8 arcsec.
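The block operations in steps 2 to 4 can be sketched with NumPy on a toy array. This is a minimal illustration under stated assumptions, not the pipeline used to build the dataset: the function names are hypothetical, and `degradation_factor` is assumed to be the fraction of the original sensitivity remaining, standing in for whatever the calibration software provides.

```python
import numpy as np


def normalize_aia(image_dn, exposure_s, degradation_factor):
    """Convert DN to DN/s and undo degradation.

    degradation_factor is an assumed placeholder: the fraction of the
    original sensitivity remaining (1.0 means no degradation).
    """
    image = np.asarray(image_dn, dtype=np.float64)
    return image / (exposure_s * degradation_factor)


def rebin(image, factor, reducer=np.mean):
    """Combine non-overlapping factor x factor blocks with `reducer`."""
    height, width = image.shape
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    blocks = image.reshape(height // factor, factor, width // factor, factor)
    return reducer(blocks, axis=(1, 3))


# Toy 8x8 array standing in for a 4096x4096 frame.
raw = np.arange(64, dtype=np.float64).reshape(8, 8)
averaged = rebin(raw, 4, np.mean)               # step 2: average 4x4 blocks
calibrated = normalize_aia(averaged, 2.0, 0.8)  # step 3: exposure and degradation
summed = rebin(calibrated, 2, np.sum)           # step 4: sum 2x2 blocks
```

Averaging preserves the intensity scale per pixel, whereas summing accumulates the signal of the merged pixels, which is why the two reducers are kept as an explicit argument.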

HMI
We start with the original HMI JSOC data series hmi.B_720s, which provides the magnetic vector field strength, inclination angle, and azimuth (Hoeksema et al. 2014). We process this to calculate full-disk vector field observations in $B_x$, $B_y$, and $B_z$ components with 12 minute cadence. The +x direction points to the solar west, +y to the north, and +z out of the image plane (i.e., line-of-sight). Additionally, although the original HMI image size is 4096 × 4096, as for the AIA instrument, the pixel resolution is different. We therefore further co-aligned HMI and AIA data so that they have the same spatial sampling. The major processing steps for the HMI observations are as follows:
1. We begin by converting the original HMI JSOC data series hmi.B_720s vector field data, with the disambiguation solution of disambig.fits, to $B_x$, $B_y$, and $B_z$ components, spatially co-aligning with AIA observations using the FITS header information.
2. The HMI images were also corrected for orbital variation with a fixed disk size $R_s$ of 976 arcseconds throughout.
3. Finally, we downsampled the data by averaging in local blocks, which emulates the expected observation at the target lower resolution. The final interpolated images have 512 × 512 pixels with pixel size of ∼4.8 arcsec.
4 http://jsoc.stanford.edu/data/aia/synoptic/
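The conversion in step 1 of the HMI processing is, at its core, a change from spherical to Cartesian components. The sketch below shows this for a single pixel. The angle conventions are assumptions that the text does not state and should be checked against the JSOC documentation: inclination is taken from the line of sight (+z), azimuth is taken counterclockwise from +y, and disambiguation is modelled as an optional 180 degree azimuth flip. The function name is hypothetical.

```python
import math


def field_to_components(strength, inclination_deg, azimuth_deg, flip_azimuth=False):
    """Convert (|B|, inclination, azimuth) to (Bx, By, Bz) for one pixel.

    Assumed conventions (not stated in the text): inclination measured from
    the line of sight (+z), azimuth counterclockwise from +y, and
    flip_azimuth applying the 180-degree disambiguation.
    """
    if flip_azimuth:
        azimuth_deg += 180.0
    inclination = math.radians(inclination_deg)
    azimuth = math.radians(azimuth_deg)
    transverse = strength * math.sin(inclination)  # component in the image plane
    bx = -transverse * math.sin(azimuth)
    by = transverse * math.cos(azimuth)
    bz = strength * math.cos(inclination)
    return bx, by, bz


bx, by, bz = field_to_components(100.0, 60.0, 30.0)
magnitude = math.sqrt(bx**2 + by**2 + bz**2)  # recovers the input field strength
```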

EVE
EVE spectra are assembled from a battery of instruments including the Multiple EUV Grating Spectrographs (MEGS-A, B, P), the Solar Aspect Monitor (SAM), and the Extreme ultraviolet SpectroPhotometer (ESP). Each of these instruments covers a different wavelength range in the EUV spectrum, and they are cross-calibrated to produce EVE's data products.
The EVE data released in this dataset are extracted from a specially prepared EVE version 5 IDL saveset, including 39 emission lines (see Table 1) during the time window between 2010-05-01 and 2014-05-26. The end date of this dataset corresponds to the failure of the MEGS-A instrument, which covered the range between 30Å and 370Å. The EVE data have already been calibrated with physical units of W m$^{-2}$, scaled to 1 AU, and corrected for degradation, requiring no subsequent calibration. The only processing we perform is to convert from IDL to NumPy arrays and temporally synchronize with the AIA and HMI observations.

Temporal Downsampling and Synchronization
One of the goals of this paper is to produce a dataset that is temporally and spatially synchronized for the three SDO data products at manageable resolutions. While our scaling to a fixed solar disk size automatically ensures the spatial synchronization of AIA and HMI, all SDO data instruments observe at different cadences (AIA: 2 minutes, HMI: 12 minutes, EVE: 10 seconds) and are not necessarily aligned in time.
In order to perform the temporal synchronization, we downsample AIA to a 6 minute cadence and match the nearest EVE observation within a mean/max time window of 8.5s/12s. This yields a final dataset consisting of AIA and EVE observations each at 6 minute cadence, with accompanying HMI observations occurring at every other time step 5 with a 12 minute cadence.
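The nearest-observation matching described above can be sketched with a sorted search. This is an illustrative sketch, not the synchronization code used to build the dataset: the function name and the synthetic timestamps are hypothetical, and using the quoted 12 s maximum as a hard tolerance is an assumption.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def match_nearest(targets, candidates, tolerance):
    """For each target time, return the index of the nearest candidate time,
    or None if the nearest candidate is further away than `tolerance`.

    `candidates` must be sorted in ascending order.
    """
    matches = []
    for t in targets:
        pos = bisect_left(candidates, t)
        # The nearest candidate is either just before or at the insertion point.
        neighbours = [i for i in (pos - 1, pos) if 0 <= i < len(candidates)]
        best = min(neighbours, key=lambda i: abs(candidates[i] - t), default=None)
        if best is not None and abs(candidates[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


start = datetime(2011, 1, 1)
aia_times = [start + timedelta(minutes=6 * k) for k in range(3)]        # 6 min cadence
eve_times = [start + timedelta(seconds=10 * k + 4) for k in range(60)]  # 10 s cadence, 4 s offset
pairs = match_nearest(aia_times, eve_times, tolerance=timedelta(seconds=12))
```

In this synthetic example the third AIA time has no EVE sample within the tolerance, so it is left unmatched rather than paired with a distant observation.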

Data
This produces the final dataset for the paper totalling ∼6.5 TB, made available through the Stanford Digital Repository 6 (please see appendix for a list of URLs). The data are individually packed monthly and for each waveband/component of AIA/HMI, all in the NumPy format. The EVE data are packed in a single TAR file. We show the average value as a function of time for the three products in Figure 3, which demonstrates the removal of spurious trends and artifacts. We also show in Figure 4 an example of co-aligned AIA and HMI observations. The upper panel shows the observation near the solar maximum of cycle 24 (2014 February 25 00:00 UT), exhibiting several active regions with strong magnetic field magnitudes and associated EUV emission. The bottom panel shows the observation near the solar minimum of cycle 24 (2018 August 10 00:00 UT), displaying only one active region with a comparatively weak magnetic field and EUV brightness.

4. PROTOCOLS AND METRICS
We expect that these data will be of interest for machine learning applications in heliophysics, and that they will serve as an easily accessible dataset for testing machine learning models in the physical sciences. To facilitate this, we have defined standard protocols and metrics to aid future work with these data.
Data splits: There is large temporal coherence in the data since large-scale structures on the Sun evolve on timescales of days to months. This leads to issues with randomly sampled splits of the data, which are common in machine learning settings with uncorrelated data. In particular, randomly sampled splits will lead to training and testing observations that are separated by only days or even minutes. While these observations are indeed distinct points in time, they are generated by virtually the same large-scale structures.
In practice, this means that experiments on randomly split data will be unable to identify overfitting and will likely lead to overly optimistic estimates of generalization performance. The specific issue is that when deploying a model, one tests it on large-scale structures and conditions that are different from those in the training data. However, if the data are split randomly, the model is never actually evaluated on unseen large-scale structures due to temporal coherence. Therefore, there is no indication of whether the model's performance is due to generalizing well or is simply explained by the model overfitting to the particularities of the limited large-scale structures observed at training time.
To preclude this, we have split our data in temporal blocks that break this correlation, consisting of (i) a training set used to fit model parameters (e.g., the filter weights of a convolutional neural network); (ii) a validation set used to set model hyperparameters (e.g., the learning rate for training a network); and (iii) a test set used to evaluate out-of-sample model performance.
All of our data splits are performed over the years (2011-2014), the time period for which all three SDO instruments (AIA, HMI, EVE) were active. This time period provides a dataset large enough to support the training of modern models that require copious amounts of data. We set aside years 2012 and 2013 for testing purposes, supplying a wide variety of solar conditions. Years 2011 and 2014 are split into training and validation such that 70% of available EVE observations are used for training (until mid-December 2011) and 30% are used for validation. Of course, other splits are possible, especially for problems not relying on EVE observations. We therefore encourage the community to experiment with various data splits, with the cautionary advice that splits should be constructed in temporal blocks as opposed to random sampling.
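A split by temporal blocks along the lines described above might look as follows. This is a minimal sketch under assumptions: the function name is hypothetical, the input is assumed to contain only timestamps from 2011 to 2014, and the 70/30 division is taken chronologically over the non-test observations.

```python
from datetime import datetime

TEST_YEARS = {2012, 2013}  # years held out for testing
TRAIN_FRACTION = 0.7       # share of the remaining observations used for training


def temporal_split(timestamps):
    """Return (train, validation, test) index lists built from temporal blocks."""
    test = [i for i, t in enumerate(timestamps) if t.year in TEST_YEARS]
    rest = sorted(
        (i for i, t in enumerate(timestamps) if t.year not in TEST_YEARS),
        key=lambda i: timestamps[i],
    )
    cut = int(round(TRAIN_FRACTION * len(rest)))
    # The earliest 70% of the remaining observations train, the latest 30% validate.
    return rest[:cut], rest[cut:], test


# Synthetic monthly timestamps covering January 2011 to May 2014.
timestamps = [datetime(y, m, 1) for y in (2011, 2012, 2013) for m in range(1, 13)]
timestamps += [datetime(2014, m, 1) for m in range(1, 6)]
train_idx, val_idx, test_idx = temporal_split(timestamps)
```

Because the cut is made on the time axis, no validation or test observation is interleaved with training observations, which is the property random sampling fails to provide.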
Metrics: All of the metrics reported in Section 5 are derived from the normalized absolute error, $|y_i - \hat{y}_i| / y_i$, where $\hat{y}_i$ is the model prediction and $y_i$ the measurement for data point $i$. For scalar quantities like EVE prediction, which are intrinsically already averaged over the Sun, we report the average normalized absolute error over all samples in the test set.
For images (e.g., AIA prediction) that are not already spatially integrated, we report a number of metrics. First, we report the average normalized absolute error, averaging first over each predicted image's valid pixels, and then over the images. In computer vision research, this average has been noted to often poorly characterize the performance on most pixels (Scharstein & Szeliski 2002) since it can be arbitrarily changed by a small number of large errors. Thus, we also report the percentage of good pixels metric, i.e., the fraction of image pixels with normalized absolute error less than a fixed percentage $t$, for $t$ = 10%, 20%, and 50%.
Figure 5. … Table 1). The forecast errors of all intervals, N hours apart, contained in the years 2012 and 2013 are averaged to produce the average error plotted for an N hour look-forward time in these figures. The average model surprisingly predicts 7 of the 14 EVE lines within 10% error, not showing much overall variation, while the persistence model achieves this same performance for 10 lines. The ridge regression model often outperforms the persistence model overall, but not in all conditions and not by a substantial margin. The linear and persistence models both show periodic trends consistent with one solar rotation. See Section 5.1 for discussion.
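The metrics defined in this section can be sketched as follows for a flat list of values, such as the EVE samples of one line or the valid pixels of one image. The function names and the toy numbers are hypothetical, and measurements are assumed to be strictly positive.

```python
def normalized_absolute_error(y_true, y_pred):
    """|y - y_hat| / y for one measurement (y is assumed strictly positive)."""
    return abs(y_true - y_pred) / y_true


def mean_normalized_absolute_error(y_true, y_pred):
    """Average normalized absolute error over paired values."""
    errors = [normalized_absolute_error(t, p) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def fraction_good_pixels(y_true, y_pred, threshold):
    """Fraction of values whose normalized absolute error is below `threshold`."""
    errors = [normalized_absolute_error(t, p) for t, p in zip(y_true, y_pred)]
    return sum(e < threshold for e in errors) / len(errors)


truth = [10.0, 20.0, 40.0, 100.0]
prediction = [10.5, 15.0, 41.0, 30.0]
average_error = mean_normalized_absolute_error(truth, prediction)
good = {t: fraction_good_pixels(truth, prediction, t) for t in (0.10, 0.20, 0.50)}
```

In this toy case a single large error dominates the average, while the thresholded fractions still show that half of the values are predicted within 10%, which is the motivation for reporting both.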

5. RESULTS
In this section, we provide baseline metrics for simple machine learning applications utilizing the proposed dataset, all implemented in the PyTorch library (Paszke et al. 2017). These examples were chosen to illustrate what performance metrics should be expected from future models, as well as to supply simple examples for typical use cases. To this end, we have selected and evaluated two problems that demonstrate the temporal nature of the data as well as the alignment between the two spatially-resolved sensors: (i) predicting future EVE from present EVE, and (ii) transforming HMI observations to AIA observations. These generic models may be re-applied to a wide variety of other problems not discussed in this section, for instance predicting future AIA from current AIA, or predicting EVE from AIA.
We stress that our baselines are not intended to be the top-performing solutions, but rather a rubric that shows how well a simple data-driven approach would perform. This serves two functions: First, future model implementations that are more complex should out-perform these baselines in the metrics we propose or other such metrics (e.g., focusing only on flaring events); and secondly, the baselines provide context necessary to properly evaluate a future model's performance. For instance, while a more complex model may achieve a low error rate such as 5%, if our baseline already achieved a similar score, the complexity of the new model may not be warranted.

EVE-to-EVE Prediction
The goal of this task is to predict EVE observations at a future time, ranging from a few hours up to a full solar rotation ahead, given current EVE observations. In order to provide statistically sound benchmarks in light of strong solar variability, we calculate the average relative error over all predictions for look-forward times of various fixed sizes. This statistical approach indicates to what extent we can predict overall future phenomena for a given look-forward time, while accounting for strong solar variability.
There are two main sources of solar variability on this timescale. On shorter timescales, the main source of variability is flares, which increase the EUV radiative output of the Sun by several orders of magnitude on timescales of minutes and hours. The second is solar rotation itself (27 days at the synodic Carrington rotation rate). Rotation modulates EUV irradiance because active regions (bright in the EUV) have lifespans of 14 to 55 days and can come in and out of view as the Sun rotates. This active region permanence induces strong temporal correlations at look-forward times near 27 days, as illustrated by the periodicity in Figure 5, as the Sun's "same face" rotates back into view. For model evaluation, we choose a total look-forward time of 29 days, a duration long enough to expose the irradiance periodicity.
For our input data we use the MEGS-A lines listed in Table 1 with the exception of Fe XVI 361Å which is the most sparsely measured line with only ∼1% of the average number of line measurements.
Baselines: For this problem we report 3 simple baselines:
1. Persistence. This model assumes that all future observations of the Sun will be identical to its current state. Thus, for any time jump, it predicts that the future EVE observation will be the same as the current EVE observation.
2. Constant. This model assumes that the Sun produces a constant EUV irradiance and therefore gives a constant prediction irrespective of the current EVE observation. We set this constant to the training set average per line, and we therefore also refer to it as the average model.
3. Linear. This model assumes that future observations are a linear transformation of the current observations plus a constant bias. We fit this model using ridge regression, i.e., a linear model with Tikhonov/L2 regularization. In particular, for a given spectral line and look-forward time, if $x_i$ is the current measurement and $y_i$ the corresponding future average observation, we solve for $w$ such that $\lambda \lVert w \rVert_2^2 + \sum_{i=1}^{n} \lVert w^T x_i - y_i \rVert_2^2$ is minimized over all instances $i$. We set $\lambda$ per model by grid search to minimize the average normalized absolute error, doing 2-fold cross validation on the training set.
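The linear baseline can be sketched with the closed-form ridge solution. This is an illustrative sketch rather than the implementation used for the reported results: the function name is hypothetical, the squared-error form of the objective is assumed, and leaving the bias term unpenalised (handled here by centring the data) is also an assumption. The grid search over $\lambda$ is omitted.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge regression with an unpenalised bias term.

    Minimises lam * ||w||^2 + sum_i (w^T x_i + b - y_i)^2.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring absorbs the bias
    A = Xc.T @ Xc + lam * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


# Toy data following y = 2x + 1 exactly.
X_toy = [[0.0], [1.0], [2.0], [3.0]]
y_toy = [1.0, 3.0, 5.0, 7.0]
w_free, b_free = fit_ridge(X_toy, y_toy, lam=0.0)  # ordinary least squares
w_reg, b_reg = fit_ridge(X_toy, y_toy, lam=5.0)    # weights shrunk toward zero
```

As $\lambda$ grows, the weights shrink toward zero and the prediction approaches the training mean, so the constant baseline appears as a limiting case of the linear one.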

Results: We evaluate the average normalized absolute error for these models for look-forward times ranging from 2 hours to 29 days in steps of 2 hours and report our results in Figure 5. The linear and persistence models both show trends corresponding to the solar rotation: their errors peak at approximately half a solar rotation and reduce steadily until a full rotation occurs, thus confirming the strong correlation between observations separated by exact rotations. The average model's error, on the other hand, is effectively constant; small variations occur because pairs of observations separated by a 1 day jump exist from January 1, 2012 up to December 30, 2013, while pairs separated by a 29 day jump can only be tested up to December 2, 2013.
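The evaluation over look-forward times can be sketched for the two trivial baselines as follows. The function names are hypothetical and the toy series is synthetic: it simply alternates with a period of two samples, standing in for the rotational modulation.

```python
def lagged_pairs(series, lag):
    """All (current, future) pairs separated by `lag` samples (lag >= 1)."""
    return list(zip(series[:-lag], series[lag:]))


def mean_error(pairs, predict):
    """Average normalized absolute error of `predict(current)` against future."""
    errors = [abs(future - predict(current)) / future for current, future in pairs]
    return sum(errors) / len(errors)


def baseline_errors(test_series, train_mean, lags):
    """Errors of the persistence and constant baselines for each look-forward lag."""
    results = {}
    for lag in lags:
        pairs = lagged_pairs(test_series, lag)
        results[lag] = {
            "persistence": mean_error(pairs, lambda current: current),
            "constant": mean_error(pairs, lambda current: train_mean),
            "num_pairs": len(pairs),
        }
    return results


toy_series = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]  # synthetic signal with period 2
errors = baseline_errors(toy_series, train_mean=1.5, lags=(1, 2))
```

The persistence error vanishes when the lag equals the period of the toy signal, mirroring the drop in error near a full rotation, and the number of available pairs shrinks with the lag, which is the source of the small variations noted for the average model.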
Collectively, the results underscore the importance of having good baselines via the surprising effectiveness of even trivial models such as the persistence or average models. For instance, although the average model entirely ignores the current EVE observation, it is able to predict 7 of the 14 EVE lines with less than 10% average normalized absolute error; and similarly, at a look-forward time of 27 days, 10 lines can still be predicted within 10% error by the persistence model.
It is true that the linear model frequently improves on the persistence model, especially for high-error lines like Fe XVI and Fe XV, and for look-forward times much less than a full rotation. However, for many look-forward times and lines, the trivial persistence model actually outperforms the relatively complex linear model, demonstrating how simple baselines may assist in properly assessing the effectiveness of a machine learning model.

HMI-to-AIA Prediction
We now move on to an example which demonstrates how a convolutional deep learning model may exploit the spatial richness of our dataset. In this application we show how a mapping between the HMI and AIA instruments is learned by treating it as an image-to-image translation problem. Such an approach is common in computer vision research, with applications as diverse as labeling each pixel in the scene with a category label (e.g., building; Shelhamer et al. 2016), generating images from sketches (Isola et al. 2017), inferring 3D properties of scenes (Wang et al. 2015), or detecting the pose of humans (Cao et al. 2017).
We have physical reason to expect that there exists a mapping between the HMI and AIA instruments. While the HMI instrument infers information about the solar magnetic field from the solar photosphere, the AIA instrument measures UV/EUV emission from the solar chromosphere and corona. Since the chromosphere and corona are spatially structured by the presence of strong magnetic fields, UV/EUV emission will typically reflect information about the magnetic field through its spatial distribution, and vice versa. Here we show how a simple convolutional model can realize the mapping from HMI to AIA.
Baseline: Our baseline is a deep convolutional neural network. This is a function composed of alternating convolutions and non-linearities that maps one multichannel image (e.g., a 3-channel 256x256 image) to another (e.g., a 9-channel 256x256 image). This function can be fit to a dataset of inputs (i.e., HMI) and desired outputs (i.e., AIA) via standard optimization procedures. Throughout we work with 256x256 images.
We adopt a basic approach for our network consisting of three parts: (i) an initial feature extraction following ResNet (He et al. 2016), consisting of a 7x7 convolution with stride 2 followed by 3x3 max-pooling with stride 2, which expands the receptive fields of subsequent feature maps; (ii) a variable number of 3x3 convolutions with stride 1; and (iii) a 3x3 convolution yielding a 9-channel prediction, followed by 4x bilinear upsampling. All intermediate convolutions have 128 filters and are followed by a Rectified Linear Unit (Nair & Hinton 2010) and batch normalization (Ioffe & Szegedy 2015). By varying the number of intermediate convolution blocks in part (ii), we can control both the parameter count and the effective receptive field of the network. We report results with 3, 7, and 11 layers (i.e., with 2, 6, and 10 hidden layers, including the initial convolution in part (i)).
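To make the link between depth and receptive field concrete, the sketch below tracks the spatial size and the receptive field through the layers listed above. It is arithmetic only, not the network itself. Two assumptions are made that the text does not state: padding is chosen so that stride-1 layers keep the spatial size and stride-2 layers halve it, and the 3, 7, and 11 layer variants are read as having 1, 5, and 9 intermediate 3x3 convolutions. The smoothing introduced by bilinear upsampling is ignored, so the receptive field values are approximate.

```python
def network_geometry(num_mid_convs, input_size=256):
    """Output size and receptive field (in input pixels) of the described stack."""
    layers = [(7, 2), (3, 2)]           # (kernel, stride): initial conv, max-pool
    layers += [(3, 1)] * num_mid_convs  # intermediate 3x3 convolutions
    layers += [(3, 1)]                  # final 3x3 conv producing 9 channels
    size, receptive_field, jump = input_size, 1, 1
    for kernel, stride in layers:
        receptive_field += (kernel - 1) * jump  # growth measured in input pixels
        jump *= stride                          # input-pixel spacing of adjacent outputs
        size //= stride                         # assumes padding as described above
    return {
        "pre_upsample_size": size,
        "output_size": size * 4,  # 4x bilinear upsampling
        "receptive_field": receptive_field,
    }


# Assumed mapping from total layer count to number of intermediate convolutions.
geometry = {total: network_geometry(total - 2) for total in (3, 7, 11)}
```

Under these assumptions every variant returns a 256x256 output, while the receptive field grows by 8 input pixels per additional 3x3 convolution.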
We train the parameters of the network (e.g., filter weights and biases) via backpropagation and minibatch stochastic gradient descent (SGD) to minimize the mean-squared error of the prediction. Specifically, we use SGD with Nesterov momentum (Sutskever et al. 2013) with momentum 0.99, weight decay $10^{-8}$, and batch size 32. We start with a learning rate of $10^{-3}$, which we multiply by 0.1 every 5 epochs, and train for 15 epochs. We checkpoint the network at the end of every epoch and take the network with lowest validation loss. To help learning, we divide inputs and outputs per-channel by their average over the training set (i.e., the network is trained to predict the AIA 94Å image divided by the empirical mean of AIA 94Å over the training set).
Figure 6. Results for HMI to AIA translation. The Left Panel shows the HMI inputs, while the Right Panel shows the ground-truth AIA (Top Panel) as well as the predicted AIA from a 3-layer network (Middle Panel) and 11-layer network (Bottom Panel). While the 3-layer network performs well, additional layers (i) reduce artifacts (especially in 131Å and 171Å), presumably due to the depth, and (ii) better resolve coronal predictions (especially in 211Å and 304Å), presumably due to the larger receptive field caused by the additional layers.
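The training schedule described above can be written down in a few lines. This is a sketch of the bookkeeping only, with hypothetical function names, a 0-based epoch index as an assumption, and made-up validation losses; the optimizer and the network are not included.

```python
def learning_rate(epoch, base_lr=1e-3, gamma=0.1, step=5):
    """Step schedule: multiply the rate by `gamma` every `step` epochs (0-based)."""
    return base_lr * gamma ** (epoch // step)


def best_checkpoint(val_losses):
    """Index of the epoch whose checkpoint has the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def normalize_by_training_mean(values, training_mean):
    """Divide one channel's values by that channel's training-set average."""
    return [v / training_mean for v in values]


schedule = [learning_rate(epoch) for epoch in range(15)]  # 15 epochs in total
chosen = best_checkpoint([0.9, 0.5, 0.4, 0.45, 0.43])     # made-up losses
scaled = normalize_by_training_mean([2.0, 4.0, 6.0], training_mean=4.0)
```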
Results: We show sample qualitative results in Figure 6 for 3 and 11 layers. Even with a small number of hidden layers, a simple data-driven approach does a good job of capturing the general shape and features of the Sun. This suggests that reproducing general features requires only relatively simple models, and that more complex models must demonstrate improvements beyond this. Adding more layers helps reduce artifacts at the edge of the disk in the 131Å and 171Å channels and more accurately resolve the corona in the 211Å and 304Å channels. The shallower network has difficulty accurately resolving the corona because each prediction is made from a small portion of the Sun; thus it produces a halo-like effect around the entire Sun, as opposed to at specific locations on the disk. Quantitatively, increasing depth produces strong improvements, as seen in Table 2. With a relatively unsophisticated deep network, 75% of the pixels of AIA across all channels can be predicted within 50% relative error from HMI observations. As seen by the percentage of good pixels metrics, 1600Å and 1700Å observations appear to be among the easier to predict, and are almost always a few percentage points higher across both network depths and good pixel thresholds. This serves as a good sanity check on the results, since the photospheric and chromospheric brightness features in these two channels are known to be highly correlated with the photospheric distribution of magnetic fields.
6. CONCLUSION
In this paper we present a curated, high quality dataset from all three SDO instruments primed for machine learning research. We have preprocessed these data by downsampling AIA and HMI images from 4096 × 4096 to 512 × 512 pixels, removing observations with non-zero QUALITY, correcting for instrumental degradation over time, and applying corrections that account for Earth's elliptical orbit as well as AIA's automatic exposure control. We have also ensured that both AIA and HMI data are spatially co-located and have identical angular resolutions, and that all instruments are chrono-synchronous.
We also highlight some of the potential pitfalls of blindly applying machine learning techniques to solar data, or even more broadly:
1. To maximize their versatility, SDO data products are nuanced and assume an expert-level understanding of the instruments' design and limitations. Using them without this knowledge may lead to incorrect results and invalid conclusions.
2. Most of the physical processes that drive solar variability occur at a much slower cadence than that of SDO's instruments (hours and days vs. minutes and seconds, respectively), requiring special care with the splitting of training, validation, and test sets. Splits must be performed along temporal blocks and not by random sampling, as is done in other settings with uncorrelated data samples. Random sampling in this case will lead to an overly optimistic estimate of validation error, leading to an inability to identify whether a model will generalize properly to future observations or has instead overfit to its data.
3. Due to the relatively long timescales of solar variability, the simple forecasting models of persistence and climatological averages perform exceptionally well on hourly and daily timescales. As a result, error estimates of more advanced models are not meaningful in an absolute sense, but only when compared to these simple baselines.
Finally, we provide a series of baselines that take advantage of this dataset to produce EVE time-forecasts and HMI→AIA reconstructions. These examples are meant to illustrate some of the applications made possible by combining these data with ML techniques, as well as the performance levels against which researchers should compare their own model implementations.
As with many fields, heliophysics has entered a data-rich age in which the human intellect alone is incapable of processing the copious amounts of data gathered by NASA's ever-growing spacecraft fleet. Fortunately, the ongoing revolution in machine learning research will power a new age of data inference and physical insight that maximizes the scientific output of these data-rich missions. It is important, however, for heliophysicists and computer scientists to work together to understand the properties and limitations of both the raw data and the ML techniques. If special care is not taken in understanding such limitations, we may unfortunately see a large amount of incorrect, overly optimistic, or worse, misleading research. Interdisciplinary programs such as NASA's Frontier Development Laboratory can provide a vital common ground to facilitate this skill transfer and will be critical for the successful and fruitful development of ML techniques in the astrophysical sciences.