Data‐Driven Basis Functions for SuperDARN Ionospheric Plasma Flow Characterization and Prediction

The archive of plasma velocity measurements from the Super Dual Auroral Radar Network (SuperDARN) provides a rich data set for the investigation of magnetosphere‐ionosphere‐thermosphere coupling. However, systematic gaps in this archive exist in space, time, and radar look‐direction. These gaps are generally infilled using climatological averages, spatially smoothed models, or a priori relationships determined from solar wind drivers. We describe a new technique for infilling the data gaps in the SuperDARN archive which requires no external information and is based solely on the SuperDARN measurements. We also avoid the use of climatological averaging or spatial smoothing when computing the infill. In this regard, our approach captures the true variability in the SuperDARN measurements. Our technique is based on data‐interpolating Empirical Orthogonal Function analysis. This method discovers from the SuperDARN data a series of dynamical modes of plasma velocity variation. We compute the modes of a sample month of northern hemisphere winter data, and investigate these in terms of solar wind driving. We find that the By component of the Interplanetary Magnetic Field (IMF) dominates the variability of the plasma velocity. The IMF Bz component is the dominant driver for the background mean field, and a series of non‐leading modes, which describe the two‐cell convection variability, and the substorm. We recommend our new technique for reanalysis investigations of polar‐scale plasma drift phenomena, particularly those with rapid temporal fluctuations and an indirect relationship to the solar wind.

The parameterized approach is now ubiquitous in comparison to the component approach, and is driving progress in both characterization and forecasting, yet it has a shortcoming in that a relationship to the solar wind must be assumed. This has led to the potential for introducing biases. For example, climatological models which are developed assuming a relationship to the Interplanetary Magnetic Field (IMF) (e.g., Pettigrew et al., 2010;Ruohoniemi & Greenwald, 1996, 2005Thomas & Shepherd, 2018) are used for both visual interpretation of quasi-stationary patterns of plasma velocity, and to populate the background field when the variability of the plasma velocity is unknown. Hence, the assumed relationship with the solar wind affects both the characterization and infill of the data. In comparison, the component approach need not make such assumptions when infilling the data.
We are now at a point whereby we can expand and automate the component approach, transitioning our understanding of ionospheric electrodynamics from a consensus of many studies, which each use few data, to new studies, which each use lots of data. Shore et al. (2017Shore et al. ( , 2018 have demonstrated this approach for magnetic field variations, using the Empirical Orthogonal Function (EOF) method. The EOF technique decomposes a large, highly variable data set into a series of independent spatio-temporal subgroups, called "modes" of variance. The modes are discovered from the data with no prior specification, thus they express the intrinsic dynamical properties of the system. Collectively, the modes describe the full variability of the data, and the majority of the variability is commonly given by a handful of mode patterns. These can be used to impute the missing data. This allows us to offer a technique to provide an infill solution without resort to external information (e.g., the solar wind). If desired, the dependence on the solar wind or substorms can later be determined separately for each mode . This allows us to provide a semi-automated characterization of the electrodynamics in a manner similar to the historic component approach, but based on many more data. The goal of this study is to describe a variation of the EOF technique tailored for SuperDARN plasma velocity data, intended for self-consistent investigation of dynamic features in the ionospheric plasma resulting from both directly and indirectly driven sources.
In Section 2, we summarize the data used in the study, and in Section 3 we describe the application of the EOF method and the pre-and post-analysis processing. In Section 4, we describe our findings, which are placed in a wider context in Section 5. We summarize in Section 6.

Data
We use measurements from all SuperDARN northern hemisphere radars which contributed data in February 2001, chosen because it is one of the best months for SuperDARN data coverage in the archive (we are investigating the implications of poorer data coverage as part of a subsequent study). The SuperDARN radars SHORE ET AL. 10.1029/2021JA029272 2 of 18 transmit and receive high frequency radio signals (8-20 MHz) on oblique propagation paths that are refracted toward the horizontal in response to increases in electron density in the E-and F-regions of the ionosphere. When propagating perpendicular to the Earth's magnetic field, these signals can backscatter from magnetic field-aligned, decameter-scale ionospheric density irregularities that move with the background ionospheric plasma flow. This backscatter is measured along (approximately) discrete radar beams that are separated in horizontal azimuth, and sampled along each beam, at a series of distances (ranges) from the radar antenna array-the resulting measurement regions are illustrated in Figure 1.
The radars transmit a multi-pulse sequence to allow the determination of multi-lag complex autocorrelation functions for each sample Ponomarenko & Waters, 2006). The autocorrelation functions are converted to line-of-sight Doppler velocities by applying linear fits to the autocorrelation function phase variations  using version 4.0 of the radar software toolkit (RST 4.0 and FITACF 2.5; SuperDARN Data Analysis Working Group et al., 2017). The resulting velocities are assigned an estimated geographic location (for each combination of beam azimuth and range) using a time-invariant "virtual height" model for the radio wave propagation through the ionosphere, described by Chisham et al. (2008). The uncertainties in the estimated location are summarized by Burrell et al. (2016). The data from a given radar now comprise a series of 3-or 7-s integrations of line-of-sight plasma velocity at a range of geographic locations, with a 1-2 min interval between repeat measurements at the same location (A variety of scanning modes are used by SuperDARN, but the aforementioned scan type is typical).
We apply data selection to these geo-located velocities in order to select good quality data from the F-region ionosphere. To ensure a high signal-to-noise ratio, we discard measurements with backscatter power less than 3 dB, data quality flag set to "true" (indicating low quality by RST 4.0), or which lie within 495 km of the radar array (corresponding to the first 11 discrete ranges) in which unwanted meteor echoes and E-region backscatter may be present. To reduce instances of near-zero velocity from the radar beam scattering off the Earth's surface instead of ionospheric irregularities, we discard measurements with an absolute velocity <50 ms −1 , or marked as ground scatter by RST 4.0. To avoid areas of low data coverage, which can affect the EOF solution, we remove all data from geomagnetic colatitudes greater than 30°.

Pre-EOF-Analysis Processing
The SuperDARN data are well suited for EOF analysis, because the spatial and temporal behavior of the plasma velocity are strongly correlated, in a manner related to the underlying physics of the plasma flow, which we seek to discover. However, pre-processing of the data is essential in order to correctly obtain these patterns of covariance. When applying the EOF method to geophysical data, it is important that the measurement locations and epochs contribute uniformly to the total variance, and that there is no variance structure within the measurement basis itself. This means that the EOF method will then decompose the variance structure of the underlying field, instead of responding to correlations in the spatiotemporal measurement structure. To achieve this, we apply further processing to the line-of-sight velocity data described in Section 2 in order to create a regular, discrete basis in space, time, and for the velocity vector direction, as follows.
Prior to forming the regular spatial basis for our data, we transform the geo-located velocity measurement locations into an inertial (approximately Sun-synchronous) coordinate frame. This is done to reduce the number of EOF modes required to describe the variations in the plasma velocity, which is dominated by convective patterns that are to first order Sun-synchronous. To maintain consistency with our earlier reanalysis SHORE ET AL.  Nominal fields of view for each northern hemisphere radar, which contributed data during February 2001. Both data selection, and the natural variability of ionospheric backscatter, will act to reduce the effective sampling region for each radar. The acronyms are detailed further in Chisham et al. (2007). of the surface-measured external and induced magnetic field (SEIMF; Shore et al., 2017Shore et al., , 2018, we choose the Quasi-Dipole (QD) system of inertial coordinates for this transformation (Emmert et al., 2010;Laundal & Gjerloev, 2014;Richmond, 1995), and we analyze a calendar month of data to allow multiple opportunities for measurements from a given longitude to contribute to a given local time. This time span is also longer than many magnetosphere-ionosphere time scales of interest such that representative modes may be captured. To provide a regular spatial basis in this coordinate frame, the velocity data locations are separated into equal-area spatial bins (Leopardi, 2007), which are defined in QD latitude and QD magnetic local time (MLT, given by the difference of the apex longitude and the geomagnetic dipole longitude of the subsolar point). Note that although we ultimately transform both the measurement locations and the measured vectors to the QD frame, we only transform the locations of the measurements into an inertial frame, while the velocity vectors remain defined in a frame co-rotating with the Earth, with no adjustment for the motion associated with Earth rotation.
We now describe the basis we use to represent the velocity vector direction. A discrete basis is needed for this quantity because at a given location and time, the relationship between the true plasma velocity vector, and the radar's line-of-sight sample of that vector, is unknown. This missing information can be recovered if we define a basis for the vector direction, using the horizontal angle between the line-of-sight direction for a given radar measurement, and the local QD north direction. This provides a fixed relationship between the measurement line-of-sight direction, and the QD frame. The velocity vector's direction and amplitude are then discoverable from the cumulative information given by many non-contemporaneous measurements in different look directions at the same location, as we describe later in Section 3.3. The non-orthogonal nature of the QD coordinate frame brings in additional complexities to defining this velocity direction basis in a self-consistent manner, as follows.
The horizontal QD basis vectors are termed f 2 and f 1 in the notation of Laundal and Gjerloev (2014), where f 2 points predominantly northward at most locations and f 1 predominantly eastward. In addition to being non-orthogonal, the QD basis vectors are not of unit length-both these factors vary with geographic location, and will affect the direction and magnitude of the line-of-sight plasma velocity measurements. Since we analyze the data in a non-geographic frame, we must rescale both the line-of-sight direction, and amplitude, of the velocity measurement as follows. The radar beam line-of-sight direction is defined by its horizontal clockwise angle (θ beam ) starting from the f 2 basis vector. This angle is normalized by the ratio between orthogonal axes (=90°) and the horizontal clockwise angle γ QD from the f 2 basis vector to the f 1 basis vector: Following this normalization, a value of     0 beam describes a measurement in the direction of f 2 , and     90 beam points along f 1 . We use the rescaled north-offset angles   beam to separate the data within each spatial bin into 6°-wide angular partitions defined clockwise from local QD north. This forms a regular basis for the different line-of-sight measurement directions, with minimal loss of precision.
Following Equations 2 and 3 in Laundal and Gjerloev (2014), we now rescale the magnitude of the velocity measurement v to take account of the non-unit-vector nature of the QD basis vectors: The operation scales the velocity measurement according to the "amount" that it is pointing in each of the f 2 and f 1 directions. The additional scaling factor F is given by where k is a unit vector pointing upward from the Earth's surface.
Lastly, we construct a regular temporal basis for the velocity data. For all the data in a given spatial bin and in a given 6° line-of-sight bin, we compute a time series of the median line-of-sight velocity in successive SHORE ET AL. The above processing is applied to all available data in February 2001. Following this, the velocity data v for each nth 5-min epoch (n ∈ {1, 2, …, N}), pth QD spatial location (p ∈ {1, 2, …, P}), and ith line-of-sight partition (i ∈ {1, 2, …, I}) for a given month are formed into a data matrix  Only the variance of the velocity data is analyzed, so prior to applying EOF, we remove the temporal mean X of the data in each column of  X (i.e., each mth combination of pth spatial location and ith partitioned line-of-sight direction now has zero mean).
Recall that our rationale for creating a regular basis for the velocity data was to ensure that all elements of a given dimension of the data matrix are uniform, in terms of their contribution to the total variance. However, the bases N, P, and I of the data do not conform to this requirement, since regions of missing data are highly systematic in SuperDARN. For example, the data are relatively more numerous in the polar cap and cusp, and relatively sparser where radar beams pass through regions of high radio wave absorption due to strong auroral particle precipitation. To reduce the impact of this systematic bias on the EOF analysis, we weight the data in each nth row of  X by the number of measurements at that epoch (termed z), divided by the number of possible data points M where W is a square matrix with dimension N, and is zero on all off-diagonal elements. The weights range from a value of 1 where a given row of  X is fully filled, to 0 where that row is empty. They act to decrease the relative contribution of poorly sampled epochs to the solution for the variance of the data. We apply the weights via where the exponent of 0.5 is used to account for squaring implicit in the EOF decomposition (described in the next section).
The EOF analysis requires that the input data set have no missing values. However, X is highly sparse-the majority of the regular grid in space, time, and line-of-sight direction has no coverage. To infill the missing coverage we use a modification of the Data-Interpolating EOFs technique (DINEOFs, described by Beckers & Rixen, 2003), which was modified for application to magnetic field data by Shore et al. (2017Shore et al. ( , 2018. In the DINEOF technique, the missing values in the data matrix are initially infilled with zeros, and are then iteratively replaced with forecasts of the variability from successive EOF analyses, until the amplitude of the infill converges with that of the data coverage. For clarity, there are no measurements where we apply the infill, but since the EOF solution is computed over the full N by M extent of X, there are regions where the infill and original data can be compared. In the modified version of DINEOFs applied here (described by Shore et al., 2017Shore et al., , 2018, the infill is a nested two-stage iteration process. For clarity, we provide a schematic illustration of the iterative data infill procedure in Figure 2. In the first stage of the iteration process, the infill values start at zero and are iteratively replaced with EOF predictions of increasing amplitude-we signify the iteration number with the subscript h. When the iteration over h is complete (see below), the infilled model prediction is subtracted from the original data, thus iteratively modifying the values of the original (non-infilled) data-we signify this with the subscript g. This modification is performed because it means that the leading EOF modes obtained at successive iterations of g are no longer orthogonal, which allows for the variance of the underlying system SHORE ET AL.
10.1029/2021JA029272 5 of 18 to be described using fewer iterations of g. To distinguish the original and altered data matrices, we use the subscripts o (for "original") and f (for "filled"), and we use the subscripts g and h to describe the different infill stages. So the sparse data matrix from Equation 4 is referred to as X o,1 , where 1 is the value of g, signifying that these data have not yet had an EOF prediction subtracted from them. We refer to data matrices which have undergone infill as a variant of X f,g,h . As an initial infill choice we define a matrix X f,1,1 , for which the null values of X o,1 are replaced by zeros (i.e., the mean value of the processed data) The processed data are now ready for EOF analysis.

Eigen-Decomposition Method
The following summary of EOF analysis is described in full by Bjornsson andVenegas (1997, page 12), von Storch andZwiers (2002, pages 294-295) and Jolliffe (2002, page 5). Given a mean-removed vector field X f,g,h , measured at N times and P locations in I different directions, EOF decomposes this into N spatial basis vectors of length P * I, and N temporal oscillations of length N. Each pair of a spatial pattern v and its temporal oscillation t is termed a "mode" of the analysis. These modes are standing wave components of the vector field X which together reconstruct the original data: where V are eigenvectors of the covariance matrix , , Each column of V and T describe a different mode of the EOF analysis: which is of size (N by N). The modes are ranked by eigenvalue, so mode 1 has the largest eigenvalue, and it describes the largest proportion of the variance of X.
To iteratively infill the missing data with an EOF model prediction, we use Equation 8 with j = 1 to reconstruct the infilled data matrix using mode 1 only: The process of infilling X f,1,2 as per Equation 9 and performing the EOF decomposition of Equation 8 is repeated until the amplitude of X e,1,h converges with that of X o,1 , where both exist. Similar to Shore et al. (2017), we find that convergence occurs well within h = 35, with diminishing returns for greater h. After convergence, the information represented by the leading mode is complete across all parameters N, P, and I. We do not want this added information to control the infill for subsequent modes, so when incrementing the iteration over g we remove the iterated EOF solution from the original data via SHORE ET AL.
where X o,2 represents the original data minus the information contained in the first iterated mode, and M is a "mask" matrix used to assign null values to a given matrix, such that X o,g = MX f,g,h . We then repeat the iterative infill process, beginning with zeros as before In summary, during this nested iteration over g and h, we iteratively infill empty bins using the amplitudes of the first mode, then we remove this mode from the original data, and repeat the process. The results discussed in this manuscript are the leading modes from each stage of the removal process, once convergence has been attained. We refer to the set of all fully iterated v as Z, and all fully iterated t as U. Although all columns of Z are technically "mode 1" of a given removal stage, we refer to them as modes 1, 2, and so on, according to the stage (i.e., value of g) they originated from.
In the remainder of this report, we describe the EOF model with the weighting removed, via Following this de-weighting, the reconstruction UY T will produce a model of the data matrix with the same units as  X.
For display, we compute a version of the EOF basis vectors in which the spatial patterns have normalized amplitudes, and the temporal eigenvectors have units of ms −1 , as follows where u j is the jth column vector of U, and the scalar Y j is the maximum-amplitude element of y j , itself the jth column vector of Y.

Post-EOF-Analysis Processing
At this point, both the mean field X which we removed prior to EOF analysis, and the individual EOF spatial basis vectors  j y , are defined over I line-of-sight directions at each pth location. We compute the relationship of this multi-direction spatial basis to the cardinal QD north and east vector components (N QD , E QD ) as follows, using the mean field from February 2001 as an example. At a single location in QD latitude and QD MLT, the mean field data give the proportional amplitude of the background plasma velocity in all 6°-wide line-of-sight partitions i. These values are shown for a single location in Figure 3a, as black dots. We fit a sinusoid to these data (illustrated with the red curve in Figure 3a), using an unconstrained nonlinear minimization to give the best-fitting amplitude a and phase θ phase . Given the scaling and normalization applied to the pre-EOF data described in Section 3.1, the QD north and east vector components at the spatial location are then given by SHORE ET AL.
The sinusoid fitting procedure is repeated for all p spatial locations of the mean field. The horizontal vector map of fitted mean velocities (encompassing all locations) is shown in Figure 3b.
The sinusoid fitting procedure is likewise applied to each location in each of the g mode spatial patterns (our model is based on 10 fully iterated modes). We note that the fitting of a smooth (sinusoid) model to the plasma velocity happens only after the EOF-driven infill of the variability in un-measured locations, and does not affect the mode time series (which retain their 5 min cadence). In other words, the EOF modes recover the plasma velocity vector based on an entire month's worth of radar measurements at all locations, without necessitating any temporal smoothing. In this way, the sinusoid fitting provides a natural complement to the decomposition of the data variability into independent standing waves of spatial and temporal information.
Lastly, we note that a given pth location may not include coverage from all I line-of-sight directions. The combinations of p and i which lack all measurements cannot be infilled using DINEOFs, and were not included in the EOF analysis of X. The extent of this missing data varies with data selection choices, and hence this is not reflected in the generalized mathematics of Section 3.2. In extreme cases, the sinusoid fitting is affected by systematic gaps in particular line-of-sight directions, particularly where data coverage is very low. As part of the sinusoid fitting process for each bin, we record the sum of squared differences between the sinusoid model fit and the mode spatial pattern data. If this misfit value exceeds 6 times the standard deviation of the misfits for all bins, we reject the sinusoid fit for that bin-this is the reason for the grayed-out bin near dawn in Figure 3b. The polar bin (where line-of-sight direction is undefined), and bins at colatitudes over 30°, are gray in Figure 3b since they were removed as part of the data selection process. For reference, the sum of squares misfit value of the example shown in Figure 3a is 19,662 (ms −1 ) 2 (with a root mean square misfit of 26 ms −1 ), and the bin in Figure 3b with the poorest permitted fit has a sum of squares misfit of 1,670,533 (ms −1 ) 2 (with a root mean square misfit of 240 ms −1 ). This poorest permitted fit is shown in Figure S1 in the Supporting Information, to illustrate the range of quality in the fits which are permitted.

Results
In Figure Figure 4 to determine which solar wind or magnetospheric driver best explains each of them. For this subjective interpretation, we rely on the expert knowledge garnered from decades of study of the magnetosphere-ionosphere system behavior (refer to Section 1). We also refer to the discovered modes from a series of EOF analyses of magnetic field data spanning solar cycle 23, performed by Shore et al. (2018), which are shown in Figure 5. These vector patterns are shown in the form of the equivalent ionospheric current which would lead to the observed magnetic perturbation. As discussed by Shore, Freeman, Coxon, et al. (2019), there are a number of definitions for this equivalent current. We choose the simplest, which is to rotate the horizontal magnetic vector by 90° clockwise (this transformation does not affect the time variation of these modes). These equivalent currents may not reflect the true current distribution equally well in all seasons and locations (e.g., Laundal et al., 2015;Laundal, Finlay, & Olsen, 2016;Laundal, Gjerloev, et al., 2016;Laundal et al., 2018). However, the transformation allows a simple visual comparison between the equivalent current and the plasma velocity, which are approximately anti-parallel for uniform ionospheric conductance. SHORE ET AL.
We will now contrast the magnetic field modes from Figure 5 with the SuperDARN plasma velocity mean field and modes shown in Figures 3 and 4, to draw some conclusions about the different structures of variance which comprise the plasma drift and the magnetic field, and the underlying physics.
Based on the spatial pattern of mode 1 in Figure 4a, we consider this mode to represent the horizontal motion of ionospheric plasma driven by the IMF B y (dawn-dusk) component, following dayside magnetic reconnection at the magnetopause. For ease of reference, we term this mode the "Velocity Perturbation Type Y" (VPY), to reflect its driving by IMF B y . The VPY feature is spatially very similar to the "Disturbance-Polar Type Y" (DPY; Friis-Christensen et al., 1985;Friis-Christensen & Wilhjelm, 1975;Greenwald et al., 1990) mode shown in Figure 5b. We can contrast these patterns to make inferences about the different behavior of the electric field (which drives the plasma convection) and the magnetic field, as follows.
SHORE ET AL.

10.1029/2021JA029272
10 of 18 It is worth noting that the dominant (VPY) pattern of plasma velocity variability (Figure 4a, which is controlled by IMF B y ) is spatially dissimilar to the monthly mean velocity (Figure 3b, which is controlled by IMF B z ). Conversely, for the magnetic field, the dominant variability pattern is spatially alike to the mean field (e.g., compare Figures 2 and 4a of Shore et al., 2017, both of which show a two-cell convection). This implies that for the magnetic field, the (monthly) mean better represents the value at a given epoch, whilst the plasma drift (and electric field) is less well suited to representation by climatological averages. This underscores the importance of accounting for the variability of SuperDARN data when performing imputation to infill data gaps.
In terms of contributions to the total variability, Shore et al. (2018) found that the DPY pattern was only identifiable in the leading modes of geomagnetic perturbation during local summer, when it was always a secondary contributor compared to the two-cell convection pattern (driven by IMF B z ). In contrast, the VPY feature is resolved here even in local winter, where it dominates the SuperDARN variation, while variance features related to the two-cell convection (detailed below) are secondary. This implies a greater importance for IMF B y in describing the plasma drift (or electric field) variations than in describing the magnetic field variations. The different positions in the variance hierarchy of the VPY and DPY modes are evidence that the IMF B y input acts as a voltage source, whose contribution to the magnetic field variance is modulated by ionospheric conductivity.
Further evidence for our physical interpretation of the plasma velocity variance structure is provided by comparing the modes with independent measures of solar-terrestrial coupling. In Figure 6, we cross-correlate the four modes shown in Figures 4e-4h with the month-long time series of the IMF B y and B z components, as previously done for the magnetic field by . These correlations are performed at a range of lags (from 0 to 500 min), where a positive lag τ indicates that the ionosphere responds τ minutes after a perturbation in the IMF at the bow shock nose. The correlations between the modes and the IMF are expected to peak at lags which reflect the time taken to reconfigure the polar ionospheric electrodynamics, following a change in the IMF.
From the SuperDARN mode correlations with IMF B y shown in Figure 6a, we see that only mode 1 (the VPY pattern, corresponding to the black line) has a strong response to this IMF component. The VPY response to IMF B y driving is strongest at 20 min' lag-similar to that found for the DPY pattern . In other words, we find that both the plasma drift (i.e., the electric field) and the magnetic field respond to direct driving with the same latency, when ionospheric conductance is high SHORE ET AL.
10.1029/2021JA029272 11 of 18 enough to produce a detectable magnetic perturbation. The long lag time taken for VPY and IMF B y to become decorrelated is due to the long autocorrelation of IMF B y with increasing lag (see . In their analysis of the geomagnetic variability, Shore et al. (2018) showed that two modes (shown here in Figures 5a and 5c) each represent a separate aspect of the ionospheric two-cell convection, which is driven when IMF B z is directed southward (Friis-Christensen & Wilhjelm, 1975;Hairston et al., 2005;Nishida, 1968a). The pattern of geomagnetic variation shown in Figure 5a represents the strength of the two-cell convection. Shore et al. (2017); Shore et al. (2018) termed this mode "Disturbance Polar Type 2" (DP2), to reflect its relationship to the associated equivalent current system (e.g., Nishida, 1968a). According to the Expanding-Contracting Polar Cap (ECPC) paradigm Siscoe & Huang, 1985), the open-closed field line boundary (and with it, the two-cell convection system) expands and contracts on a variety of timescales, owing to the balance of dayside and nightside magnetic reconnection (Milan et al., 2017). Here, this expansion and contraction is represented by the geomagnetic variation pattern shown in Figure 5c. This mode describes the spatial rate of change of DP2 (when the two patterns are summed), and was termed "DP2 Expansion-Contraction" (DP2EC) by Shore et al. (2017). The DP2EC pattern is essentially the first derivative in latitude of the DP2 pattern. We find that both the amplitude and the spatial extent of the two-cell convection are represented in the SuperDARN modes, albeit in a slightly different manner, as we outline below.
Mode 2 of the plasma velocity (shown in Figure 4b), and mode 3 (Figure 4c), represent an expanded and contracted form of the two-cell convection pattern respectively. For instance, in Figure 4b the plasma velocity direction in the dawn-dusk meridian changes from east to west at latitudes of 75° (dawn) and 72° (dusk). Conversely, in Figure 4c these reversals occur at 79° (dawn) and 81° (dusk). So in the SuperDARN modes, the two-cell convection system is represented in the form of two 2-cell convection patterns, with mode 2 representing the expanded system, and mode 3 describing the contracted system. We term mode 2 "Velocity Perturbation Type 2 expanded" (VP2e), and we term mode 3 "Velocity Perturbation Type 2 contracted" (VP2c). Like the geomagnetic modes DP2 and DP2EC, the VP2e and VP2c modes (when summed with different relative amplitudes) describe the range of expanded or contracted states of the convection system, and its differing intensities. We provide proof of our interpretation of VP2e and VP2c in Figure 7, where the mean field is added to modes 2 and 3, whilst the modes have had different amplitudes (and signs) assigned. The superposition of these three fixed patterns is sufficient to recover the range of the two-cell convection system's spatial variations: either strong (S) or weak (W), and either expanded (E) or contracted (C) in latitude.
We now compare the response of the different convection modes to IMF B z driving. VP2e and VP2c each represent the strength of the two-cell convection pattern, so we expect them to have a direct relationship to IMF B z . The correlation between VP2e (i.e., mode 2, in Figure 4f) and IMF B z is shown in Figure 6b (blue line). We see that the plasma drift reconfiguration response for this mode peaks with a relatively high correlation coefficient at a lag of 20 min and has a long decorrelation timescale, similar to the driving of VPY by IMF B y .
The correlation between VP2c (i.e., mode 3 in Figure 4g) and IMF B z is shown via the red line in Figure 6b. This correlation peaks at 15 min lag, which is slightly sooner than the response seen for VP2e. The longer response of VP2e is likely because the development of an expanded convection system requires more driving from IMF B z . We also see that VP2c decorrelates (with IMF B z ) faster than VP2e does (i.e., there is zero correlation in the red line in Figure 6b between lags of 60-120 min). A possible explanation is that following sustained IMF B z driving, a substorm will likely occur, and the VP2c mode would then describe the associated contraction of the convection system. From the low correlation between VP2c and IMF B z during lags of 60-120 min, it appears that the nature of the polar cap contraction is not directly related to the IMF B z driving in the preceding 1-2 h.
Comparing the plasma drift and magnetic field responses to IMF B z at long lag allows us to contrast the differing impact of ionospheric conductivity on these fields. In contrast to the rapid responses to IMF B z driving seen for VP2e and VP2c, the response of DP2 to IMF B z reported by Shore, Freeman, and Gjerloev ( response of the geomagnetic perturbation to IMF driving was also identified by Weimer et al. (2010), with two key findings of relevance to the discussion here. First, Weimer et al. (2010) found a consistent difference between dayside and nightside IMF response timescales, with nightside longer on average. This is expected from Dungey cycle theory (Dungey, 1961), and has since been investigated in more detail by Shore, Freeman, Coxon, et al. (2019). Second, Weimer et al. (2010) found higher correlations between the IMF and nightside geomagnetic variations, compared to dayside variations. The cause of this correlation difference was identified by . The authors proposed that because sustained IMF B z driving will likely lead to a substorm event, then on average we expect increased particle precipitation and conductivity in the auroral zone at longer lags. Following a precipitation event, a unit change in the solar wind driver will cause a larger geomagnetic response. This will increase the correlation between DP2 and IMF B z at around 60 min' lag, which is the most common time taken for IMF B z driving to create substorm conditions. Since both VP2e and VP2c have narrow peaks in their correlation with lagged IMF B z , we infer that the plasma drift (i.e., electric field) variation does not have this feedback from ionospheric conductivity.
Lastly, we highlight mode 4 of the plasma velocity variation, shown in Figure 4d. This pattern is dominated by a nightside signal, which contains two opposing-direction vortices, both at 68° latitude, with one centered near 02:30 MLT, and the other centered near 21:30 MLT. These may be the plasma velocity perturbations associated with the incoming and outgoing field-aligned currents which characterize the substorm current wedge (Frey et al., 2004;McPherron et al., 1973). This interpretation is upheld by the correlation of this pattern's time series (from Figure 4h) with lagged IMF B z , which is shown in Figure 6b (green line). The correlation is weakly negative, implying that the mode 4 pattern is driven indirectly by southward IMF. Also, the correlation peaks around 60 min' lag, which is the typical time taken for the IMF to drive a substorm. Further evidence that mode 4 is a substorm mode is given by a superposed epoch analysis and correlation with SMU and SML in the Supporting Information.

Discussion
In this study, we have described a technique to solve the following issues, which are intrinsic to the Super-DARN archive: incomplete sampling of the plasma velocity vector (i.e., only in the line-of-sight direction), incomplete temporal coverage, and incomplete spatial coverage. To achieve this goal, we have developed a version of data interpolating EOFs (DINEOFs) that is tailored to SuperDARN data. DINEOFS were originally proposed by Beckers and Rixen (2003), but here they are based on the slightly altered formulation of Shore et al. (2017). The primary advance is that the EOF analysis imputes the spatial, temporal and velocity-direction information at any given point based solely on the covariances of the data over a month. In other words, the missing information in time, space, and velocity perpendicular to the radar line-of-sight is infilled at a given location based on the behavior of both the co-located line-of-sight plasma velocity measurements from different line-of-sight directions throughout the month, and also the contemporaneous line-of-sight velocity at other locations. This allows us to avoid the use of external information (e.g., the solar wind), climatological averages, or smooth global models (e.g., spherical harmonic analysis) when constructing a complete reconstruction from the available measurements.
Once the spatial and temporal basis of the data has been completed at every location in the gridded data set by EOF analysis, a sinusoid is fitted to each spatiotemporal location to gain the full velocity vector information from the incomplete directional measurement basis given by the radar network. This is required because the radars can only return line-of-sight velocity, and do not measure the full velocity vector. Since the sinusoid fitting extracts for us a cardinal (north, east) vector basis from the n-dimensional mode spatial patterns, we have subsequently interpreted these patterns physically, to infer information about the behavior SHORE ET AL. of the plasma drift (and electric field) in the polar ionosphere. The temporal behavior of the modes is unaffected by the sinusoid fitting process, and has been used to investigate the solar wind driving of the modes.
EOF analysis has been used in multiple prior studies of high-latitude electric field variability (i.e., plasma velocity variability), for instance by Kim et al. (2012), Cousins et al. (2013), and Walach et al. (2021) using SuperDARN data, and by Matsuo et al. (2002) using Dynamics Explorer-2 satellite data. We can compare their approaches and results with ours. In agreement with our finding that IMF B y dominates the plasma velocity variability, Matsuo et al. (2002) found that their primary mode was also best associated with IMF B y . In contrast, Kim et al. (2012), Cousins et al. (2013), and Walach et al. (2021) found that their leading mode was best described by IMF B z . We suggest two possible reasons for this discrepancy. First, the data set used by Cousins et al. (2013) is considered to be representative of geomagnetically quiet conditions with a bias toward summer, whereas we use data from an entire winter month at solar maximum. In this case, different dynamical modes should be expected from EOF analysis of these different datasets. It may be that the lower variability in the Cousins et al. (2013) data set makes the dynamic behavior closer to the mean behavior, which is dominated by IMF B z (refer to Figure 3b). A second possible reason for the discrepancy is that the studies of Kim et al. (2012), Cousins et al. (2013), and Walach et al. (2021) apply smoothing to the SuperDARN data prior to the computation of the EOF modes, either via spatial and temporal regridding to reduce the variability of the data, or via smooth model fits. Kim et al. (2012) additionally used an IMF-dependent model to infill their data gaps. In contrast to these, Matsuo et al. (2002) compute their EOF modes directly on the along-track satellite data, prior to computing the (smoothed) global electric potential from these EOF modes. This matches our approach, in that we seek to allow the EOF analysis maximum access to the full variability of the data, before we apply any smooth models. It may be that this approach better represents the true plasma flow variability. Indeed, we resolve higher peak correlations between our modes and IMF B Lastly, we note that we resolve a mode which describes the substorm, whereas each of the studies mentioned above do not. This further implies that our technique is more sensitive to the electrodynamic variability.
Regarding the use of our EOF reanalysis as a solution for infilling gaps in the SuperDARN archive, again there is a range of possible options for the user to choose between. Our primary motivation in describing our technique is to create an alternative to the "Map Potential" method for obtaining the global plasma flow pattern, introduced by Ruohoniemi and Baker (1998), and further improved by Thomas and Shepherd (2018). Recently, several additional alternative infill solutions have been proposed, for example, by Gjerloev et al. (2018) and Nakano et al. (2020). These techniques each contrast with ours in a similar manner, in that they rely on additional measurement information to complete the SuperDARN data coverage. For Thomas and Shepherd (2018) and Nakano et al. (2020), this information is climatological averages of the flow for different solar wind states, while Gjerloev et al. (2018) relied on a principal component regression of the flow history onto geomagnetic indices. Conversely, our infill solution is entirely self-consistent, and to our knowledge is the only SuperDARN imputation which requires only SuperDARN data. However, we stress that EOF meets a different need for the SuperDARN community to these other models. We suggest that Map Potential (or the models of Gjerloev et al., 2018or Nakano et al., 2020 remains the go-to model for nowcasts, i.e., maps of the instantaneous representation of the electric field. EOF offers no intrinsic nowcast or forecast potential (although the dependence of these modes on solar wind variables could be used to develop a forecast model, which we will investigate in future studies). Hence, we suggest that our model technique be used to provide a reanalysis of SuperDARN, for an accurate reconstruction of past variability.

Conclusions
We present a novel solution for a complete specification of the 2-D plasma velocity field over the entire mid-and high-latitude polar regions from sparse SuperDARN radar data. This is the first SuperDARN infill solution which is entirely self-consistent, in that it relies only on the SuperDARN data, in contrast to the more popular climatological averaging technique of computing the mean velocity field for a given exogenous (e.g., IMF) state. By avoiding climatological averaging, our method also provides superior estimates of the ionospheric electrodynamic variability.
SHORE ET AL.

10.1029/2021JA029272
15 of 18 Our technique decomposes the velocity field variability into mathematically orthogonal modes, analogous to Fourier analysis. By investigating the dependence of these discovered modes on the IMF, we find that the dominant mode of variability of the plasma velocity is driven by the IMF B y component, while IMF B z is the dominant driver of the next three most important modes of variability. Following a perturbation in IMF B y at the bow shock nose, the reconfiguration timescale for the plasma velocity is 20 min. For a perturbation in IMF B z , the reconfiguration occurs between characteristic timescales of 15 and 60 min.
Based on the spatial morphology of the velocity modes, their dependence on the IMF, and their relationships to previously reported surface magnetic field modes; we define and interpret the mean and the leading four velocity modes as follows: 1. Mode 0 (Mean) = Velocity Perturbation Type 2 (VP2). This is the (monthly) average two-cell convection pattern. 2. Mode 1 = Velocity Perturbation Type Y (VPY) that is directly driven primarily by IMF B y with a lag of 20 min from the bow shock. The addition of this mode to the mean two-cell convection mode VP2 creates a dawn-dusk asymmetry and the familiar "banana" and "orange" convection cells in the polar cap. It represents the velocity perturbation caused by dawn-dusk magnetic tension on newly reconnected magnetic field lines at the magnetopause. VPY is analogous to the Disturbance Polar Y mode of surface magnetic field variation. 3. Mode 2 = Velocity Perturbation Type 2 expanded (VP2e) that is directly driven primarily by IMF B z with a lag of 20 min. The addition of this mode to the mean two-cell convection mode VP2 expands and enhances the global convection. It represents the velocity perturbation caused by variable magnetic reconnection at the magnetopause and magnetotail. 4. Mode 3 = Velocity Perturbation Type 2 contracted (VP2c) that is directly driven primarily by IMF B z with a lag of 15 min. The addition of this mode to the mean two-cell convection mode VP2 contracts and strengthens the global convection. The sum of VP2, VP2e, and VP2c is analogous to the Disturbance Polar 2 and Disturbance Polar Expansion-Contraction modes of surface magnetic field variation. 5. Mode 4 = Velocity Perturbation Type 1 (VP1) that is indirectly driven by IMF B z with a lag of 40-80 min.
This mode represents the velocity perturbation caused by the substorm current wedge. VP1 is analogous to the Disturbance Polar 1 mode of surface magnetic field variation.
Our technique appears to capture more of the underlying electrodynamics behavior in fewer dynamic modes than the analyses of previous studies. This is probably because we infill the data before applying any smooth model and avoid specifying prior relationships to the solar wind. We envisage that our technique will be used either as a standalone data set for investigating polar-scale phenomena, or as an assumptions-free data set for infilling gaps in the (higher-resolution) SuperDARN measurement set.

Data Availability Statement
The SuperDARN EOF reanalysis model produced in this study is available at https://doi.org/10.5285/ f4245a21-dee9-46cf-85b2-114798cb7ebc. OMNI IMF data were downloaded from ftp://spdf.gsfc.nasa.gov/ pub/data/omni/high_res_omni/monthly_1min/ on 2014-03-06. The geomagnetic EOF model based on Su-perMAG data is available in the supporting information of Shore et al. (2018). The SuperDARN data are available from the BAS SuperDARN data mirror (https://www.bas.ac.uk/project/superdarn). To gain access to the SuperDARN data, email bassuperdarndata@bas.ac.uk to set up a user account for the BAS SuperD-ARN data mirror.