WawHelioIonMP: A Semiempirical Tool for the Determination of Latitudinal Variation in the Ionization Rate of Interstellar Hydrogen and the Solar Wind

The latitudinal structure of the solar wind varies during the cycle of solar activity. Analysis of this variation is important for understanding the solar activity and interpretation of observations of heliospheric energetic neutral hydrogen atoms and interstellar neutral (ISN) atoms inside the heliosphere, which yield information on the heliosphere and its interaction with the interstellar medium. Existing methods of retrieving this information from indirect remote-sensing measurements of phenomena, including the heliospheric backscatter glow and interplanetary scintillations of remote radio sources, are challenging to apply in real time. Here, we propose a method WawHelioIonMP of approximate retrieval of latitudinal profiles of the ionization rates of ISN H using a machine-learning-based interpretation of the helioglow. Assuming that we know their history during two past solar cycles and have observations of the helioglow for close-to-circumsolar circles with a radius close to 90°, we derive statistically an algebraic relation between the ionization profiles and lightcurves. With the relation reversed, we are then able to derive the ionization rate profiles based on observed light curves, such as those planned for the GLObal solar Wind Structure (GLOWS) experiment on the forthcoming NASA mission Interstellar Mapping and Acceleration Probe (IMAP). The application of this method is straightforward and rapid because complex simulations are no longer needed. We present the method of retrieval of the profiles of the ionization rates, leaving the discussion of details of the decomposition of the retrieved ionization rate profiles into profiles of the solar wind speed and density to a future paper.


Introduction
The latitudinal structure of the solar wind varies during the cycle of solar activity.Accounting for this variation is important for understanding the solar activity, but also for the interpretation of observations of heliospheric energetic neutral hydrogen atoms (H ENAs) and interstellar neutral (ISN) atoms inside the heliosphere, which yield information on the heliosphere and its interaction with the interstellar medium.
The interaction between the charged and neutral populations in the heliosphere occurs mostly due to charge-exchange reactions.Also important is the photoionization of neutral atoms by solar EUV photons (Ruciński et al. 1996;Bzowski et al. 2013).These ionization processes are responsible for losses of heliospheric ENAs and ISN atoms between their creation loci and detectors at 1 au.They shape the distribution of the density of ISN gas within several astronomical units from the Sun and are responsible for the production of pickup ions in the supersonic solar wind.These latter ions, after transmission through the solar wind termination shock, become the principal source of thermal pressure in the inner heliosheath (e.g., Rankin et al. 2019).Hence, monitoring of the solar wind structure and in general of the ionization processes is indispensable for the analysis of heliospheric observations.
Ground truth has been obtained from the Ulysses measurements.However, they are point measurements, with repetition of observations at individual heliolatitudes available several years apart due to the orbital motion of the spacecraft.Furthermore, these observations are only available for the interval 1992-2007.Scintillation and helioglow observations are performed remotely from the ecliptic plane.A continuous series of helioglow measurements are available from the Solar and Heliospheric Observatory (SOHO)/SWAN experiment (Bertaux et al. 1995), for the time from 1995 until the present.The interval of scintillation observations is longer and covers the time from 1985 until present (Tokumaru et al. 2021).
On the one hand, the remote character of the helioglow and interplanetary scintillation (IPS) observations enables grasping the entire range of heliolatitudes at once.On the other hand, integration over lines of sight results in a poor coverage of the polar regions (see Figure 7 in Sokół et al. 2013, which illustrates this point for both the scintillation and helioglow observations), and the contributions of various heliolatitudes to the signal can only be obtained from modeling.
The scintillation and helioglow observations yield different solar wind-related quantities.Scintillation observations provide latitudinal profiles of the solar wind speed.The density is subsequently obtained based on in situ observations of solar wind parameters in the ecliptic plane and the application of the latitudinal invariant of the total energy flux of the solar wind pointed out by Le Chat et al. (2012).With this, it is possible to calculate latitudinal profiles of charge-exchange ionization rate for H atoms nearly stationary relative to the Sun.Analysis of the helioglow observations yields profiles of the total ionization rate of H atoms at rest in the solar inertial frame.They need to be decomposed into the charge-exchange and photoionization components, and the charge-exchange component into the profiles of solar wind speed and density.This is done using the latitudinal invariant mentioned above and the known dependence of the charge-exchange cross section on the relative speed of colliding particles.
The scintillation and helioglow methods have provided qualitatively similar profiles of the solar wind, strongly structured during low solar activity and more flat during high activity epochs.However, there is a difference between their results concerning some details (Katushkina et al. 2013).In particular, Katushkina et al. (2019) pointed out ubiquitous maxima in the solar wind mass flux at mid-latitudes, a feature that has not been identified in the analysis of IPS observations (e.g., Porowski et al. 2022).Analysis of IPS observations returned magnitudes of the solar wind speed in the ecliptic plane different from that obtained from curated time series of solar wind speeds from in situ observations in the ecliptic plane.An illustration of this effect is presented by Sokół et al. (2019) and Sokół et al. (2020).A partial resolution of this issue was suggested by Tokumaru et al. (2021); see also Porowski et al. (2022).
Fitting the solar wind structure to the observed maps or light curves of the helioglow using a physical model of the helioglow is computationally challenging.It calls for fitting a multiparameter, nonlinear model of the helioglow either assuming that the flow parameters of ISN H are known or fitting them alongside the ionization rate profiles, which then adds to the complexity of the task.It also requires a good knowledge of the solar Lyα output in time and heliolatitude.The solar Lyα radiation is responsible for the illumination of ISN H, but also for the radiation pressure force, which modifies the distribution of the density and velocity of ISN H in the region where the helioglow is created.Additionally, a model of radiation transport within the ISN H gas must be selected and implemented in the code.All this results in the need for some simplifications of the physical model.These simplifications need to be thoroughly validated against available observations.The resulting approach is still computationally demanding, which precludes its use in near-real-time applications.Consequently, an alternative method is needed.
In this paper, we present a model-independent method of inferring latitudinal profiles of the ionization rate of ISN H atoms directly from Sun-centered light curves of the helioglow.The method can be regarded as a kind of machine-learning approach.Within this method, we adopt a model of the evolution of latitudinal profiles of the ionization rates for a certain interval of time, which is regarded as a training period for the method.For the needs of the method development, we consider this ionization rate history as known and correct.We also select a set of observations of the helioglow covering the same interval of time.With this, we define an algebraic connection of known latitudinal profiles of the total ionization rates of ISN H with the light curves of the helioglow.This algebraic connection is different for different vantage points around the Sun, but universal in the sense that for a given location, it is valid for any phase of the solar activity cycle, if the chosen time interval for the data covers at least one solar cycle.Subsequently, we reverse the algebraic connection, so that now the input is the observed light curve, and the result is the ionization rate profile.The intention is to develop such an algebraic system based on existing data and to use it in the future to retrieve the unknown profiles of the ionization rate based on future measured helioglow light curves.
The resulting profiles of the ionization rate can be further decomposed into the charge-exchange and photoionization components if additional information is input into the system.Ideally, this information is the density and speed of the solar wind measured in situ in the ecliptic plane, but it can also be an independently obtained photoionization rate.Subsequently, the charge-exchange profile can be decomposed into the profiles of density and speed using the latitudinal invariance of the solar wind mechanical energy.With the results of this decomposition, it is possible to analyze the evolution of the solar wind latitudinal variation and calculate survival probabilities of H ENAs.
The method presented here is intended for close-to-real-time analysis of observations planned for the forthcoming NASA mission Interstellar Mapping and Acceleration Probe (IMAP; McComas et al. 2018).The helioglow observations will be performed by the GLObal solar Wind Structure (GLOWS) experiment, and the results of the analysis will be directly applicable for the assessment of survival-probability corrections for ISN and ENA observations by IMAP-Lo, IMAP-Hi, and IMAP-Ultra experiments.

Input Components of the System
The idea of the proposed method is based on the following assumptions: 1. a physical link exists between the ionization rates of ISN H and the distribution of the helioglow in the sky 2. a history of the ionization rate evolution in time and heliolatitude is known for the time span of the helioglow observations used as the input to the construction of the method 3. modulation of the light curve observed along a circle in the sky with the center close to the Sun is sufficient to retrieve the profile of the ionization rate.
All of these assumptions seem to be well justified.
The distribution of the helioglow in the sky is known to depend on the geometry of the inflow, the vantage point, the history of the ionization rate variation, and the illumination of ISN H by the solar Lyα radiation, as it was concluded from modeling (e.g., Meier 1977;Quémerais 2006;Kubiak et al. 2021).Thus, the physical link between the geometry of the flow of ISN H, the ionization rate variation with the heliolatitude, and the vantage point is relatively well understood, even though some details require further studies.Anyway, the connection is deterministic and of a physical nature.
The evolution of the ionization rate with the solar activity phase has been studied in several papers (e.g., Bzowski et al. 2003;Lallement et al. 2010;Katushkina et al. 2019;Koutroumpa et al. 2019;Sokół et al. 2019Sokół et al. , 2020;;Porowski et al. 2022Porowski et al. , 2023)).While detailed results differ, these papers show a qualitative consensus, and the conclusions agree with the conclusions from in situ observations of the solar wind by McComas et al. (2008) to the extent possible when comparing global remote-sensing observations with point-like measurements.These qualitative agreements suggest that the assumption of a good knowledge of the history of the ionization rate evolution is fulfilled.
The third assumption, of the suitability of Sun-centered light curves for retrieval of ionization rate profiles, is supported on one hand by an analysis by Bzowski et al. (2003), who successfully retrieved solar wind profiles based on helioglow observations from such circles, and on the other hand by conclusions by Kowalska-Leszczynska et al. (2024), who showed the sensitivity of the light curves to various ionization rate profiles and to variations in the solar Lyα output.

Profiles of the Ionization Rate
For the development of our method, we adopted the profiles of the total ionization rate as a sum of the charge-exchange ionization rate profiles obtained from the IPS method (Porowski et al. 2022(Porowski et al. , 2023) ) and a spherically symmetric photoionization rate calculated from a correlation between the solar EUV flux and the solar F10.7 radio flux (Sokół & Bzowski 2014;Sokół et al. 2020), which together make the WawHelioIon model.However, the methodology we develop in this paper potentially allows us to adopt a different model for the ionization rate evolution, e.g., that obtained from helioglow analysis by Lallement et al. (2010), Koutroumpa et al. (2019), or Katushkina et al. (2019).
The WawHelioIon model provides the ionization rates in the form of a time series of latitudinal profiles from −90°to 90°w ith 10°steps, i.e., 21 latitudinal bins.The time resolution is equal to the Carrington rotation period (CR), with the time bins centered at the centers of CRs.
Since in the equatorial plane at 1 au the ionization rate values are on the order of 10 −7 s −1 , for convenience we scale them by a factor 10 6 .A visualization of the scaled WawHelioIon ionization rates for the times of passage by the Earth through the upwind and crosswind longitudes are shown in panels (1a) and (1c) of Figure 1.

Decomposition of the Ionization Rate Profiles into Legendre Coefficients
Processing of the WawHelioIon ionization profiles comprises changing their tabular representation into a functional form to express the ionization profiles in the whole latitude range by a series of global coefficients.To that aim, we use the model based on Legendre polynomials (hereafter referred to as the Legendre model) from Porowski et al. (2022).The Legendre model has been used to quantify the threedimensional solar wind structure, demonstrating its suitability for modeling latitude-dependent quantities under an axially symmetric regime, which commonly appears in solar physics.
Before fitting the Legendre model to the ionization profiles, we convert the latitude range into the range (−1, 1), where −1 represents the south pole, by conversion of heliolatitude f into ( ) z cos 2 p f = -.With this, the Legendre models are fitted to all profiles, providing us with a new representation of the ionization profiles.After fitting, each profile is represented by a set of 14 coefficients and the basis functions defined by the Legendre model.We assess the contribution of each of the principal components (PC) to the total variation in the ionization rate profiles and categorize the PCs into two groups: significant and insignificant.The significant PCs explain the dynamical behavior of the ionization profiles and contain information about the temporal evolution of the ionization rate profiles.
We found that half of the PCs show negligible contributions to the total variation, so we apply filtration of such PCs, reducing the final dimensionality of the ionization profiles to seven.We stay in the PCA frame and transform into this frame the base functions of the Legendre model corresponding to the Legendre model coefficients, which is necessary to perform the reconstruction of the ionization rate profiles in the PCA frame.
Denoting the PCs of the Legendre model coefficients of an ionization profile as pc prof , the vector of the corresponding values in the PCA frame as const, the ionization rate profile β H (z, t) for a time t may be expressed as where bas(z) is the vector of the Legendre model bases in the PCA frame, and is the static part of the profile, identical for all times.The first term at the right side of Equation (1) explains the dynamic changes of the ionization profile, while the second term, referred to as static, introduces a constant shape in the z domain, required for the proper reconstruction of the profiles.This shape is presented in Figure 2. The vector of constant values const is a result of the selection of the PCA frame as the final working frame in the situation when the PCs are used to scale the base functions.The constant values appear during the PCA process due to the shifting of the initial Legendre model coefficients into their center-of-gravity frames and are needed in the PCA frame to provide a proper scaling of the base functions, and so to reconstruct the ionization profiles correctly.To that end, a transformation of these shifts into the PCA frame is made, providing the proper scaling of the base functions in the PCA frame.The base functions in the PCA frame for the upwind vantage point are shown as an example in Figure 3. Without filtering, the reconstruction of β H (z) in the PCA frame provides the same results as in the Legendre model frame.By contrast, when filtration in the PC domain is applied, the variabilities of the less significant PCs, which usually correspond to noise, are attenuated, but after filtration the naked values of the shifts are transformed into the PCA frame and maintained there, resulting in the static term in Equation (1).In the case of full filtration, no variability of the profiles would occur, leaving only a static term, i.e., β H (z) = stat(z), which is equivalent to a simple mean light curve with respect to all the input light curves at a given vantage point.Each PC order has its own contribution to the overall mean light curve, which is a linear combination of the profiles of the base functions.When the ith PC is switched on, the mean light curve becomes deviated due to modification of the ith coefficient of the base, reflecting dynamical changes of the light curve in time in the PC frame.In our case, the first seven PCs are sufficient to obtain a correct dynamical description of the temporal evolution of the light curve, which leads to Equation (1).However, the last seven PCs, even though they do not reflect significant dynamical changes in the light curve, must be taken into account in the form of constant values to ensure the correct shape of the mean light curve in the final sum of the contributions of the base functions to the entire light curve.That is why the stat(z) term is kept after filtering when the PCA frame is used as the working frame.
Summarizing the main steps of the processing of the ionization rate profiles, first, we perform Legendre decomposition for each tabulated latitudinal profile of the ionization rate obtained from WawHelioIon.Next, we perform PCA for the independent latitudinal coefficients of the obtained Legendre model, finding that we are able to express the dynamical  1a) and (1c)) and of the corresponding light curves retrieved from SWAN maps of the helioglow for the days nearest to the upwind and spring crosswind positions (panels (1b) and (1d)) for the time interval 2000-2021, after filling up the gaps and approximating the light curves using the sin-cos decomposition, as discussed in Section 2.2.4.The horizontal axes in the ionization rate panels correspond to heliolatitude, and those in the helioglow panels to the angle along the scanning circle, with 0°corresponding to the northernmost point in the scanning circle.The light curves are binned into 4°bins.The color bars at the bottom of the panels represent the magnitudes of the ionization rate, expressed in 10 −6 s −1 , for the ionization rate panels, and at the helioglow intensity in Rayleigh for the helioglow panels.Note that each of these scales is different.changes of the profiles using the first seven PCs.The last 7 PCs out of 14, which are, in fact, seven constant values for all the profiles, represent the common properties of the profiles.They are necessary for their proper reconstruction.

Helioglow Light Curves and Their Processing
As the second input component of the WawHelioIonMP model, we use observations of the Lyα helioglow from the SOHO/SWAN experiment (Bertaux et al. 1995;Bertaux 2020), which provide us with a purely empirical input to the model.In our approach, we use helioglow light curves retrieved from the SWAN full-sky maps.
A light curve is retrieved from an individual helioglow map along a circle with an angular radius of 75°, centered at 4°off the Sun in the ecliptic plane.For illustrative examples, see Figures 1 and 2 in Strumik et al. (2021).This selection was made to obtain light curves corresponding to those planned for the GLOWS experiment.
Since the shape of such light curves depends on the orbital position of the detector with respect to the direction of inflow of ISN H (see, e.g., Figures 1 and 6 in Bzowski et al. 2023b), we need to perform the analysis for carefully selected positions (ecliptic longitudes) separately.In this paper, we select the upwind, downwind, and spring crosswind positions to present our method.For ionization rate determination in the future, we plan to use approximately 12 positions, distributed symmetrically relative to the upwind/downwind directions.
Here, each of the three selected positions is processed separately, and from now on we maintain the light curve data grouped by the positions of their respective vantage points.The analysis is performed identically but independently for each of the selected vantage points.

Light Curve Selection
Our method requires light curves observed year by year at the selected fixed vantage points.This implies that the light curves should be retrieved from maps observed on specific days of the year, corresponding to the upwind, crosswind, and downwind positions of the spacecraft.For the reference upwind longitude, we adopted 252°ecliptic longitude, in agreement with the results obtained by Bzowski et al. (2023a) based on SWAN light curves, as well as those by Lallement et al. (2005) based on analysis of spectral observations of the helioglow.
Since SWAN maps are not available for each day of the year, in practice we selected those available nearest the chosen positions.Experience showed that deviations of a few degrees in the longitude of the vantage point are unimportant because the light curves do not evolve significantly within a time span of several days.
To build the model, we chose one light curve for each of the three selected vantage positions for all of the years between 2000 and 2021.

Light Curve Masking
Light curves retrieved directly from the SWAN maps merge signals from the helioglow itself and from extraheliospheric astrophysical sources, which include EUV-bright stars, the Milky Way, and an unresolved background (see, e.g., Strumik et al. 2020).Examples of full light curves are presented by Bzowski et al. (2023a) in Figure 6.The positions of the extraheliospheric sources are fixed in the sky.In addition to these, from time to time transient sources appear, mostly comets (Mäkinen et al. 2001).
To eliminate these unwanted contributions to the light curves, these sources are masked, as discussed by Bzowski et al. (2023b).Since the SWAN instrument cannot look close to the Sun, and a portion of the sky is blocked by the body of the SOHO spacecraft (Bertaux et al. 1997), these regions are also masked.The solar and spacecraft masks change their positions relative to the stars because of the spacecraft's motion around the Sun.Therefore, separate masks are derived for all positions of the spacecraft and applied to individual light curves.The angular resolution of the light curves retrieved from the SWAN maps is relatively high, 0°.1 in the phase angle, to facilitate identification and masking of the point sources.
Due to the application of the masking procedure, gaps in the light curves appear, spanning from several to several dozen degrees in phase angle.Particularly large gaps appear when the vantage point is located close to the autumn crosswind position.This happens because the Milky Way coincides with large portions of the scanning circle.Therefore, light curves from this portion of the year must be rejected from the analysis.During other times during the year, the Milky Way intersects with the scanning circle in two separate locations, resulting in two gaps of a moderate size.We found that the gaps can be filled using methods discussed in Section 2.2.3.For further analysis, the masked and gap-filled light curves are rebinned in phase angle into 4°bins.For a comparison of the highresolution light curves, the light curves with the masks applied, and the rebinned light curves, see Figure 6 in Bzowski et al. (2023b).

Light Curve Gap Filling
The method we have developed calls for approximation of the light curves with a sine + cosine series.However, gaps in the light curves, discussed in Section 2.2.2, may bias the fitted light curve model.We found that gaps that significantly disturb the fitted models are larger than 30°.Such gaps must be filled before the decomposition is performed.The gaps are filled by an interpolation process.We found that filling the gaps larger than 70°is not effective.Therefore, for the upwind and downwind locations, the light curves with the gaps larger than this limit are rejected from the analysis.In the case of the spring crosswind, we must deal with gaps larger than 70°.
The gaps are filled using a two-step approximation method.In the first step, the light curve with gaps is approximated using a sine + cosine series, as presented by Bzowski et al. (2023b) in Equation (1).The adopted order of the series is chosen separately for each of the vantage points and listed in Table 1.This provides an approximate model of the light curve shape.Now, residuals of these models are obtained, and the model itself is projected on the gaps.Subsequently, the gap fillers are modified by adding a small Gaussian noise with the width parameter obtained from the residual analysis.This yields the final gap filler.This is done to ensure similar statistical uncertainties for all elements of the filled light curve before the final decomposition, discussed in Section 2.2.4.The magnitude of the added noise is assessed separately for each light curve because different light curves are typically characterized by different signal-to-noise ratios.After filling the gaps, the light curve is binned with a 4°step in spin angle, providing input for the light curve decomposition described in Section 2.2.4.For the case of the spring crosswind, where the gaps are so large that the sine + cosine series fit is not stable, we provide rough background estimates of the helioglow signal expected in the gaps using Wolfram Mathematica built-in function EstimatedBackground.We randomize only a few points of such rough background in the gaps and subsequently insert a noise estimated using the parts of the light curve that are not rejected due to the masking.Then we proceed to the first step as described above.We verified that for small gaps, the use of the rough background estimation provides the same results as in the case of not using it.
These processing steps are visualized in an example in Figure 4 for a typical upwind light curve.A full complement of 22 light curves for the upwind positions is presented in panel (1b) of Figure 1 and for the crosswind position in panel (1d) of this figure.

Light Curve Decomposition
The ionization rate retrieval method we present requires the decomposition of the input light curves into coefficients of the approximation model defined by Bzowski et al. (2023b) in their Equation (1).Before the final decomposition of a light curve, we adjust its calibration by a phase angle-independent factor on the order of 1-1.25 (M.Strumik et al. 2024, in preparation) based on analysis of the evolution of the brightness of several stars visible in SWAN maps during the time span of SWAN data.This factor slowly varies with time.We found that its application slightly reduces the spread of the results without any significant systematic modification of the derived ionization rates.This is understandable given the insight provided by Kowalska-Leszczynska et al. (2024).These authors found that a delay time between the time variation in the ionization rate and reaction of the helioglow is relatively short, only several months.The order of the expansion in the approximation model is optimized separately for individual vantage locations.The optimization for the three vantage locations discussed in this paper was performed based on residual analysis.We looked for approximation orders of the interpolation and final models that provide a distribution of model residuals closest to the normal distribution.For a wide range of approximation orders, the results were quite similar, so there is no clearly visible dependence of the interpolation order and the quality of the light curve approximation.We decided to use the approximation orders listed in Table 1.
The result of light curve processing is a set of coefficients of the final light curve decomposition.These coefficients are subjected to further statistical analysis.To that end, they are organized into matrices corresponding to individual vantage points.The order of the rows in the matrices corresponds to the sequence of the years used in the analysis.

Processing of Light Curve Coefficients
The coefficients obtained from light curve decomposition are now subjected to a similar analysis as was done for the input ionization rate profiles.We seek to identify the intrinsic and uncorrelated variabilities performing PCA.We found that the first five principal components are sufficient for the reproduction of the helioglow light curves.Since PCA provides uncorrelated variabilities, the obtained light curve parameters are uncorrelated.These five PCs hold information on 99.8% of the total variability and thus explain the shape evolution of the entire light curve.We organize the entire set of the PCs for a given vantage point into an n × 5 matrix cf lcrv , where n is the number of years, and adopt individual columns as the vectors of empirical parameters used for building the WawHelioIonMP model.Obtaining the light curve parameters from PCA concludes this part of the analysis.Plots of the five columns of the upwind cf lcrv , showing the temporal evolution of the light curve parameters, are presented in Figure 5.
All the input light curves belonging to a given vantage point are expressed through their respective numerical coefficients cf lcrv , which comprise information about the state of the helioglow in the sense that they describe how individual light curves deviate from the overall mean light curve shape for the given vantage point.The mean light curve shapes for the three vantage points discussed in the paper are shown in Figure 6.Sorting the coefficients by time enables connecting the observed light curves with the profiles of the ionization rate.The cf lcrv vectors are used as regressors in the final fit.

Model Definition
We construct a semiempirical model of the ionization rate profiles devised to express the dynamical part of Equation (1) through parameters referring to the coefficients of the light curves cf lcrv .In other words, we parameterize the ionization rate profiles β H by a multilinear combination of the five columns of cf lcrv .Assuming that a multilinear dependence exists between the columns of pc prof and cf lcrv , we can write where the index i is the number of a column, ranging from 1 to 8 or 7, depending on the vantage point, j numbers the columns in cf lcrv , t is the time, c i is the ith element of the vector of constant values c, pc i prof is the ith coefficient of the ionization rate profile pc prof in the PCA space, cf j lcrv is the jth PCA coefficient obtained from the analysis of the corresponding helioglow light curve, and a ij are the elements of the coefficient matrix.
Expressing pc prof in the dynamical part of the profile in Equation (1) by the parameterization defined above leads to the following formula that relates the ionization rate and the helioglow profiles To obtain this model, we need to determine the matrix of multilinear coefficients a ij and the vector of constants c from the fit.Since the variables in Equation (4) are uncorrelated due to the application of PCA, we can divide the fitting procedure into seven independent runs for all individual indices i, thus avoiding performing minimization in a multidimensional space.
The remaining terms in Equation (4), i.e., stat(z) and bas i (z), are determined during model formulation and do not change with time.In practice, we perform separate sets of fittings for all vantage points, obtaining the final model parameters stored in separate matrices A, assigned to the upwind, downwind, and crosswind vantage points.In the future, we plan to identify optimum grid locations k = 1, K, K around the Sun for which the matrices A will be prepared.

Application of the Method
To estimate the ionization rate profile corresponding to a newly observed light curve using the method outlined above, we proceed as follows.
1. Attribute the actual vantage point to the nearest grid location k. 2. Preprocess the light curve by application of masks (Section 2.2.2). 3. Fill the gaps using the procedure detailed in Section 2.2.3.4. Rebin the light curves into the desired resolution. 5. Decompose the light curve to obtain the expansion coefficients, as in Section 2.2.4 6. Perform PCA on the expansion coefficients; in practice, this is done using the transformation matrix precalculated for the grid location k. 7. Based on the obtained five-element pseudovector cf lcrv (i.e., the first five PCs of the light curve), calculate the ionization rate model β(z) using Equation (4).We operate separately on data grouped by the times of orbital positions corresponding to the upwind, downwind, and crosswind locations.The model is a collection of K = 3 separate sets of matrices, and a given light curve is uniquely assigned to one of them.In the future, K will be increased, as discussed in Sections 2.2 and 4.
As an example application of the WawHelioIonMP model, we perform prediction of the ionization rate profiles for the upwind and downwind vantage points in 2022 and 2023, and for the crosswind point in 2022.The results are shown in Figure 7.We used several SWAN light curves obtained around the selected positions.They were processed identically as the light curves used for training the model.The resulting set of ionization rates has an estimated relative error of ∼20%.

Validation of the Method
The quality of the reproduction of the ionization rate profiles can be assessed visually by inspection of Figures 8-10, which show the input ionization rate profiles from the WawHelioIon model and the corresponding reproduced profiles along with their statistical 99% confidence ranges, separately for each year for the upwind, downwind, and spring crosswind vantage points.
The WawHelioIonMP method was validated using a bootstrap-type test.In this test, we adopt as a baseline the reconstructed light curves obtained using the WawHelioIonMP trained on all available years for a given grid point.Subsequently, we retrain the model using a randomly selected subset of 80% of the input light curves, and the remaining 20% are used as the validation light curves.We repeat this 100 times and calculate the differences between the baseline reproduced profiles and those obtained from WawHelioIonMP trained on the reduced subsets.With this, we make a histogram of the differences between the baseline and reduced models.We repeat this test for all three vantage locations.
The bootstrap test results are summarized in Figure 11.It shows the relative frequencies of the differences between the input ionization rates and the reconstructed models.The black color corresponds to the differences for the full model trained on the training set, and the red color to that of the bootstrap models obtained from the projection of the testing set.The training set and testing sets have no common elements.
No systematic differences between the distributions for the full and bootstrap models are visible.Also, there are not many outlying values.Based on the residual distributions, we estimate the error of a single model projection to ±0.12 × 10 −6 s −1 for each orbital position, which is about 1.5 times larger than the error of the model reconstruction and accounts for ∼20% of the H ionization rate values.Therefore, the error of model projection is not significantly larger and we find it satisfactory.
As a second validation criterion, the uniqueness of the model prediction is investigated.The WawHelioIonMP model is designed to provide ionization rate profiles of ISN H based on light curves observed close to the selected vantage points (upwind, downwind, and crosswind).Now we check how much the WawHelioIonMP model predictions are unique in the sense of time shifting, i.e., how much the reproduction of the  ionization rate profiles worsens when the input helioglow light curve used for the projection is taken from times much farther than ±1 CR from the optimum time for the given vantage position.This allows us not only to see how a time shift of the light curve used deteriorates the results but also enables us to estimate the favorable structure of a future extension of the model into other orbital positions.
We took light curves from a time range Δcarr = ±2 CR around the selected vantage points and applied the WawHe-lioIonMP model trained for individual points.We calculated model projection errors for randomly selected light curves lying within a time span of ±2 CR around the time for the given vantage point with respect to the ionization rate profile assigned to this vantage point.The light curves were selected at random within ±2 CR around the vantage point positions from the entire period 2000-2021.For each model prediction, we calculated the mean square error of the prediction MSE β and connected it with the time shift Δcarr.Since the number of light curves for all years for the selected vantage points is relatively high, we could bin the MSE β versus Δcarr dependence and estimate the 99% region of MSE β , thus determining the magnitude of Δcarr for which MSE β has its minimum.The results are shown in Figure 12.
Since the model is designed for a given vantage point, we expect a gradual deterioration of the model predictions for light curves observed at times increasingly farther from the time of the given vantage point.The plots of the model error in Figure 12 seem to confirm this expectation: the larger the temporal shift from the vantage point, the lower the projection quality.This behavior of the model suggests that indeed, the model works best for the light curves closest to the time of the vantage point for which the model was trained.For these light curves, the average error is similar to the error determined in the bootstrap tests.This confirms the uniqueness of model predictions and demonstrates that the model is not producing a random output.
For the upwind and downwind vantage points, the minima of the errors are at Δcarr = 0 ± 0.15 carr, and the error characteristics are very symmetrical with respect to the vantage point.The magnitude of the error increases with the time shift, approximately symmetrically.For the spring crosswind, the error minimum is around −0.4 CR.As it can be seen, for different vantage points, the dynamics of the prediction quality degradation are different, but the largest found is for the upwind point.
We found that small time differences, within 0.5 CR of the nominal times, do not disturb the model significantly for the upwind and downwind vantage points.This is in good agreement with the expectations because the profiles of the ionization rates adopted to the WawHelioIonMP model construction have a temporal resolution of 1 CR, and usually vary slowly between the contiguous CRs.
The conclusion from the validation procedure is that most likely, it is sufficient to train the model for ∼12 vantage points around the Earth orbit without degradation of the prediction accuracy.

Discussion
In the development of machine-learning methods, it is important to ensure that the training data set is representative and appropriately balanced.In particular, for quasiperiodical phenomena, no variability phases are under-or overrepresented.Our WawHelioIonMP method was trained on two full solar cycles (SC) of SWAN observations, SC 23 and 24, which satisfies this prerequisite.If, however, the solar cycle for the time of future IMAP operations is significantly different than these two past cycles, then the ionization rates resulting from the application of WawHelioIonMP may be biased.We do not expect this to happen during the IMAP SC 25.SC 24 was exceptionally weak, with the magnitude of the maximum lower by almost 50% than that of SC 23 (Veronig et al. 2021).The strength of SC 23 was lower by ∼20% than that of SC 22.The strength of the ongoing SC 25 seems to be at least that of SC 24 and may reach that of SC 23.With this, the light curves observed by GLOWS will likely be inside the range of the training data used to develop WawHelioIonMP, so no bias in the derived ionization rates is expected.
Since the model is fitted using light curves from the time range of the last two solar cycles, it averages any potential secular evolution that may have appeared during the period.Also, the model does not take UV flares into account, which may disturb the real ionization rates.
WawHelioIonMP can be extended into any orbital position of the spacecraft if only the light curves for the selected time of observations are not rejected by the maximum gap size conditions.We anticipate that the optimum grid will be composed of the upwind, downwind, two crosswind, and additionally one or two intermediate locations for each quarter of the Earth orbit, i.e., a total of 8 or 12.We acknowledge that the optimum spacing of the vantage grid may be nonuniform in ecliptic longitude.In future work, we will determine the optimum spacing of the grid and verify if any of the grid locations must be rejected because of the gap size criterion.We anticipate that this may be the case for the autumn crosswind location, where a large portion of the light curve is filled with the Milky Way.For such conditions, alternative methods of determination of ionization rate profiles need to be developed.
With the latitudinal profiles of the ionization rates obtained using WawHelioIonMP, it is possible to retrieve the latitudinal structure of the solar wind speed and density using some additional information.The ionization rate is a sum of chargeexchange and photoionization rates.The photoionization rate is expected to vary mildly with heliolatitude, as shown by Strumik et al. (2021Strumik et al. ( , 2024)).Neglecting this variation, one can calculate the photoionization rate for the ecliptic plane based on observations of the solar EUV spectrum or using appropriate proxies (Bzowski et al. 2013;Sokół & Bzowski 2014;Sokół et al. 2020) and subtract it from the ionization rate profile.The difference is a profile of the charge exchange rate, which is a product of the solar wind speed, density, and the relative speed-dependent reaction cross section.Le Chat et al. (2012) showed that the solar wind density is a function of the solar wind speed and that the coupling function is the latitude-invariable total flux of the solar wind mechanical energy (see their Equation (1)).This makes the density at a given heliolatitude a thirddegree polynomial of the solar wind speed at this latitude provided that the energy flux is known for a certain latitude.This can be obtained from in situ measurements, such as those planned on IMAP.With this, the speed and density of the solar wind can be resolved for all heliolatitudes.
WawHelioIonMP in its present implementation is based on helioglow observations from SWAN.It may be used for light curves from future SWAN observations or from a different helioglow instrument, like GLOWS, but in this latter case, the light curves need to be rescaled to the calibration of SWAN.We anticipate that only a scaling factor will need to be used.If a significant background is discovered, it will need to be subtracted before rescaling.
We anticipate that WawHelioIonMP may turn out to be less accurate in the case of light curves featuring transient phenomena, like bright comets because with these, the affected light curve does not resemble at all that the system has been trained on.Such light curves need to be either appropriately masked or, if the features due to a comet span too wide in the phase angle, rejected altogether.
WawHelioIonMP in its present form uses a training set of ionization rate profiles derived from the WawHelioIon model, which is based on IPS observations.Therefore, the obtained ionization rates, and consequently the solar wind structure, will agree by design with those obtained from IPS observations.Thus, WawHelioIonMP is not suitable for resolving the difference between the conclusions on the solar wind structure obtained from helioglow and IPS observations, discussed in Section 1.
However, WawHelioIonMP can be retrained using a different ionization model.In particular, this model may be based on helioglow observations themselves.The training ionization rates would need to be fitted to existing observed helioglow maps using a simulation model of the helioglow, and the derivation of the vantage grid points repeated.Then, new observations can be processed using the retrained model.However, this retrained system would be only as good as the background helioglow model used.Any issues that possibly might be present in the helioglow model would be conveyed into the retrained model.
2.1.2.Principal Component Analysis and Filtering of the Legendre Components of the Ionization Rate Profiles Next, we apply principal component analysis (PCA; Jolliffe 2002) to separate and analyze the contributions of different modes of the temporal variabilities hidden in the obtained Legendre coefficients.The PCA returns individual modes of the temporal variabilities as uncorrelated signals.

Figure 1 .
Figure1.Visualization of the ionization rates of H obtained from the WawHelioIon model (panels (1a) and (1c)) and of the corresponding light curves retrieved from SWAN maps of the helioglow for the days nearest to the upwind and spring crosswind positions (panels (1b) and (1d)) for the time interval 2000-2021, after filling up the gaps and approximating the light curves using the sin-cos decomposition, as discussed in Section 2.2.4.The horizontal axes in the ionization rate panels correspond to heliolatitude, and those in the helioglow panels to the angle along the scanning circle, with 0°corresponding to the northernmost point in the scanning circle.The light curves are binned into 4°bins.The color bars at the bottom of the panels represent the magnitudes of the ionization rate, expressed in 10 −6 s −1 , for the ionization rate panels, and at the helioglow intensity in Rayleigh for the helioglow panels.Note that each of these scales is different.

Figure 2 .
Figure 2. Static term of the ionization rate profile in the case of full PCA filtration for the upwind vantage point (see Equations (1) and (2)), shown as an example.The final shape of the ionization rate profile for a given year is obtained by the addition of the first seven PCs multiplied by the corresponding base functions shown in Figure 3.

Figure 3 .
Figure 3. First seven base functions for the ionization rate profiles for the upwind vantage point, with the numbering shown at the right side.In the model, each base function is scaled by the corresponding nth PC.

Figure 4 .
Figure4.Visualization of the processing steps for an example upwind light curve observed at CR2057.Left panel: results of the background masking and binning (orange circles) and gap filling (green circles), both cast on the extensive diffuse and stellar background (small black points).Right panel: results of the gap-filled light curve binning with a 4°step, adopted for fitting the sin+cos light curve model (red dots) and the fit result (black line).Additionally, the unmasked raw data are shown on a larger scale in the inset for comparison.

Figure 5 .
Figure5.Time series of the five most significant PCs of the light curves for the upwind vantage point.Each PC contains an uncorrelated mode contributing to the effective light curve variability at the upwind vantage point.The benefit of using PCs is that we can describe the entire temporal evolution of a light curve through a set of five numerical values, which is very convenient for building the WawHelioIonMP model.The left panel presents the first coefficient, and the right panel the remaining ones.This is to address the issue of the very different absolute magnitudes of the first and the other four coefficients.

Figure 7 .
Figure7.Prediction of β H (z) for the selected vantage points for the years 2022-2023, i.e., for the outside of the scope of the training data.The magnitude of the ionization rate is in the units 10 −6 s −1 .A few light curves, selected from time periods ±3 days around the upwind (blue), downwind (orange), and spring crosswind (aquamarine) vantage points were used to estimate the corresponding ionization rate profiles for each of the vantage points.The model predictions for a given vantage point agree well with each other for the upwind and downwind points, as expected.For the crosswind point, a larger scatter is visible, mostly due to the lower quality of the light curves resulting from large background-related gaps.

Figure 8 .
Figure 8. Reconstruction of the ionization rates for the upwind vantage point.The magnitude of the ionization rate in the units 10 −6 s −1 is shown as a function of the heliolatitude.The black line represents the original profile of the ionization rate, obtained from the WawHelioIon model.The red line is the reconstructed profile, and the yellow band marks an estimated region of 99% confidence level.The complete figure set (22 images) is available in the online journal.(The complete figure set (22 images) is available.)

Figure 9 .Figure 10 .
Figure 9. Similar to Figure 8 for the spring crosswind vantage point.The complete figure set (22 images) is available in the online journal.(The complete figure set (22 images) is available.)

Figure 11 .
Figure 11.Relative frequency of differences between the model reconstructions (black) and predictions (red) obtained in the bootstrap tests for the upwind, downwind, and crosswind vantage points.The horizontal axis represents deviations between the magnitude of the ionization rate obtained from the bootstrap model and that from the full model, expressed in the units 10 −6 s −1 .See text for details.

Figure 12 .
Figure12.Mean squared error MSE β of the WawHelioIonMP model prediction for the light curves observed at times straddling the center times for the upwind (UW), downwind (DW), and crosswind (CW spring ) vantage points.The limits of the shaded regions correspond to the area holding 99% of the statistics for a given time shift Δcarr.The vertical axes are in the units of (10 −6 s −1 ) 2 .

Table 1
Optimal Orders of the Approximation Formulae Used for Interpolation and Final Decomposition of the Light Curves for Individual Vantage Points