Solar Wind Magnetic Field Correlation Length: Correlation Functions versus Cross-field Displacement Diffusivity Test

Estimates of the solar wind magnetic fields’ parallel correlation length, λ, be they from the measured fields’ correlation functions or from their spectral power at “zero” frequency, have long pointed toward short values on the order of 0.01 au. Evaluation of the mean cross-field displacements (CFDs), however, fails to show the decorrelation and resulting diffusion at the expected scales, pointing instead toward λ values on the order of 0.1 au or more. In an effort to understand this “order-of-magnitude” discrepancy and reconcile the approaches using correlation functions and the CFD diffusivity test, both approaches are applied here, with renewed attention to the “details” as well as the broader sense of the calculations, to a large, 20 yr long set of magnetic field and flow data from the ACE spacecraft. It is found that solar wind intervals too short relative to λ are a likely reason for some underestimate through the correlation-function approach, causing a premature drop of the correlation functions. Once converged to their long-time limit, however, the correlation functions produce magnetic field correlation lengths very much consistent with the magnetic-field-line (MFL) correlation lengths of the diffusivity test, with nearly matching distributions of the correlation lengths corrected by the proper ratio of their theoretical estimates. The fields’ correlation lengths mostly range from 0.03 to 0.08 au, and the MFL correlation lengths from 0.04 to 0.3 au, with peaks at 0.075 and 0.15 au, likely due to nonlinear and quasilinear regimes of MFL wandering. As for the power-at-zero-frequency approach, it is doomed by the solar rotation.


Introduction
The correlation length of the magnetic fields in the turbulent solar wind (SW) has traditionally been evaluated from the correlation function of the magnetic fields, be it a one-point time auto-correlation function¹ or a multi-spacecraft two-point correlation function of the measured fields (e.g., Richardson & Paularena 2001; Matthaeus et al. 2005, 2010; Weygand et al. 2011). An alternate yet mathematically related method evaluates the correlation length from the fields' spectral power at "zero" frequency (Jokipii & Coleman 1968; Jokipii & Parker 1969), or somewhat equivalently, from the flattening frequency of the spectral power (Ragot 1999, 2006b, 2006c, 2018) and the resulting diffusion length of the magnetic field line (MFL) cross-field displacements (CFDs) or of the MFL spreading.
Both correlation-function and zero-frequency-power methods seem to yield consistent results for the fields' parallel correlation length in the SW, on the order of $10^{-2}$ au or 1-2 × $10^{11}$ cm (Weygand et al. 2011; Jokipii & Coleman 1968; Jokipii & Parker 1969; see also Matthaeus et al. 1986; Matthaeus et al. 2010, who find a correlation length of ∼4-5 × $10^{11}$ cm, 2-4 times larger; Matsui et al. 2002, who find a correlation length of ∼0.8 × $10^{11}$ cm from a correlation analysis in the Fourier space; and Zhao et al. 2018, who estimate correlation lengths for slab and 2D magnetic fluctuations of ∼$10^{11}$ cm and half of $10^{11}$ cm, respectively, through an Elsässer-variables approach). Evaluation of the MFL mean CFDs (Ragot 2006e), however, fails to show the expected decorrelation and resulting diffusion at such short scales of about $10^{-2}$ au and a few times $10^{-2}$ au, respectively. Instead, the extended supradiffusion of the mean CFDs past $10^{12}$ cm or $10^{-1}$ au in slow SW (and 3-5 × $10^{-2}$ au in fast SW, where cross flow may precipitate the observed decorrelation; see Section 2.2.2 of Ragot 2006e) suggests correlations of the SW magnetic fields over scales much longer than anticipated. This issue needs to be resolved because the actual value of the magnetic fields' correlation length is fundamental to the proper modeling of solar energetic particles (SEPs): how SEPs and other SW energetic particles propagate through the SW magnetic fields very much depends on it. Knowing the actual correlation length is also important to evaluate the predictive power, at Earth or at other locations of interest, of the SW observations made at L1 or elsewhere upstream in the SW flow (Crooker et al. 1982; Richardson et al. 1998; Richardson & Paularena 2001).
Parallel in "parallel correlation length" is sometimes defined along the direction of the main flow, the radial, sometimes along that of the main unperturbed magnetic field, the Parker spiral, and other times along that of the local mean magnetic field. While the definition of the parallel direction does have some bearing on the result for the parallel correlation length, variation in the definition of the parallel direction does not in itself produce such strong effects as to explain the apparent roughly order-of-magnitude discrepancy between estimates of the parallel correlation length. (The correlation length along the parallel direction is usually the longest (Weygand et al. 2009), except in the fast SW where Dasso et al. (2005) and Weygand et al. (2011), who define the local mean magnetic field as the parallel direction, find it slightly (by a factor ∼2 at most) shorter than in the perpendicular direction.) In this paper, we use as parallel direction both the direction of the main flow, for the correlation-function analysis of Section 2, and the direction of the Parker spiral, when applying the CFD diffusivity-test method of Section 3 (see the beginning of Section 2.2). To compare the results of both methods, we translate the results for the correlation lengths from both methods back into times elapsed along the SW flow (see Section 4). We have to use the direction of the Parker spiral for the CFD analysis, because it is the background field direction in the study of SW MFL transport (be it through analytical calculations, data analysis, or numerical simulations). It is the most convenient choice from the theoretical viewpoint. Due to the solar rotation, the Parker spiral direction is the direction of the main unperturbed magnetic field, that is, the direction of the magnetic field in the SW flow were it perfectly regular (unperturbed or smooth).
If the variation in the definition of the parallel direction does not explain the apparent order-of-magnitude discrepancy between estimates of the parallel correlation length, what does? Here, we analyze twenty years of ACE magnetic field and flow data to investigate, "confirm," and resolve the issue. By computing parallel "correlation lengths" from the same data set and through the same basic data analysis method, we confirm the potentially large discrepancy between methods based on correlation-function computation and the method based on the mean CFDs computation and diffusivity test. We propose a more cautious approach to the correlation-function computations/interpretation as well as to the application of the CFD diffusivity test. We find that both methods can actually be reconciled, with consistent results for the probability distribution functions (PDFs) of the computed correlation lengths, as well as for the individual correlation lengths. We start in Section 2 with the correlation-function method to evaluate a correlation length. A short general discussion of the method in Section 2.1 is followed in Section 2.2 by an application to the SW magnetic fields at 1 au using ACE magnetic field and flow data. To help distinguish fundamental scale-dependent behaviors of the correlation functions from the effects of SW transients and other large-scale inhomogeneities, we extend the scale-dependent study (both in separation scale and interval duration) to a couple of "pure" fast coronal-hole and slow SW intervals at solar minimum, in addition to the series of 180 intervals of 37 days, 720 intervals of 9 days, and over 7000 intervals of 1 day of the 1999-2018 ACE data set. We also present in Section 2.3 simple simulations and analytical estimates of the correlation functions, and using in situ projected power spectra of SW magnetic field fluctuations, we compare the results of simulation and theory to the correlation functions measured in a couple of SW intervals.
Section 3 deals with the CFD diffusivity-test method and applies an improved method to the same ACE data set. It shows how the correlation length can be more systematically estimated and interpreted without recourse to spectral power computations, which are notoriously imprecise at low frequency. In Section 4, we reconcile both approaches and their correlation-length estimates. Our conclusion follows in Section 5.

Discussion of the Method
A physicist's simple approach to the calculation of a correlation length through a correlation function $C_A(l) \equiv \langle A(z) A(z + l)\rangle / \langle A(z)^2\rangle$ often consists in approximating the correlation function by an exponential, $e^{-l/\lambda}$, and finding the length scale λ that best fits the correlations as a function of the scale l. Alternately, if the decrease of the correlation function $C_A(l)$ is badly fitted by an exponential, a correlation length can always be estimated as the length scale $\lambda_{1/e}$ such that $C_A(\lambda_{1/e}) = 1/e$. A stricter definition involves the integral of the correlation function, $\lambda_c \equiv \int_0^\infty dl\, C_A(l)$, with often similar results (e.g., Matthaeus et al. 2005). The integral definition stands whether the correlation function $C_A(l)$ drops exponentially or not. For an exponential drop, $\lambda = \lambda_c = \lambda_{1/e}$. If the drop of the correlation function is not exponential, the length scale $\lambda_{1/e}$ may still be a worthy approximation. Due to lower statistics and higher uncertainties at the longer scales l, it may sometimes be the only practicable estimate of a correlation length. This is about all there is to say about the basics of the method(s) for the calculation of a correlation length, and we could very well proceed from here with the application to the ACE magnetic field data of Section 2.2. But we have also been wondering how a statistician would evaluate a correlation length from the correlation function $C_A(l)$ and how the result would relate to the correlation length evaluated by a physicist. Would it give a different result? Could it help better understand the physics? And if it does not, can we put the question to rest? If our secondary questions about the statistician's approach appear too involved or lacking interest to the reader, they can simply be skipped or filtered out. The main points of this paper are not affected by our parallel discussion of the statistician's approach and calculation of $L_\alpha$ in Section 2.2.4.
From here, the reader only interested in the main points of this paper may skip to Section 2.2.
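The two "physicist's" estimates defined above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code of this paper: the function names, the linear interpolation, the trapezoidal rule, and the toy exponential correlation function are all assumptions made for the example.

```python
import math

def lambda_1_over_e(scales, corr):
    """First scale at which corr drops to 1/e (linear interpolation on the grid)."""
    target = 1.0 / math.e
    for i in range(1, len(scales)):
        if corr[i] <= target < corr[i - 1]:
            frac = (corr[i - 1] - target) / (corr[i - 1] - corr[i])
            return scales[i - 1] + frac * (scales[i] - scales[i - 1])
    return None  # the correlation never reaches 1/e on this grid

def lambda_integral(scales, corr):
    """Trapezoidal estimate of the integral of corr over the sampled scales."""
    total = 0.0
    for i in range(1, len(scales)):
        total += 0.5 * (corr[i] + corr[i - 1]) * (scales[i] - scales[i - 1])
    return total

# Toy input (assumption): exponential correlation with lambda = 2, arbitrary units
lam = 2.0
scales = [0.01 * i for i in range(4001)]      # 0 to 40, i.e. 20 lambda
corr = [math.exp(-l / lam) for l in scales]

lam_e = lambda_1_over_e(scales, corr)         # 1/e estimate
lam_c = lambda_integral(scales, corr)         # integral estimate
```

For this exponential toy input both estimates should agree with the input λ, as stated in the text; for a nonexponential correlation function they generally differ, and the integral estimate is only as good as the largest scale sampled.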
A statistician's approach to the calculation of correlations through the use of the correlation function $C_A(l)$ is more concerned with the testing of the correlation (or noncorrelation) hypothesis on various length scales l, that is, with the comparison of $C_A(l)$ to a critical value $\mathcal{R}_{N,\alpha}$ of the correlation that depends on both the number N of observations used in the computation of $C_A(l)$ and the level α of significance of the correlation, or confidence (1 − α) × 100% in the answer. Given a sample of N observations of the field A used in the computation of the correlation $C_A(l)$, finding a correlation greater than $\mathcal{R}_{N,0.01}$, the statistician will be able to conclude that the fields' correlations are significant at the α = 0.01 level. The probability of the fields not being correlated on the length scale l is less than 0.01 (very low). In other terms, the statistician concludes with (1 − 0.01) × 100% = 99% confidence that the fields are correlated on the scale l (see a more detailed description below). Now, if the statistician chooses to do the noncorrelation hypothesis testing with a value of the significance level $\alpha' = 0.05$ or 0.1 instead of 0.01 and still finds that, for a new length scale $l' > l$, $C_A(l')$ exceeds the critical value $\mathcal{R}_{N,\alpha'}$, they will then conclude that the probability of the fields not being correlated on the new length scale $l'$ is less than 0.05 or 0.1, or equivalently, conclude with 95 or 90% confidence that the fields are correlated on the scale $l'$. At what point, from the physicist's point of view, will the statistician be able to conclude that the fields are not correlated? At a significance level of 0.25, 0.5, 0.75? Could this be a better approach to determining the decorrelation length? Is it practicable? What does it tell us about the "physicist's approach"?
By computing large statistics of the correlation functions from the 20 yr long ACE data set, we will see that the earlier results (e.g., Matthaeus et al. 2005; Weygand et al. 2011) based on the computation of correlation functions are consistent with the simple physicist's approach with an estimate of $\lambda_{1/e}$ from $C_A(\lambda_{1/e}) = 1/e$, which seemingly gives a relatively short estimate of the correlation length, an estimate for which the statistician would conclude with better than 99% confidence that the fields are still correlated, or find that the probability of the fields not being correlated on that scale is much less than 0.01. This is not necessarily an issue if one understands the correlation length to be the scale over which the fields are still strongly correlated but beyond which (or a few of which) the correlations sharply drop.
Coming back to the basics,² "a correlation coefficient r cannot be used directly to indicate the degree of correlation" (e.g., Bevington 1969). This would seem to disqualify the "simple physicist's approach" from the start, but it does not. The integration of the correlation function over all scales to estimate the correlation length $\lambda_c$ is not affected by this limitation, because it does not make claims about the actual degree of correlation at any given scale. It estimates a scale over which strong correlations exist, and beyond which strong correlations disappear. The estimate $\lambda_{1/e}$ of the correlation length based on the exponential approximation would be baseless on its own, but as an approximation of the integral result, has merit, provided the correlation function is indeed not too far from an exponential. Although the approach based on the integral definition of the correlation length is in principle always valid, it is sometimes difficult to apply because of slow convergence and/or large error bars in limited statistics (at the larger scales). The uncertainty in the correlation function at the larger scales may, in some cases, make the $\lambda_{1/e}$ estimate the only practicable estimate. So some kind of "backup" method for a consistency check would be welcome. This is also why we are tentatively exploring the possibilities offered by a "statistician's approach," and explaining the rudiments of that approach below.
In a common test of the correlation coefficient r known as the zero-correlation hypothesis testing, the value of r derived from a sample of N data points or fields measurements is interpreted in relation to the probability distribution (Pugh & Winslow 1966; Bevington 1969)

$$P_r(r, \nu) = \frac{1}{\sqrt{\pi}}\, \frac{\Gamma[(\nu + 1)/2]}{\Gamma(\nu/2)}\, (1 - r^2)^{(\nu - 2)/2} \qquad (1)$$

that any random sample of uncorrelated data points would yield a linear-correlation coefficient equal to r from a parent population that is completely uncorrelated, i.e., with zero correlation, and to its integral

$$P_c(r, N) = \int_r^1 P_r(\rho, \nu)\, d\rho, \qquad (2)$$

which effectively gives the probability that a random sample of uncorrelated data points would yield a correlation coefficient as large as or larger than r, for a one-tail (positive correlation) test. In Equation (1), ν = N − 2 is the number of degrees of freedom. A small value of $P_c(r, N)$ indicates that the observed data are probably correlated. In practice, a value α called a significance level is preselected for the integral $P_c(r, N)$ of Equation (2) and the corresponding critical value $r_c$ for which $P_c(r_c, N) = \alpha$ is estimated ($r_c$ is the $\mathcal{R}_{N,\alpha}$ of our earlier discussion in this section), to which the computed correlation coefficient r is then compared. Such a comparison indicates whether or not it is likely, at an α significance level, that the data points could represent a sample derived from an uncorrelated parent population. If $r \geq r_c$, the zero-correlation hypothesis testing at an α significance level fails; it is concluded that the observed fields are correlated at an α significance level. Figures 1 and 2 show in linear and logarithmic scale, respectively, the critical values $r_c$ of the correlation coefficient, as functions of the number N of data points between 3 and $3 \times 10^3$, for a series of significance levels α between 0.001 and 0.4.
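The critical value $r_c$ for a given N and α can be obtained numerically. The sketch below is one possible way to do it, assuming the standard distribution of the linear-correlation coefficient for an uncorrelated parent population with ν = N − 2 (as tabulated in Bevington 1969); the function names, the substitution ρ = sin θ, the Simpson rule, and the bisection are choices made here for illustration, not the procedure used for Figures 1 and 2.

```python
import math

def tail_probability(r, n_points, steps=2000):
    """One-tail probability that uncorrelated data yield a coefficient >= r."""
    nu = n_points - 2  # degrees of freedom; requires n_points >= 3
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi))
    # With rho = sin(theta), (1 - rho^2)^((nu-2)/2) d(rho) = cos(theta)^(nu-1) d(theta),
    # which stays bounded on [asin(r), pi/2] for every nu >= 1.
    a, b = math.asin(r), math.pi / 2
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(math.cos(a + i * h), 0.0) ** (nu - 1)
    return math.exp(log_norm) * total * h / 3

def critical_r(n_points, alpha):
    """Value r_c such that tail_probability(r_c, n_points) = alpha (alpha < 0.5)."""
    lo, hi = 0.0, 1.0
    for _ in range(50):  # bisection: the tail probability decreases with r
        mid = 0.5 * (lo + hi)
        if tail_probability(mid, n_points) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r_c_10 = critical_r(10, 0.05)    # close to the tabulated one-tail value 0.549
r_c_100 = critical_r(100, 0.05)  # a larger sample gives a lower critical value
```

The decrease of `critical_r` with N is the behavior that the text relies on when discussing how the sampling rate affects the statistician's interpretation.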

Application to the ACE Magnetic Field Data
Figure 1. Critical value $r_c$ of the correlation coefficient, for which $P_c(r_c, N) = \alpha$, as a function of the number N of data points used to compute the correlations, for a series of significance levels α between 0.001 and 0.40. $P_c(r_c, N)$ is the probability that a random sample of uncorrelated data points would yield a correlation coefficient $r \geq r_c$.

² The next two paragraphs are our attempt at summarizing the practical statisticians' approach to correlation probabilities, essential to our analysis here in Section 2.2.4. They were written with the help of Chapter 7 of Bevington (1969), where a more detailed discussion can be found.

Here and in the following section we analyze 20 yr (1999-2018) of 1 s resolution magnetic field data from the MFI magnetometer (Smith et al. 1998) on the ACE spacecraft.
We also make use of the 64 s resolution SW flow data from the ACE SWEPAM instrument (McComas et al. 1998). To reduce memory use and speed up computations, we resampled the magnetic field data to a 3 s resolution. The ACE data were downloaded in RTN coordinates (in which R is along the radial direction, from the Sun to the spacecraft, pointing away from the Sun, T is the cross product of the Sunʼs rotation vector and R, and N ≡ R × T completes the triad). They were converted to xyz coordinates, in which z is along the local Parker spiral, pointing away from the Sun, x is the cross product of the Sunʼs rotation vector and z, and y ≡ z × x is identical to N.
Because we are using data from only one spacecraft (ACE), here we compute auto-correlation functions $C_{B_x}(l)$ of the field component about its mean, where $\bar{B}_x$ is the field's mean value on the interval of length L (see also footnote 1). In the following, whenever we refer to these functions, "auto" is omitted. The measurements at one single spacecraft at 1 au allow one to estimate the fields' correlations most naturally along the nearly radial line of the SW flow past the spacecraft. Here in Section 2.2, we measure the fields' correlations and correlation time or length along the SW flow; parallel correlation refers to the correlations measured along SW flow lines, and parallel correlation length to the correlation time along these SW flow lines multiplied by the SW speed. In Section 3, we will estimate the correlation lengths along the Parker spiral with a simple translation of the time intervals in the SW flow past the spacecraft into local Parker spiral lengths, $\Delta z = V_{SW}\, \Delta t \cos\psi$, where ψ is the angle between the radial and Parker spiral directions. To test the consistency of the two in Section 4, we will translate both results for the correlation lengths back into times along the flow. Again, in this Section 2.2, parallel correlation naturally refers to the correlations measured along SW flow lines, and parallel correlation length to the correlation length along these SW flow lines.
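As an illustration of the quantities just described, the sketch below computes a mean-subtracted auto-correlation at a given lag and converts a time lag into a length along the flow or along the Parker spiral. The estimator's normalization (dividing the lagged sum by the number of pairs), the function names, and the example values of the SW speed and of ψ are assumptions made here; the exact estimator used in this paper is not reproduced.

```python
import math

def autocorrelation(series, lag):
    """Mean-subtracted auto-correlation of `series` at an integer sample lag."""
    n = len(series)
    mean = sum(series) / n                      # interval mean, the B_x-bar of the text
    dev = [x - mean for x in series]
    variance = sum(d * d for d in dev) / n
    covariance = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
    return covariance / variance

def lag_to_flow_length(lag, dt_s, v_sw_cm_s):
    """Length along the SW flow: l = V_SW * tau."""
    return v_sw_cm_s * lag * dt_s

def lag_to_spiral_length(lag, dt_s, v_sw_cm_s, psi_rad):
    """Length along the Parker spiral: delta z = V_SW * delta t * cos(psi)."""
    return v_sw_cm_s * lag * dt_s * math.cos(psi_rad)

# Toy signal (assumption): a sinusoid with a period of 20 samples
signal = [math.sin(2 * math.pi * i / 20) for i in range(200)]
c_zero = autocorrelation(signal, 0)    # 1 by construction
c_half = autocorrelation(signal, 10)   # half a period: anticorrelated

# Illustrative numbers (assumptions): 3 s sampling, 400 km/s wind, psi = 45 degrees
l_flow = lag_to_flow_length(100, 3.0, 4.0e7)
l_spiral = lag_to_spiral_length(100, 3.0, 4.0e7, math.radians(45.0))
```

Because the mean is computed on the interval itself, the result depends on the interval length as well as on the lag, which is the subject of the next subsection.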

Duration T of the Intervals of Analysis
When computing the correlation at a scale l of a data sample of duration T and length $L = V_{SW} T$, perhaps the first question that arises concerns the choice of T. Clearly, demonstrating correlations on a scale $l \gtrsim L$ would be challenging, so T has to be chosen to be long enough. Earlier studies based on the computation of the correlations (see references in the Introduction) have, as far as we know, used T ∼ 1 day or a fraction thereof. An interval duration T = 1 day roughly corresponds to a length L ∼ 3 × $10^{12}$ cm or 0.2 au. Would this length be sufficient to demonstrate correlations on a scale on the order of 0.1 au? How would too short a length affect the shape of the correlation function? As a test, here we compute the correlations at a series of scales l on intervals of three different durations, T ≥ 1 day, using the $B_x$ component of the 3 s resolution ACE magnetic field data set. The main conclusion to be drawn from Figure 3 is that analyzing the data on series of time intervals $[t_n, t_n + T]$ that are too short causes the correlations to drop prematurely. Whereas increasing T past 9 days has a relatively minor effect at higher l, the underestimate of the correlations computed from 1 day intervals can be striking beyond a fraction of $10^{11}$ cm. This could be a major reason for the low estimates of the correlation length by previous authors. However, the length of the longer intervals (9 and 37 days) raises the possibility that larger-scale inhomogeneities, corotating interaction regions (CIRs), coronal mass ejections (CMEs), and other solar transient events could be playing a role in increasing the correlations on the longer scales. Here in Figure 4, we extend the analysis to a comparison of the correlation functions computed, for the year 2009 of extreme solar minimum, with a series of eight T durations between 1.75 and 224 hr (with successive T values in a ratio of two) plus T = 896 hr or 37.3 days.
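The length quoted above for a 1 day interval can be checked with a line of arithmetic, assuming a typical SW speed of about 400 km/s (an illustrative value, not one stated in the text) and 1 au ≈ 1.5 × $10^{13}$ cm:

```latex
L = V_{\rm SW}\,T \approx \left(4\times10^{7}\ {\rm cm\,s^{-1}}\right)\left(8.64\times10^{4}\ {\rm s}\right)
  \approx 3.5\times10^{12}\ {\rm cm} \approx 0.23\ {\rm au}.
```

Under this assumption a 1 day interval spans only about two lengths of 0.1 au, which is why correlations on that scale are hard to demonstrate with T ∼ 1 day.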
Figure 4 clearly confirms the dependence of the correlations on the duration T of the intervals of analysis, the T-dependence being strong up to a couple of days and then dampening, as already observed past 9 days in Figure 3.
Our choice of the year 2009 of extreme solar minimum, with practically no CMEs or solar transient events, makes these events an unlikely explanation for the observed T-dependence. Still, we cannot preclude at this point that transitions from slow to fast wind may play a role. To preclude CIRs as well as CMEs and other solar transient events, we also considered a series of well-defined fast coronal-hole and slow SW intervals. A few can be found in the years 2008 and 2009 of low solar minimum that are at least 112 hr or 4.7 days long. Figure 7 shows the correlations versus scale τ for a series of T durations on another interval of slow SW, observed between days 10.5 and 15 of 2009. The l- or τ-dependence of the correlations in the fast SW stream (Figure 5) is most definitely not exponential, and greatly differs from that in the slow SW streams (Figures 6 and 7). Most importantly, the T-dependence is again clearly confirmed, and at this point, should be considered a very likely basic feature of SW magnetic field correlation functions from the observational point of view. Section 2.3 further investigates the issues of the T-dependence and of the (nonexponential) l- or τ-dependence with simple numerical simulations and from the theoretical point of view, with simple analytical estimates.
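The premature drop of correlations computed on intervals that are too short can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions: it uses a synthetic first-order autoregressive series with an exponential correlation function (unlike the SW fields, which the text finds to be nonexponential), a fixed random seed, and hypothetical function names; it is not one of the simulations of Section 2.3.

```python
import math
import random

def autocorrelation(series, lag):
    """Mean-subtracted auto-correlation, with the mean taken on the interval itself."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    variance = sum(d * d for d in dev) / n
    covariance = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
    return covariance / variance

def mean_interval_correlation(series, interval_len, lag):
    """Average of the correlation at `lag` over consecutive intervals of equal length."""
    values = []
    for start in range(0, len(series) - interval_len + 1, interval_len):
        values.append(autocorrelation(series[start:start + interval_len], lag))
    return sum(values) / len(values)

def ar1_series(n, corr_steps, rng):
    """Unit-variance AR(1) series whose correlation decays as exp(-lag / corr_steps)."""
    phi = math.exp(-1.0 / corr_steps)
    sigma = math.sqrt(1.0 - phi * phi)
    x = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(n):
        out.append(x)
        x = phi * x + sigma * rng.gauss(0.0, 1.0)
    return out

rng = random.Random(1)                       # fixed seed for reproducibility
series = ar1_series(40000, corr_steps=200, rng=rng)
lag = 100                                    # half a correlation time
corr_short = mean_interval_correlation(series, 400, lag)     # T = 2 correlation times
corr_long = mean_interval_correlation(series, 10000, lag)    # T = 50 correlation times
```

In this toy model the short-interval estimate falls well below the long-interval one at the same lag, because subtracting the mean of a short interval removes much of the long-scale variance; the effect is qualitatively the T-dependence described above.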

Sampling Rate and Correlations
When computing the correlation at scale l of a data sample of duration T and length $L = V_{SW} T$, the question also arises as to how the time resolution of the data will affect the result of the correlation computation. Obviously, a higher resolution or higher sampling rate will give a higher number N of data points on the total length L of the interval, and we have seen that the statistician's interpretation of the correlation depends on the number N. But when we choose a scale l to compute the correlation, does the result for r change with the number of intermediate points (between points separated by a distance l) included in the sample? If the data are well correlated on that scale, it should not. We check from the ACE data set that it does not. Figure 8 shows the results of correlation-function $C_{B_x}(l)$ computations made by sampling the magnetic field data at a δt ≈ 3 s resolution (dotted lines), whatever the value of the …

[Figure caption fragment] … $C_{B_x}(\tau)$ at $(\tau_{1/e}, 1/e)$. The T-dependence may be more moderate within well-defined fast SW streams, but remains strong at shorter T.

The results for individual 9 day long intervals show a similar match, up to a fraction of $10^{12}$ cm. There is a lot of shape variation between individual intervals of SW, though, as illustrated by Figure 9. Figure 9 shows the correlations computed for each of the 36 successive 9 day long intervals of year 1999, the average of which is displayed as a black solid (dotted) line in Figures 3 and 8. In Figure 9, l is estimated with the T-averaged $V_{SW}$. We see that there is quite a spread in the value of $C_{B_x}(l)$ over time. For clarity of Figure 9, only the less noisy lines computed from the 3 s resolution data are shown.

Estimating the Correlation Lengths
With T ≥ 9 days, the year-averaged correlations $C_{B_x}(l)$ computed from the 3 s resolution data are smooth enough down to $C_{B_x}(l) \approx 0.2$ or even 0.1 for a study of the correlations at the scales $l \lesssim 10^{12}$ cm. Without year averaging, most (T = 9 days) correlations remain smooth enough down to somewhere between $C_{B_x}(l) = 0.2$ and 0.4, at scales l of $10^{11}$ cm to a few times $10^{11}$ cm. To compute the correlation length from its integral definition, $\lambda_c \equiv \int_0^\infty dl\, C_{B_x}(l)$, the higher statistics of the longer intervals and higher resolution are really needed to narrow the error bars at the higher l (or lower $C_{B_x}(l)$).

If the drop of the correlation functions $C_{B_x}(l)$ were exponential, there would be no need to worry about the uncertainties at higher l. One could just fit an exponential to $C_{B_x}(l)$ from low l. The magnetic fluctuations observed in the SW are not monoscale, however, and even though, e.g., Wicks et al. (2010) concluded otherwise, the correlation functions do not drop exponentially. The extended power laws in the measured power spectra of magnetic fluctuations are the main reason/argument for this nonexponential drop (see the relation between power spectra and correlation functions in Section 2.3; see also Figure 4 of Ragot 2006c for an illustration of the self-similarity of a field with a power-law spectrum). On broader ranges of scales, exponential fits can be poor, particularly in the fast SW, which has relatively hard power spectra of magnetic fluctuations (see Figures 5-7, and 10-11 with higher statistics; see also Section 2.3). Until the actual correlation length is reached, the decrease of $C_{B_x}(l)$ is closer to one minus a slow power law, with the main contribution to the integral possibly coming from the highest values of $l \lesssim \lambda_c$. Therefore, the main difficulty in estimating $\lambda_c$ from the correlation-function integral is to estimate the correlations at long enough l with a low enough uncertainty.
When computing the $\lambda_c$ and their distributions from the correlation-function integral, we must use the higher (δt ≈ 3 s) resolution data to minimize the uncertainty. We must also include long enough l scales in our correlation analysis, that is, choose T long enough.

[Figure caption fragment] … Figure 8, where the data are sampled at resolution l, but not averaged at that scale). The crosses indicate where the correlations were actually computed. Dotted lines: exponentials reaching 1/e at the same point $l = \lambda_{1/e}$ as the solid lines (see $\lambda_{1/e}$ histograms in Figure 12). Rising lines: critical values $r_c$ of the correlation coefficient as functions of the scale l in cm, for a series of significance levels α between 0.01 and 0.40. Where the correlation functions intersect the $r_c$ lines gives the length scales $L_\alpha$ at which we know the data to be correlated at a significance level α (see histograms in Figure 13).

In the statistician's approach, the match of the yearly averages of the correlations in Figure 8, as well as of the correlations on individual intervals, justifies that we sample the data at the better resolution (3 s) to compute the correlation with a higher precision, but use the number N = L/l of data points to interpret the correlation. Using the greater N of a higher sampling rate would result in lower critical correlations $r_c$ (see Section 2.1). This would result in longer correlation-length estimates because at a given l, the conclusion of a negative no-correlation test would more easily be reached. By using the N values of the lowest possible sampling rate ($V_{SW}/l$), we are ensuring that decorrelation is detected as soon as it occurs, at the lowest possible l. We compute the correlations with the better resolution, because good precision is absolutely needed at longer l for higher values of the significance level α.
We show in Figures 10 and 11 the results of computations made with the 3 s resolution data, but in the statisticianʼs approach, interpret the results with N = L/l, as if the resolution were l.
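The interpretation with N = L/l can be sketched as follows: at each scale l, the correlation is compared with the critical value $r_c$ evaluated for N = L/l, and the first scale at which the correlation falls below $r_c$ is taken as $L_\alpha$. The code is a toy illustration only. The critical value is computed from the standard distribution of the correlation coefficient for uncorrelated data (ν = N − 2, extended here to noninteger N), and the exponential correlation function, the grid, and the function names are assumptions.

```python
import math

def tail_probability(r, n_eff, steps=1000):
    """One-tail probability that uncorrelated data yield a coefficient >= r."""
    nu = n_eff - 2.0
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(math.pi))
    a, b = math.asin(r), math.pi / 2   # substitution rho = sin(theta)
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):         # Simpson's rule on cos(theta)^(nu - 1)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * max(math.cos(a + i * h), 0.0) ** (nu - 1)
    return math.exp(log_norm) * total * h / 3

def critical_r(n_eff, alpha):
    """Critical correlation r_c at significance level alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if tail_probability(mid, n_eff) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def length_scale_L_alpha(scales, corr, total_length, alpha):
    """First grid scale where corr falls below r_c computed with N = total_length / l."""
    for l, c in zip(scales, corr):
        n_eff = total_length / l
        if n_eff < 3:                  # too few points for the test to be defined
            break
        if c < critical_r(n_eff, alpha):
            return l
    return None

# Toy input (assumptions): exponential correlation with lambda = 1 on an interval L = 100
lam, total_length = 1.0, 100.0
scales = [0.05 * i for i in range(1, 101)]
corr = [math.exp(-l / lam) for l in scales]
L_001 = length_scale_L_alpha(scales, corr, total_length, alpha=0.01)
```

In this toy case `L_001` comes out somewhat larger than λ, in line with the remark of Section 2.1 that at $\lambda_{1/e}$ the statistician would still find the fields correlated with better than 99% confidence.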
If we sampled the data at the scale l from a higher-resolution data set, it would seem natural to also average our data on the scale l before sampling the data at that scale. So we do that too, and compare the results. Figures 10 and 11 show the correlations computed with and without pre-averaging on the scale l, for T ∼ 9 and 37 days, respectively. Here, we find significant differences between the results with and without pre-averaging. We do not draw any firm conclusion here as to which computation leads to the best estimate of the correlations (this will be addressed later in Section 4), but it is clear that the computation with pre-averaging on the scale l would give the longest estimates for the "correlation length." The pre-averaging smooths out the field, making it more likely to measure higher correlation.
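The difference between sampling at the scale l with and without pre-averaging can be illustrated on synthetic data. The sketch below is a toy example under stated assumptions: the series is an AR(1) signal with added white noise (both hypothetical, with a fixed seed), the block length plays the role of the scale l, and the function names are invented for the illustration.

```python
import math
import random

def lag1_correlation(values):
    """Mean-subtracted correlation between consecutive elements of `values`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n
    covariance = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / (n - 1)
    return covariance / variance

def subsample(series, step):
    """Keep one point every `step` samples, without averaging."""
    return series[::step]

def block_average(series, step):
    """Average the series over consecutive blocks of `step` samples."""
    return [sum(series[i:i + step]) / step
            for i in range(0, len(series) - step + 1, step)]

# Toy data (assumptions): AR(1) signal, correlation time 50 samples, plus white noise
rng = random.Random(7)
phi = math.exp(-1.0 / 50)
sigma = math.sqrt(1.0 - phi * phi)
x = rng.gauss(0.0, 1.0)
series = []
for _ in range(40000):
    series.append(x + rng.gauss(0.0, 1.0))   # signal plus small-scale noise
    x = phi * x + sigma * rng.gauss(0.0, 1.0)

step = 50                                    # the scale l, in samples
corr_sampled = lag1_correlation(subsample(series, step))
corr_averaged = lag1_correlation(block_average(series, step))
```

In this toy case the pre-averaged series shows the higher correlation at the scale l, since block averaging suppresses the fluctuations at scales shorter than l, consistent with the smoothing argument given above.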
In addition to the correlations computed from the 3 s averaged (solid lines with crosses³) and the l-averaged (dashed lines with crosses) magnetic field data, Figures 10 and 11 also show as dotted lines the exponentials that reach 1/e at the same $l = \lambda_{1/e}$ as the 3 s averaged field correlations. From these figures it is clear that exponentials are very poor fits to the correlations. Still, the intersection points $(\lambda_{1/e}, 1/e)$ provide a common and easy estimate of correlation length, with the clear advantage of not being affected by uncertainties at the highest scales l. We note also that the main departure from the exponential passing through $(\lambda_{1/e}, 1/e)$ is often at the shorter scales l (see also Figure 5), meaning that the exponential scale $\lambda_{1/e}$ is actually closer to the longer relevant scales than an attempt to fit $C_{B_x}(l)$ at shorter l would yield. Figure 12 gives in dotted lines the distributions of the lengths $\lambda_{1/e}$ for each of the interval durations T = 37 days (top panel), T = 9 days (middle panel), and T = 1 day (bottom panel), both for the 3 s averaged fields (black dotted histograms) and l-averaged fields (red dotted histograms). The broad width of these distributions reflects the high variability of the correlation functions between individual intervals (see Figure 9). Here and throughout this paper, the histograms are computed from the complete series of intervals of duration T over the 20 years of our data set, not from the yearly averaged results. The distributions' widths may also be affected by a solar-cycle effect (see end of Section 4). Figure 12 also shows in solid lines the histograms of the correlation lengths $\lambda_c$ computed from their integral definition, again for each of the interval durations T ∼ 37 days (top panel), T ∼ 9 days (middle panel), and T ∼ 1 day (bottom panel), and both for the 3 s averaged fields (black histograms) and l-averaged fields (red histograms).
Both $\lambda_c$ and $\lambda_{1/e}$ distributions are roughly consistent, within a factor of 2. (This near consistency is likely due to the fact that much of the nonexponential behavior of the correlation functions occurs before they reach 1/e.) The lengths $\lambda_{1/e}$ appear to often underestimate the correlation lengths $\lambda_c$, by a factor on the order of 2 on average for T = 37 days, and by a smaller factor (≈1.5) for T = 9 days. For T = 1 day, the distributions are nearly identical. Also, the distributions shift down as the duration T of the SW intervals decreases, at a rate that appears to accelerate downward.

Length Scales L α for Correlation at Significance Level α
The rising lines in Figures 10 and 11 give the critical values r c (see Section 2.1) of the correlation coefficient as functions of the length scale l, for significance levels α = 0.01, 0.05, 0.10, 0.25, and 0.40, as indicated to the right of each r c line. Where the correlation functions intersect the r c lines gives the length scales L α at which we know the data to be correlated at a significance level α. Figure 13 shows the histograms of these length scales L α . The increasing uncertainties with larger l and lower r in the estimates of r (on individual intervals) cause the distribution of L α to broaden dramatically past α = 0.25 (especially for the l-averaged data, for which correlations must be computed with much lower statistics), so the result for L 0.4 is not shown in Figure 13.
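A rough sketch of how such length scales L α can be located is given below. The critical value r c is defined in Section 2.1, which is not reproduced here, so this sketch substitutes the standard large-sample Fisher z approximation for a two-sided test and assumes that the number of independent samples at scale l is L/l. The correlation function, the total length `L_total`, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def critical_r(alpha, n):
    """Two-sided critical correlation coefficient for n independent pairs,
    from the Fisher z approximation: atanh(r) ~ N(0, 1/(n-3)) under H0."""
    if n <= 3:
        return 1.0
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z / math.sqrt(n - 3.0))

def length_scale_L_alpha(l, c, alpha, L_total):
    """First scale l where C(l) falls to the critical value r_c(alpha, L_total/l)."""
    for li, ci in zip(l, c):
        if li <= 0.0:
            continue
        if ci <= critical_r(alpha, L_total / li):
            return li
    return None

# Hypothetical example: exponential correlation with a 0.05 au scale,
# measured over a total length of 8 au.
grid = [i * 0.001 for i in range(1, 1001)]
corr = [math.exp(-x / 0.05) for x in grid]
L_total = 8.0
L_001 = length_scale_L_alpha(grid, corr, 0.01, L_total)
L_025 = length_scale_L_alpha(grid, corr, 0.25, L_total)
print("L_0.01 =", L_001, "au; L_0.25 =", L_025, "au")
```

Because r c rises with l (fewer independent samples) and decreases with α, the intersection moves to longer scales as α increases, which is the ordering of the L α histograms described here.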
The four panels of Figure 13 show the L α histograms for the 3 s averaged fields (top panels) and the l-averaged fields (bottom panels), for T ∼ 9 days (left panels) and for T ∼ 37 days (right panels). The statistics for the computation of the correlations are higher at 37 days, but accurate enough correlations must be estimated down to lower values of the correlations to find the intersection points with the r c lines. (Compare Figures 10 and 11, keeping in mind that the correlations shown in these figures are yearly averages. The results for individual intervals are much noisier at higher l; see Figure 9 for T = 9 days.) So the results are not necessarily more accurate for T ∼ 37 days. Still, aside from the red and blue distributions of the lower right panel, which show significant fluctuations (presumably due to the uncertainties in determining the intersection points, but perhaps not; see below), all distributions in Figure 13 appear to be well defined, with only one major peak. We note that the two histograms (left bottom and top panels) for α = 0.01 and T = 9 days bear a strong resemblance to the black and red histograms of the middle panel of Figure 12. We believe this adds credence to the correlation lengths estimated through the correlation functions, provided that the duration T of the data intervals is long enough (more than a few days).
The resemblance is also strong for T = 37 days, with a shift of about 40%. However, it is the red dotted line histogram of the top panel of Figure 12 that best matches the l-averaged L 0.01 histogram of the bottom-right panel of Figure 13. (The red solid line histogram fails to drop by a few times 0.1 au.) On the one hand, this could be an example where λ 1/e yields a more reliable estimate of the correlation length than the integral estimate λ c , due to high uncertainties at the longer scales l. On the other hand, the tail in the red solid line distribution could be due to real correlations (see Figure 27 in Section 3.6, and Figures 32 and 33 in Section 4, which show a better match, with a weaker slope, for λ c than for λ 1/e ). If these correlations are due to markedly nonexponential behavior at longer l (as opposed to a global shift to higher l of the entire correlation function) for a subset of intervals, neither the estimates of λ 1/e nor those of L 0.01 would be able to reflect the existence of such correlations, because λ 1/e and L 0.01 are obtained as the scales l where the correlations reach the finite values 1/e and r c (0.01, L/l), respectively. Neither the distribution of λ 1/e nor that of L 0.01 should be expected to have a tail. But the series of L α distributions for higher α (decreasing r c (α, L/l)) may be able to reflect a nonexponential drop of the correlations at the longer scales l (extended correlations) for a subset of intervals, and to confirm the existence of a tail. They actually seem to do so (see the red and blue histograms in the bottom-right panel of Figure 13, which also extend past 1 au).
Again (see Section 2.1), there is no issue in the fields being correlated at a 0.01 significance level on the correlation length λ c or even up to 1.5 λ c (see footnote 4). The correlation length is the length scale over which the fields are strongly correlated but beyond a few of which the correlations drop. It should take several correlation lengths for the correlations to be reduced to a "negligible level." Even though the low statistics do not allow us to verify with much confidence, from the individual data intervals, on what scale the correlation drop actually occurs, the year-averaged results indicate a drop to α = 0.4 by about 6 correlation lengths (see footnote 4). So we believe that the "statistician's approach" does provide a useful "backup" method for a consistency check. Its results add credence to the results obtained by the "physicist's approach." It also clarifies the meaning of correlation length. Beyond correlation-length estimates, the length scales L α might also find some use in dealing with questions of fields' predictability or the predictive power of SW upstream measurements.
4 A test run with a synthetic field of known correlation length leads to similar results, and shows correlations quickly dropping afterward. Here, from the year-averaged Figures 10 and 11, we estimate that it may take up to 6 or so correlation lengths for the significance level of the correlations to increase to 0.4 (about an equal level of confidence for some correlation as for no correlation at all). At the scales 6 λ c , we expect decorrelation to be effective and diffusion (see Section 3) to have started.

Simulations and Analytical Estimates of SW Magnetic Field Correlations
Here, we further investigate through simple numerical simulations and analytical estimates the dependences of the correlation functions on the separation scale τ and the duration T of the intervals of analysis. For clarity of the presentation, in this section, we consider the correlations as functions of the timescale τ, since the data are direct functions of time and the length scale l is only deduced from τ.
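For a data interval of duration T, the correlation function is given by Equation (5), which involves the integral $\int d\nu\, e^{i2\pi\nu\tau} P_A(\nu)$ of the projected power spectrum P A (ν) and a window function σ of width T. This results from the very definition of the power spectrum as the Fourier transform of the correlation function (e.g., Jokipii & Parker 1969).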
Assuming a piecewise power-law spectrum with $P_A(\nu) \propto \nu^{a_j}$ on each frequency interval [ν j , ν j+1 ], and a flat spectrum below ν 0 , the right-hand side of Equation (5) can be integrated analytically if T is very much longer than $\nu_0^{-1}$ (Equations (7)-(8), which involve incomplete Gamma functions). If T is not much longer than $\nu_0^{-1}$, a heuristic way of accounting for the finite duration of the data interval and the resulting finite lower frequency of the projected spectrum is to introduce a lower cutoff frequency $\nu_T = (\eta T)^{-1}$ in the integral over ν, which leads to a modified expression in which H denotes the Heaviside or step function. By comparing the results of numerical simulations and theoretical estimates, we find that $\eta = 2^{1/2}$ gives a reasonable match.
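The effect of the lower cutoff can be illustrated numerically without the closed-form expressions. The sketch below is only a toy version of the idea: it assumes a hypothetical spectrum that is flat below ν 0 = 10 −5 Hz and falls as ν −1 up to ν max = 4 × 10 −3 Hz, and it assumes that the correlation is normalized by the power remaining above the cutoff ν T = (ηT) −1 with η = 2 1/2 . The integral is evaluated by a trapezoidal rule on a logarithmic frequency grid rather than with incomplete Gamma functions.

```python
import math

def power_spectrum(nu, nu0=1e-5, a=-1.0):
    """Toy projected spectrum: flat below nu0, power law nu**a above (assumed shape)."""
    return 1.0 if nu <= nu0 else (nu / nu0) ** a

def correlation(tau, T=None, nu_min=1e-8, nu_max=4e-3, eta=math.sqrt(2.0), n=4000):
    """Normalized correlation C(tau) = int P cos(2 pi nu tau) dnu / int P dnu.

    If T is given, both integrals start at the cutoff nu_T = 1/(eta*T);
    otherwise they start at nu_min, a stand-in for zero frequency.
    """
    lo = nu_min if T is None else max(nu_min, 1.0 / (eta * T))
    # Trapezoidal rule on a logarithmic grid: dnu = nu * dln(nu).
    step = math.log(nu_max / lo) / n
    num = den = 0.0
    for i in range(n + 1):
        nu = lo * math.exp(i * step)
        w = 0.5 if i in (0, n) else 1.0
        p = power_spectrum(nu) * nu * step * w
        num += p * math.cos(2.0 * math.pi * nu * tau)
        den += p
    return num / den

day = 86400.0
c_1d = correlation(100.0, T=1.0 * day)   # tau = 100 s, T = 1 day
c_9d = correlation(100.0, T=9.0 * day)   # tau = 100 s, T = 9 days
c_inf = correlation(100.0)               # no finite-T cutoff
print(c_1d, c_9d, c_inf)
```

With these assumptions the correlation at a fixed lag is lower for the shorter intervals, since removing the lowest frequencies removes the most coherent part of the signal.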
In order to verify the validity of these theoretical expressions, we now simulate the field A(t) from its projected power spectrum P A (ν). Any geometry could be assumed for the distribution of wavevector directions, provided that the projection of the 3D power onto the measurement or flow direction produces the projected (measured) spectrum P A (ν). But the easiest and computationally most economical geometry is the slab geometry, wherein all wavevectors are aligned in one direction, that of the flow. So we model the field A as a superposition of modes with random phases α n and a fixed discretization frequency (Equations (9)-(11)), for a simulated field of total duration T simulation = 40 days and intervals of data correlation analysis of duration T (< T simulation ). The conditions of Equation (11) guarantee that there are no periodicity effects in the simulated field. Using the above model, we compute the field A at a series of times t p = p δt, with the same resolution δt of 3 s as for the SW data. To improve the statistics, we compute one hundred independent realizations of the field A, with uncorrelated series of random phases α n . We then apply the same method of correlation analysis as for the SW in situ field, choosing a series of interval durations T between 1.75 and 224 hr, with successive T values in a ratio of two, plus 896 hr. In the example of Figure 14, we use for the projected spectrum the green spectrum (with a = −0.9999 instead of −1, to allow computation of the incomplete Gamma functions of Equation (8)) that will be calculated in Section 3.7 for the third 37 day interval of the year 2009 (see Figure 28 in Section 3.7). Figure 14 shows the results of our correlation analysis for the simulated field (dotted lines) and their theoretical, analytical approximations (dashed lines) with a lower cutoff frequency $\nu_T = (2^{1/2} T)^{-1}$ to account for the finite duration T of the data intervals in the correlation analysis.
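A random-phase synthesis of this kind can be sketched as follows. The model equations of the paper are not reproduced in this section, so everything specific here is an assumption made for illustration: the amplitudes are taken as (2 P Δν) 1/2 , the mode spacing Δν is set so that the fundamental period equals the full simulated duration (so the field does not repeat within it), and the toy spectrum, the number of modes, and the function names are hypothetical. The sample sizes are kept small so the sketch runs quickly.

```python
import math
import random

def synthesize_field(spectrum, n_modes, n_samples, dt, seed=0):
    """Sum of cosines with random phases; amplitudes follow sqrt(2 P dnu)."""
    rng = random.Random(seed)
    duration = n_samples * dt
    dnu = 1.0 / duration  # fundamental period equals the full duration
    amps = [math.sqrt(2.0 * spectrum(n * dnu) * dnu) for n in range(1, n_modes + 1)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_modes)]
    field = []
    for p in range(n_samples):
        t = p * dt
        field.append(sum(a * math.cos(2.0 * math.pi * (n + 1) * dnu * t + ph)
                         for n, (a, ph) in enumerate(zip(amps, phases))))
    return field, amps

def correlation_at_lag(x, lag):
    """Mean-subtracted, normalized correlation of x with itself at a sample lag."""
    n = len(x) - lag
    a, b = x[:n], x[lag:lag + n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def toy_spectrum(nu, nu0=2e-3, a=-1.0):
    """Hypothetical spectrum: flat below nu0, power law above."""
    return 1.0 if nu <= nu0 else (nu / nu0) ** a

field, amps = synthesize_field(toy_spectrum, n_modes=100, n_samples=512, dt=3.0)
mean_square = sum(v * v for v in field) / len(field)
expected = sum(a * a for a in amps) / 2.0  # total power carried by the modes
print(mean_square, expected)
print("C at lag 5 (full interval):", correlation_at_lag(field, 5))
print("C at lag 5 (first quarter):", correlation_at_lag(field[:128], 5))
```

Repeating the last two lines over many seeds and interval lengths gives the kind of realization-to-realization scatter and T-dependence discussed in this section.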
The thin black solid line is computed from Equations (7)-(8) without a lower ν T cutoff, and the thick solid line shows the approximation wherein an upper cutoff frequency ν J , inversely proportional to τ (Equation (12)), is introduced in the integral $\int d\nu\, e^{i2\pi\nu\tau} P_A(\nu)$ of Equation (5). The correlation functions are clearly not exponential, and they strongly depend on the duration T of the data intervals. We also compare in Figure 15 the results of our simulations with those of our in situ data analysis. The match between in situ data and simulation results is quite good above separation scales τ of a few times 10 −2 hr. In both figures, the upper frequency ν max in the simulations and analytical estimates is 4 × 10 −3 Hz, which, by producing a smoother field, causes the correlations to remain close to one over an extended range of timescales τ (up to τ ∼ 10 −2 hr). We checked that more realistic simulations with a more extended inertial range produce a more gradual increase of the correlations toward the lower timescales τ, with a close match at short separation scales τ, but they are much slower and therefore more difficult to iterate a hundred times (over the entire range of timescales τ) for good statistics.
In Figure 15, the in situ data and averaged simulation results show significant differences at longer T (≳ a day) and τ. We find that this is due to the lower statistics of the in situ data analysis at longer T, and the strong variability of the correlations with the field realization. The thin dashed-dotted lines show one example of a field realization in our simulation of one hundred such fields that gives a fairly good match to the in situ data analysis results at all T. (We did not look for the best matching realization; the first one, out of one hundred, was just good enough.) We also compare in Figures 16 and 17 the results of in situ data and simulations for the fast and slow SW intervals of Figures 5 and 6. For these shorter intervals, we used $\nu_{\max} = 10^{-1}$ Hz and fewer field realizations (40 instead of 100). The match is reasonably good, but (unlike in Figure 15) did not benefit from knowledge gained in our study of Section 3 (see in particular Section 3.7) about the low-frequency power spectrum. From Figures 14-17, as well as Figures 3-7 in Section 2.2, it is clear that the correlations do very much depend on the duration T of the intervals of analysis, and a variation in the type of turbulence or the appearance of structures such as CMEs in the longer data intervals may not be blamed for all of this T-dependence. It is just the nature of extended piecewise power-law spectra to produce such a dependence on T.
[Figure caption fragment] The upper frequency is ν max = 4 × 10 −3 Hz, which, by smoothing the fields, delays the decrease of the correlations estimated with shorter T. The power spectrum used for theory and simulation is the a = −1 spectrum of Figure 28 in Section 3.7.

CFDs and Diffusivity
The amount of correlation, or lack thereof, indicative of actual decorrelation of the fields is not known, as hinted by our discussion of Section 2 (Sections 2.1 and 2.2.4 in particular). But a straightforward way to assess whether decorrelation has occurred is to test for the diffusivity of the field lines. When decorrelation occurs on a scale L c , by definition of decorrelation, the successive field-line cross-field displacements (CFDs) (δr) n on the background-field-aligned scales (δz) n = l ≳ L c are independent. As a result, the spreading over a series of such steps is the sum of the mean square displacements of the individual steps, $\langle(\Delta r)^2\rangle \simeq \sum_n \langle(\delta r)_n^2\rangle$, where the correlations $\langle(\delta r)_n (\delta r)_{n'}\rangle$ between different steps have been neglected. It increases linearly with the distance Δz elapsed along the background field direction. It is said to be diffusive. The MFL cross-field diffusion starts at a few correlation lengths of the fields (see further below, around Equation (32), for a refining of "a few"). On the shorter scales Δz, the fields' correlations cause supradiffusion of the MFLs, 〈(Δr) 2 〉 ∝ (Δz) α with α > 1 (Ragot 1999, 2001, 2006c, 2006d, 2018). So, an alternate (substitute) approach for evaluating the fields' correlation length uses the MFL CFDs instead of the fields' correlation function. It consists in determining the length beyond which the MFLs become diffusive, that is, have a spreading 〈(Δr) 2 〉 that increases linearly with Δz. Here, we use the ACE data set to test for the MFL diffusivity on a broad range of length scales and determine the correlation length L c . But some more theoretical background will guide us in our data analysis, by helping us determine which quantity to consider to best determine L c .
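The link between step independence and linear (diffusive) spreading can be checked with a toy random walk. This is a generic illustration under simple assumptions (unit-variance Gaussian steps, one transverse component, hypothetical function name), not a model of actual field lines: independent steps give a mean square displacement proportional to the number of steps, whereas fully correlated steps give a quadratic growth, an extreme case of supradiffusion.

```python
import random

def mean_square_cfd(n_steps, correlated=False, n_lines=4000, seed=1):
    """Mean square displacement after n_steps unit-variance steps, averaged over n_lines."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_lines):
        if correlated:
            # Fully correlated steps: one draw repeated n_steps times.
            r = n_steps * rng.gauss(0.0, 1.0)
        else:
            # Independent steps: a fresh draw at every step.
            r = sum(rng.gauss(0.0, 1.0) for _ in range(n_steps))
        total += r * r
    return total / n_lines

d10, d40 = mean_square_cfd(10), mean_square_cfd(40)
c10, c40 = mean_square_cfd(10, correlated=True), mean_square_cfd(40, correlated=True)
print("independent steps: ratio for 4x longer path =", d40 / d10)  # close to 4
print("correlated steps : ratio for 4x longer path =", c40 / c10)  # close to 16
```

The diffusivity test described here amounts to finding the scale beyond which measured displacements switch from the second kind of growth toward the first.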

Some Theoretical Background
In the quasilinear calculation for the MFL wandering across a background field B 0 , the magnetic field perturbation δB is assumed small enough that Δr, the MFL perpendicular displacement across B 0 (the CFD), can be neglected in the turbulent field (Jokipii & Parker 1968, 1969). The transverse turbulent field is evaluated along a line that follows the background field direction instead of a real MFL. This approximation is known as the quasilinear approximation. The same quasilinear approximation is made in the generalized quasilinear (GQL) calculations that extend the quasilinear calculations to all field-aligned length scales Δz (Ragot 1999, 2001, 2006c, 2006e, 2018). Unlike in the original quasilinear calculation, however, the GQL calculations do not assume a length scale Δz much longer than a parallel correlation length $L_{c\parallel}$. The GQL calculations establish that $L_{c\parallel} \equiv k_0^{-1}$, where k 0 is the parallel wavenumber below which the projected turbulence spectrum P ⊥ (k ∥ ) is flat. In the quasilinear or GQL approximation, diffusion occurs beyond $k_0^{-1}$.
In the quasilinear regime of turbulence, where the quasilinear approximation is justified (by definition), the mean square CFD, 〈(Δr) 2 〉, is shown to be the triple integral (Equation (18); see Equation (7) in Ragot 1999 or Ragot 2006c) of the background field projected power spectra (Equation (19)), with relatively simple analytical expressions for piecewise power-law spectra (see Ragot 1999, 2006c, 2018). In Equation (18), $\hat{b}_x$ and $\hat{b}_y$ are the Fourier transforms (with sliding windows) of the rescaled x- and y-components of the magnetic field B (i.e., b = B/B 0 where B 0 is the intensity of the background field) and the wavenumber k m is related to the definition of the Fourier transform $\hat{b}$ with a window of finite width or, if phase correlations exist in the spectrum, to the phase-correlation scale (see Ragot 2006c).
For our purpose here of determining the correlation length, we only consider the limits of the very short and very long scales Δz. In the limit of the very short scales, such that k M Δz ≪ 1, the integral expression of Equation (19) behaves as $\langle(\Delta r)^2\rangle \simeq I_b\,(\Delta z)^2$, where I b (Equation (21)) is the mean value of $b_x^2 + b_y^2$, which is much easier to evaluate numerically than power spectra and their integral. (The spectral integrals and I b are related through the Parseval equality.) At the long scales Δz > 1/k 0 , where k 0 is defined as the parallel wavenumber below which the spectrum becomes flat, the mean square CFD takes the form given by Ragot (1999), which in the limit k 0 Δz ≫ 1 increases linearly with Δz.

Application to Data Analysis
In terms of I b (see Equation (21)), it is convenient to consider the quantity A ⊥ , with two simple asymptotes: a short-scale asymptote and the long-scale, diffusive asymptote A ⊥∞ of Equation (29), where β (Equation (30)) is the relative magnetic power at |k ∥ | < k 0 . More generally, between the asymptotes but for long Δz > 1/k 0 , in the quasilinear regime of turbulence, we can rewrite Equation (24) for A ⊥ as Equation (31). Figure 18 shows A ⊥ over 2 β Δz computed from Equation (31), as a function of k 0 Δz between 0.1 and 30 in log−log scale. In the quasilinear regime, the MFL decorrelation is initiated at the parallel correlation scale $L_{c\parallel} = 1/k_0$ (Ragot 1999, 2006c, 2018). This is the scale where Figure 18 shows the function A ⊥ /Δz starting to head downward (indicated by the black vertical line). The actual MFL diffusion (asymptote of Equation (29)) requires a few such correlation scales to start. It is effectively started by the scale L 2π ≡ 2π/k 0 of the first minimum in Figure 19. The red vertical line indicates the position k 0 Δz = 2π of that first minimum of the slope computed in log−log space.
When analyzing the A ⊥ computed from SW data, because of the rapidly growing error bars at longer Δz scales (see footnote 6), related to the lower statistics at the longer Δz scales, it is more practical to determine the scale where a given value s lim of the log−log slope of A ⊥ versus Δz is first reached. Assuming that the interval of the SW data analyzed is indeed in the quasilinear regime of turbulence, we define the length scale $L_\gamma \equiv \gamma/k_0$ (Equation (32)) as the scale where the slope s first reaches s lim , and show in Figure 19 the value of s = s lim as a function of γ. The slope s becomes less than s lim = 0.1 at approximately L 5 = 5/k 0 . Still assuming quasilinearity, if for a given data interval we can determine the value of L γ for a given s lim , using the relation γ = γ(s lim ) of Figure 19 we can deduce the value of the wavenumber k 0 on that interval, and obtain, under the assumption of quasilinearity, the parallel correlation length, $L_{c\parallel} = 1/k_0 = L_\gamma/\gamma$, together with the diffusion length. Before concluding that these length scales are the actual correlation and diffusion lengths, respectively, we still need to determine whether we are in the quasilinear or nonlinear regime of turbulence, and to verify that SW cross flow is not the cause of the decorrelation. In all cases, diffusion starts on the observed scale, so the numerical value of the diffusion length is not affected, and the inferred value of the correlation length is probably close to the real correlation length (most definitely of the right order of magnitude). However, the interpretation of this length very much depends on the cause of the decorrelation.
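The slope criterion can be sketched as a short routine. This is a minimal illustration with hypothetical function names; the test curve below is a toy saturating function chosen only because its log−log slope is known analytically (it reaches 0.1 at nine times its scale L), and it is not the A ⊥ of Equation (31), for which the paper quotes γ ≈ 5 at s lim = 0.1.

```python
import math

def first_scale_below_slope(dz, a_perp, s_lim=0.1):
    """Return the first scale where the local log-log slope of A_perp(dz) is <= s_lim."""
    for i in range(1, len(dz) - 1):
        # Centered finite difference in log-log space.
        slope = (math.log(a_perp[i + 1]) - math.log(a_perp[i - 1])) / (
            math.log(dz[i + 1]) - math.log(dz[i - 1]))
        if slope <= s_lim:
            return dz[i]
    return None

def correlation_length_from_L_gamma(L_gamma, gamma=5.0):
    """Quasilinear estimate L_c = 1/k0 = L_gamma / gamma."""
    return L_gamma / gamma

# Toy saturating curve with slope L/(dz + L) in log-log space (not Equation (31)).
L = 1.0
dz = [0.01 * 1.02 ** i for i in range(600)]
a_perp = [z / (z + L) for z in dz]
L_found = first_scale_below_slope(dz, a_perp, 0.1)
print("slope reaches 0.1 near dz =", L_found)

# Hypothetical measured value: L_gamma = 0.375 au with gamma = 5.
print("L_c =", correlation_length_from_L_gamma(0.375), "au")
```

On real data the slope would first need smoothing or averaging, since the centered difference amplifies the noise that grows at the longer Δz scales.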

Nonlinear versus Quasilinear: a Practical Test
A fully nonlinear statistical calculation of the mean CFD (Ragot 2006b) establishes the range of validity for the quasilinear approximation. It predicts a nonlinear diffusion of the MFLs, distinct from and slower than the quasilinear or GQL diffusion, whenever the mean CFD 〈(Δr) 2 〉 1/2 exceeds twice the perpendicular correlation length $L_{c\perp}$ and the nonlinear length scale ζ 0 defined below in Equation (35) is shorter than the parallel correlation length $L_{c\parallel}$. When ζ 0 is not shorter than $L_{c\parallel}$, that is, at lower levels of turbulence, the quasilinear or GQL diffusion sets in first and both GQL and nonlinear diffusion coefficients are identical. In both cases, the diffusion is preceded by a supradiffusion, that is, a transport regime wherein 〈(Δr) 2 〉 increases faster than Δz. In the nonlinear regime, decorrelation and diffusion are driven by the MFL CFDs. For nonsingular spectra (see footnote 7), Ragot (2006b) derived the nonlinear diffusion that sets in whenever the nonlinear length scale ζ 0 , defined by Equation (35), is shorter than $L_{c\parallel}$ (see Equations (35)-(38)). In these expressions, written for a series of wavenumber intervals [k p , k p+1 ], the components of the nonlinear spectral function, P x,nl,p , are given by Equation (A8) of Appendix A in Ragot (2010) and the GQL mean square CFD $\sigma_\zeta^2$ at ζ < ζ 0 by Equation (A13) of that same Appendix (see footnote 8). Because ζ is less than ζ 0 within the integral of Equation (37), $\sigma_\zeta^2$ can be accurately estimated using the GQL result of Ragot (2006c or 1999). Both Equations (A8) and (A13) are closed analytical expressions. So the GQL and nonlinear mean CFDs and diffusion coefficients can be estimated from the turbulence spectra. They closely match, in Figures 5 and 6 of Ragot (2006b), the results of GQL and fully nonlinear numerical simulations of MFLs.
In terms of the parameters defined in this paper, accessible through simple data analysis, the nonlinear regime is reached if there exists a length scale ζ shorter than $L_{c\parallel}$ at which the function h(ζ) of Equation (40) exceeds ξ/β (condition (39)). Again, β, the relative magnetic power below k 0 , is defined in Equation (30), and the anisotropy parameter ξ in Equation (38). For each of the ACE data intervals analyzed for the year 1999, we plot h(ζ) as a function of ζ in Figure 20 and indicate with crossed diamonds (and crosses) of matching colors the position of L 5 (and $L_{c\parallel}$). From the definitions of A ⊥∞ and L γ (see Equations (29) and (32)), it follows that, in the quasilinear regime, β can be obtained from the data analysis. The ratio ξ/β is shown as spiders in Figure 20 for ξ = 1 (and as thinner spiders for ξ = 1/3). We can see from Figure 20 that unless ξ is small (<1/3), h(ζ) remains less than ξ/β up to $L_{c\parallel}$ (indicated by the crosses and dotted vertical lines) and the quasilinear condition is satisfied. It is likely that ξ < 1, on the order of 1/2, 1/3, or slightly less. One-third would be a plausible value. So we note that although we believe here that the nonlinear regime with its early diffusion is not reached, it is very close (as would be expected from Figures 5-6 of Ragot 2006b). Figure 21 shows, for all 37 day intervals from 1999 to 2018, the product β h(ζ 0 ), that is, the value of ξ that would put the turbulence right at the limit of the nonlinear regime. Shorter $L_{c\parallel}$ values tend to imply an easier condition with larger ξ values, which is not really surprising.
Figure 20. Function h(ζ) of Equation (40), which must be compared to ξ/β to determine the nonlinearity of the turbulence regime, vs. ζ, for the nine 37 day long intervals of 1999. Diamonds with crosses: points where the slope s reaches 0.1. Crosses: points at a scale γ times shorter, where decorrelation starts. Spiders: same abscissa $L_{c\parallel}$ as crosses, but at ordinate ξ/β instead of $h(L_{c\parallel})$. The turbulence is in the quasilinear regime if the spiders are above the lines and crosses of matching color, in the nonlinear regime otherwise. If the nonlinear regime is reached, the nonlinear ζ 0 substitutes for the quasilinear length $L_{c\parallel}^{\rm QL} = 1/k_0$ as the correlation length $L_{c\parallel}$.
7 Assuming CFD diffusion and Gaussian statistics, Matthaeus et al. (1995) and Ruffolo et al. (2006) estimate diffusion coefficients in the limit of a turbulence made of two-dimensional and slab fluctuations. Such a turbulence would produce singular projected spectra (at least some of the time). Here, we do not make assumptions about the geometry of the wavevector distribution, only about the nonsingularity of the projected spectra (measured at one spacecraft). Also, we do not assume diffusion; we look for the scale at which it occurs, if/when it occurs.
8 The spectral indexes −q p in the three-dimensional spectra are different from the spectral indexes a j of the projected spectra. For an example showing the relation between the two, see Figure 2 of Ragot (2006a).
If ξ were much smaller than 1 most of the time, then the nonlinear regime would be reached most of the time and the nonlinearity would be strong. So we consider here what the consequences would be for our analysis. Well within the nonlinear regime, the expressions for the measured parameters A ⊥∞ and L γ should be rescaled by C 1/2 and C −1/2 , respectively, with C > 1, the turbulence level relative to that at which the nonlinear regime starts (see Ragot 2006b, below Figure 5). With the new nonlinear scalings, we would therefore be deducing from the measured parameters smaller β values (by a factor C −1 ), making the condition (39) for nonlinearity harder to reach. So if the values of ξ are indeed smaller than we believe they are (smaller than 1/3), the nonlinear regime should still be relatively "mild" or marginal, with only relatively small corrections to quasilinearity. The MFL decorrelation would be due to nonlinear CFDs within the SW rather than, e.g., strong temporal variations of the magnetic fields emerging from the lower solar corona, but on a timescale only marginally shorter.
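The practical test described in this section reduces to comparing a measured curve h(ζ) with the threshold ξ/β up to the parallel correlation length. The sketch below is a schematic version of that comparison, with hypothetical function names and made-up sample values of h, β, and the correlation length; the function h itself (Equation (40)) is not reproduced here and is treated as given data.

```python
def turbulence_regime(zeta, h, L_c_par, xi, beta):
    """Return 'nonlinear' if h(zeta) exceeds xi/beta at some zeta <= L_c_par,
    otherwise 'quasilinear'."""
    threshold = xi / beta
    for z, hz in zip(zeta, h):
        if z <= L_c_par and hz > threshold:
            return "nonlinear"
    return "quasilinear"

def limiting_xi(h_at_Lc, beta):
    """Value of xi that would put the interval exactly at the regime boundary."""
    return beta * h_at_Lc

# Made-up sample values, for illustration only.
zeta = [0.01, 0.02, 0.05, 0.10]   # scales in au
h = [0.2, 0.5, 1.0, 2.0]          # h(zeta), treated as measured
L_c_par, beta = 0.05, 0.3

for xi in (1.0, 1.0 / 3.0, 0.25):
    print("xi =", round(xi, 2), "->", turbulence_regime(zeta, h, L_c_par, xi, beta))
print("limiting xi =", limiting_xi(1.0, beta))
```

In this made-up case the boundary sits at ξ = 0.3, so ξ = 1/3 is classified as quasilinear and ξ = 0.25 as nonlinear, which illustrates how a modest change in the assumed anisotropy parameter can move an interval from one regime to the other.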

MFL Wandering versus SW Cross Flow
Another possible interpretation of the observed diffusivity and inferred correlation length $L_{c\parallel}$ is that the observed MFL decorrelation is due to the SW cross flow. On the Parker spiral or background-field-aligned scale $\Delta z = V_{\rm SW}\,\Delta t \cos\psi$ (see footnote 9), the SW flows past the spacecraft a distance $V_{\rm SW}\,\Delta t \sin\psi$ across the background field. Figure 24 shows the ratios $A_{V_{\rm SW}}/A_\perp$ for all 37 day intervals of our analysis. The ratio is clearly less than 1 in the vast majority of the cases, though again, for the largest values of $L_{c\parallel}$, most values of the ratio exceed one. It seems that in those few cases when MFL wandering fails to produce a decorrelation, the SW cross flow takes over and forces the decorrelation.
[Figure caption fragment] Diamonds with crosses: where diffusion is estimated to start (slope s = 0.1). Crosses: where decorrelation is estimated to start (at L γ /γ, with γ = 5). Spiders: values of $A_{V_{\rm SW}}$ at the start of decorrelation. If the spiders are well above the solid lines of matching color, cross flow is responsible for the decorrelation.
[Figure caption fragment] A waving vertical line indicates the scale beyond which error bars become significant, at least after decorrelation has occurred. To increase the statistics in the computation of A ⊥ , the subintervals of scale Δz were taken at increments Δz/16.
The year averages of A ⊥ versus Δz are shown in Figure 26 for the 20 years of our data analysis. Most give an $L_\gamma(s_{\rm lim} = 0.1)$ less than about 2 × 10 13 cm, which is where the uncertainties cause A ⊥ to start wavering around, in the cases where decorrelation has occurred. When strong correlations subsist, the uncertainties are less. This is why we do not dismiss the few cases where A ⊥ continues increasing well past Δz = 2 × 10 13 cm. Figure 27 shows the distribution of correlation lengths obtained through the CFD method of this section for all successive 37 day intervals of the years 1999-2018. We note that unlike with the correlation-function methods, we obtain just one distribution of correlation lengths $L_{c\parallel}$, computed with intervals of duration T = 37 days. No diffusion could be observed with enough accuracy with shorter intervals (of duration T = δt × a power of 2, to allow for a fast computation of power spectra as well), and there is little ambiguity about the definition of the correlation length and the way to obtain it. Analysis on shorter time intervals, in most cases, would only provide lower limits for the correlation and diffusion lengths (as was the case in our earlier analysis of Ragot 2006e; see Table 1 of that paper).
The values of the correlation length $L_{c\parallel}$ range from about 0.037 to 0.3 au and, with lower probability, extend past 1 au. The distribution peaks at 0.075 au, with a secondary peak at 0.15 au. The red dotted, green dotted, and blue dotted histograms in Figure 27 show the contributions from the nonlinear regime, the quasilinear regime, and the regime dominated by the SW cross flow, respectively. In the nonlinear regime of turbulence, the computed $L_{c\parallel}$ is the ζ 0 < 1/k 0 of Equation (35). It tends to be shorter because the nonlinearity produced by strong MFL wandering causes an early decorrelation. The limit between nonlinear and quasilinear regimes was computed in Figure 27 with ξ = 0.25. For ξ = 0.25, we obtain similar numbers of SW intervals in the nonlinear and the quasilinear regime of turbulence. For ξ = 0.33, most SW intervals would be in the quasilinear regime, while for ξ = 0.20, most would be in the nonlinear regime. So the split between nonlinear and quasilinear regimes is quite sensitive to the value of ξ. In the "cross-flow dominated" regime, MFL wandering fails to produce a decorrelation. Eventually, the SW cross flow at the spacecraft causes that decorrelation of the fields, but on a longer length scale, hence the tail of the measured $L_{c\parallel}$ distribution. The range of values of the parallel correlation length $L_{c\parallel}$ in Figure 27 is consistent with the T = 37 days correlation analysis of Section 2 (see the right panels of Figure 13 and top panel of Figure 12), the best consistency being with the T = 37 days analysis of the l-averaged data (see red histograms in upper panel of Figure 12). But we defer until Section 4 the comparison and consistency check of the correlation lengths obtained here in Section 3 through the CFD diffusivity test and earlier in Section 2 through the use of correlation functions.
To conclude on the use of CFDs to evaluate a correlation length $L_{c\parallel}$, in the cases when MFL wandering dominates the cross flow, whether the observed diffusion is quasilinear or nonlinear, the MFL diffusion and correlation lengths are, in most cases with a high degree of confidence, the ones that can be deduced, with very little need for interpretation, from Figures 22, 23, or more generally, Figure 25. The measurements with automated computing of the scales L γ and A ⊥∞ can in some cases give excessive values, but visual inspection of the figures for mean CFDs should allow one to correct and/or properly interpret these results. For instance, in the Year 2007 panel of Figure 25, the "runaway," straight green line and diamond at 10 14 cm indicate a failure of MFL wandering to produce decorrelation and diffusion; this is clearly a cross-flow dominated interval. It could be worth checking for the presence of CMEs, but we find that such structures are not necessary to produce this type of CFD behavior.
In general, we find that using the CFD diffusivity test of this section to evaluate diffusion and correlation lengths in the SW is more straightforward than the methods based on correlation functions (in part because it is easier to know when convergence/diffusion has been reached). The results are easier to interpret and well suited for application to transport problems. The obvious drawback is the length of the intervals needed to observe the diffusion (five times the correlation length $L_{c\parallel}$, which is already long), and therefore, the lack of locality of the evaluated length scale.

Inferred Spectral Power at Low Frequencies
Finally, what can we learn from our data analysis about the spectral power at low frequencies? And could that information have been obtained through direct spectral analysis of the fields?
[Figure 27 caption] Distribution of the correlation lengths $L_{c\parallel}$, in au, for the ∼180 37 day long intervals of the entire data set. The red dotted, green dotted, and blue dotted histograms show the contributions from the nonlinear regime, the quasilinear regime, and the regime dominated by the SW cross flow, respectively. The limit between nonlinear and quasilinear regimes was computed with ξ = 0.25.
From the data analysis of the magnetic fields we obtain for each SW interval the length L γ , where the mean CFD/Δz (or A ⊥ ) slope versus scale Δz, in log−log space, first reaches s lim = 0.1, and the value of the asymptote A ⊥∞ . We also compute the mean value I b of the cross-field magnetic energy relative to the background magnetic energy (see Equation (21)). This is also the total spectral power $2\int_{k_\parallel > 0} P_\perp(k_\parallel)\,dk_\parallel$ in the b x and b y fields (Parseval equality). The spectral power at "zero," 2 P ⊥ (k ∥ → 0), of these fields is the one that was needed in the early quasilinear theory approach to obtain the correlation length (Jokipii & Coleman 1968; Jokipii & Parker 1969).
From the definition of β (the relative magnetic power at |k ∥ | < k 0 ; see Equation (30)), we have a relation between β, I b , k 0 , and the spectral power at "zero." Combined with Equations (44) and (45), it leads to an expression in which C (≥ 1), the turbulence level relative to that at which the nonlinear regime starts, is unknown but, we believe, can safely be assumed of order one (see Section 3.4). If the nonlinear regime does not start, that is, in the quasilinear regime, C = 1. All other quantities are known quantities of the data analysis.
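As a consistency sketch for the quasilinear case (C = 1), the definitions quoted in this section can be combined directly. The derivation below assumes that the spectrum is exactly flat below k 0 , that β is normalized by the total power I b , and that $L_{c\parallel} = 1/k_0$; the paper's own equations, not reproduced in this section, may carry the additional factor C and should be preferred where they differ.

```latex
\beta \;\equiv\;
\frac{2\int_{0}^{k_0} P_\perp(k_\parallel)\,dk_\parallel}
     {2\int_{0}^{\infty} P_\perp(k_\parallel)\,dk_\parallel}
\;=\; \frac{2\,P_\perp(k_\parallel \to 0)\,k_0}{I_b}
\quad\Longrightarrow\quad
2\,P_\perp(k_\parallel \to 0) \;=\; \frac{\beta\, I_b}{k_0} \;=\; \beta\, I_b\, L_{c\parallel}
```

Under these assumptions the spectral power at "zero" follows from three quantities of the data analysis (β, I b , and the correlation length), without any direct spectral estimate at low frequency.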
To see how the data analysis can further constrain the power spectrum, let us assume that the spectrum follows $P_\perp(k_\parallel) = P_\perp(k_0)\,(k_\parallel/k_0)^{a}$ with −1 ≤ a < 0 up to a parallel wavenumber k 1 and is steeper than $k_\parallel^{-1}$ above k 1 , with a spectral index a′ < −1.
Now, let us compare these inferred P ⊥ (k ∥ ) spectra to the spectra that can be computed directly by Fourier analysis of the ACE magnetic field data (see, as an example, the frequency spectra of Figure 28). The spectra P obs computed directly by Fourier analysis of the in situ data are not functions of k ∥ , the wavenumber along the background field or spiral direction, but of k R , the wavenumber along the radial direction. They are projected spectra, but on a direction that is at an angle ψ ∼ 45° (on long time intervals in the ecliptic) to the one needed to estimate the mean CFDs (see footnote 10). Furthermore, the total duration T of the SW intervals needed to statistically observe the onset of CFD diffusion is so long that the fields will clearly have decorrelated, several times over, due to the spanning of magnetic fields emerged from several independent supergranulation cells back at the Sun. The effect should be a flattening of P obs already at frequencies of several times 1/T (see Figure 28). That flattening is a cross-flow effect (see also discussions in Ragot 2006d). In our data analysis, the diffusion onset is usually observed at or before (within a factor of a few) one tenth the total length spanned by a given SW interval (see footnote 11), and as argued in Section 3.3, is due to decorrelation starting at a scale that is another 1/γ times shorter. So typically, the correlation times are τ c ≲ T/50 ≈ 18 hr. As we carefully checked in Section 3.5 (see Figures 22-24), the MFL mean CFDs at the correlation times exceed the cross flow, except in the very few cases in which the decorrelation due to MFL wandering fails to occur and cross-flow decorrelation substitutes.
Figure 28 (caption, partial). Blue, red, and green lines: inferred spectra above ν 0 for the assumed spectral indexes a = 0, −0.5, and −1, respectively, and $a' = -1.5$. The plus sign marks the position of (ν 0 , P ⊥ (ν 0 )).
The decorrelation due to MFL wandering, when it occurs, is due to spectral flattening of P ⊥ (k ∥ ) at frequencies $\nu_0 = V_{\rm SW}\, k_0 \cos\psi/(2\pi)$ that are too low for spectral analysis on intervals of duration τ c (or even 5 τ c ) to reach. Using the entire interval of duration T to compute a spectrum P obs and compare the result with the spectra that can be deduced from the CFD analysis is misleading because a proper comparative study of the spectra should compute P obs from intervals of durations T/10, or even T/50, and average the results. But then the flattening frequency would not be reached, and there would not be much to compare.
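As a rough order-of-magnitude check of this argument, the following Python sketch evaluates the flattening frequency ν 0 = V SW k 0 cos ψ/(2π) and compares it with the lowest nonzero Fourier frequency, 1/τ, of an interval of duration τ. The inputs (V SW = 400 km s −1 , ψ = 45°, a parallel correlation length of 0.15 au with k 0 taken as its inverse, as in the quasilinear case) and the function names are illustrative assumptions, not values or code from the analysis.

```python
import math

AU_KM = 1.495978707e8  # one astronomical unit in km

def flattening_frequency_hz(l_c_au, v_sw_km_s, psi_deg):
    """nu_0 = V_SW * k_0 * cos(psi) / (2 pi), assuming k_0 = 1 / L_c (quasilinear case)."""
    k0 = 1.0 / (l_c_au * AU_KM)  # parallel wavenumber in rad/km
    return v_sw_km_s * k0 * math.cos(math.radians(psi_deg)) / (2.0 * math.pi)

def lowest_resolved_frequency_hz(duration_hr):
    """Lowest nonzero Fourier frequency of a data interval of the given duration."""
    return 1.0 / (duration_hr * 3600.0)

# Assumed, illustrative inputs (not taken from the ACE analysis itself).
nu0 = flattening_frequency_hz(l_c_au=0.15, v_sw_km_s=400.0, psi_deg=45.0)
nu_min_tau_c = lowest_resolved_frequency_hz(18.0)       # interval of duration tau_c
nu_min_5tau_c = lowest_resolved_frequency_hz(5 * 18.0)  # interval of duration 5 tau_c

print(f"nu_0 = {nu0:.2e} Hz")
print(f"lowest frequency for tau_c = {nu_min_tau_c:.2e} Hz")
print(f"lowest frequency for 5 tau_c = {nu_min_5tau_c:.2e} Hz")
```

With these assumed inputs, ν 0 ≈ 2 × 10 −6 Hz lies below the lowest frequency resolved by an 18 hr interval (≈1.5 × 10 −5 Hz) and even by a 90 hr interval (≈3.1 × 10 −6 Hz).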
To conclude on this point, k 0 and P ⊥ (k 0 ) cannot be computed by spectral analysis of the data. Any spectral analysis of SW data intervals with sufficient duration to reach the relevant low frequencies is doomed by transverse decorrelation, due to the rotation of the Sun. This explains why the correlation length ∼10 −2 au obtained early on from the direct computation of the power at "zero" frequency (Jokipii & Coleman 1968; Jokipii & Parker 1969) is a serious underestimate of the parallel correlation length $L_{c\parallel}$.

Reconciling the Results
We have so far more or less considered the two approaches for estimating the correlation lengths separately, focusing in Section 2 on the correlation-function approach and in Section 3 on the CFD diffusivity test. The "third" method involving the spectral power at "zero" frequency (Jokipii & Parker 1968, 1969) is addressed in Section 3 for the obvious reason that the CFD diffusivity test is based on a theory that generalizes the original work by Jokipii and Parker, and both theories relate the mean CFDs to the projected power spectra (even if in this paper, we have worked our way around the power spectra to estimate the correlation length in a more reliable way, precisely due to the difficulty of evaluating the power at "zero" frequency). The spectral power at "zero" frequency method and the CFD diffusivity-test method both deal with CFDs and diffusivity of MFLs. They estimate the correlation length of the MFLs, which relate to the magnetic field integrated over a path. By contrast, the correlation-function approach is generally applied to the magnetic field itself, not its integral along a path.
We could argue here that energetic charged particles propagating in the SW feel the cumulative effect of the SW magnetic fields along their path, that their path to first order follows the MFLs, and that therefore the MFL correlation length is most relevant to the transport of energetic charged particles, whereas the usefulness of the correlation length of the "instantaneous" field in helping understand/describe the transport of these particles is somewhat more removed. But here our focus is on reconciling the results of both methods, the method based on the correlation functions and the method based on the CFD diffusivity test (the issue with the spectral power at "zero" frequency having been solved in Section 3.7). So we just do that below, reconcile the results of both methods, with the understanding that the results of the two methods generally are the correlation lengths or times of two different quantities.
In Section 2.3, we expressed in Equation (5) the correlation function C A (τ) of the field A as a ratio of integrals, over frequency, of the power spectrum P A (ν). From this expression of Equation (5), we can estimate the correlation time τ c,A of a field A in the limit of an infinite duration T as follows:  in the nearly radially flowing SW. The field lines are wandering across the background magnetic field B 0 in a two-dimensional space, the (x, y) plane. In order to check the consistency of the results of both methods, we need to reduce the wandering of the field lines to just one dimension, along the x-line. This can be very easily done by substituting the spectrum P b x and the field's Fourier transform b x for P ⊥ and b ⊥ , and repeating the exact same computations to obtain A x , A x∞ , and L γ in place of A ⊥ , A ⊥∞ , and L γ (see Figure 25). We are now using the same fields and projected power spectra in both methods, within the factor B 0 2 , which cancels due to the ratio in Equation (56).
Since ν 0 is the frequency below which the projected spectrum is flat, where β = γA x∞ /(πL γ C) (see Equation (45), with γ = 5 for a start of the diffusion estimated at a log−log slope $s_{\rm lim} = 0.1$) is the relative magnetic power at |ν| < ν 0 or |k ∥ | < k 0 (see also Equation (30)), and C (≥1) is the turbulence level relative to that at which the nonlinear regime starts. C is unknown but again, we believe, can safely be assumed to be on the order of one (see Section 3.4). Expressed as a function of "known" quantities, The values of $\pi\beta/(2\cos^{2}\psi)$ range from ≈0.05 to 1. Their distribution is shown in Figure 29. Figure 30 shows the ratio $\tau_{c,B_x}/\tau_c^{\rm MFL}$
Figure 29. PDF of the ratio $\pi\beta/(2\cos^{2}\psi)$, where β is the relative magnetic power at ν < ν 0 (where the spectrum becomes flat). When multiplied by $C^{1/2}$, $\pi\beta/(2\cos^{2}\psi)$ gives the theoretical ratio $\tau_{c,B_x}/\tau_c^{\rm MFL}$ in the limit of the very long times T (see Equation (59)).
(see also footnote 4 in Section 2.2.4). So, the correlation timescale is a timescale over which correlations are very strong, but beyond a few of which the correlations sharply drop. Figure 34 shows the histograms of the 1D MFL correlation time $\tau_c^{\rm MFL}$ (black solid line) and of $\tau_c^{\rm MFL}$ times $\pi\beta/(2\cos^{2}\psi)$ (black dotted line), that is, of the field's correlation time $\tau_{c,B_x}$ predicted from the CFD diffusivity-test method for C = 1, approximately matching the histograms of $\tau_{c,B_x}$ over $\pi\beta/(2\cos^{2}\psi)$ (the 1D MFL correlation time $\tau_c^{\rm MFL}$ predicted through the correlation-function method for C = 1; red solid line) and of $\tau_{c,B_x}$ (red dotted line), respectively. Figure 35 shows a similar match for the histograms of the correlation lengths, instead of the correlation times. Figures 34 and 35 demonstrate consistency between the results of the two methods, provided that the correlations are computed on data intervals long enough for convergence (here, T ≈ 37 days).
The match is actually surprisingly good given the large uncertainties of the correlations and mean CFDs at the long scales.
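The rescaling between the two correlation times can be sketched numerically as follows. The sketch assumes that the long-T theoretical ratio of the field's correlation time to the MFL correlation time has the form C^(1/2) πβ/(2 cos² ψ), with β = γA x∞ /(πL γ C) as quoted in the text. The function names and the input values (A x∞ = 0.02 au, L γ = 0.5 au, ψ = 45°, a 6 hr field correlation time) are hypothetical and chosen only for illustration.

```python
import math

def beta_from_cfd(a_x_inf, l_gamma, gamma=5.0, c=1.0):
    """beta = gamma * A_x_inf / (pi * L_gamma * C), as quoted in the text."""
    return gamma * a_x_inf / (math.pi * l_gamma * c)

def field_to_mfl_ratio(beta, psi_deg, c=1.0):
    """Assumed form of tau_c,Bx / tau_c^MFL: sqrt(C) * pi * beta / (2 cos^2 psi)."""
    return math.sqrt(c) * math.pi * beta / (2.0 * math.cos(math.radians(psi_deg)) ** 2)

# Hypothetical inputs, for illustration only.
beta = beta_from_cfd(a_x_inf=0.02, l_gamma=0.5)  # lengths in au
ratio = field_to_mfl_ratio(beta, psi_deg=45.0)   # C = 1 (quasilinear)

tau_c_bx_hr = 6.0                  # assumed field correlation time
tau_c_mfl_hr = tau_c_bx_hr / ratio # MFL correlation time predicted from it

print(f"beta = {beta:.4f}, ratio = {ratio:.3f}, tau_c_MFL = {tau_c_mfl_hr:.1f} hr")
```

For C = 1 the factor π cancels, so the ratio reduces to γA x∞ /(2L γ cos² ψ), which only involves quantities measured in the CFD analysis.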
As a reminder of the critical effect of the interval duration T (see Figures 3-7, 14-17 and Sections 2.2.1 and 2.3), Figure 36 presents here the $\tau_{c,B_x}$ histograms computed for our entire data set and a series of interval durations T from 1.75 hr upward. Figure 37 shows the median values (black), and the 10th, 20th, 80th, and 90th percentiles (blue and green) as functions of T. At T ∼ 2 hr, the correlation times are underestimated by over an order of magnitude. So yes, the methods are consistent, but one still has to make sure that the correlation-function method has converged. That said, the correlation-function method may be applied to data intervals that are a few times shorter than for the CFD diffusivity-test method (keeping in mind that the results may be slight underestimates), which can be very useful. For instance, Figure 38 is obtained by applying the correlation-function method with T = 28 hr. It shows the average dependence of the (T = 28 hr) fields' correlation time on the SW speed. Slower SW has longer average correlation times, by as much as a factor of two (consistent with Figures 5-7, 16, and 17).
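The premature drop of the correlations on short intervals can be reproduced qualitatively with synthetic data. The sketch below is an illustration under stated assumptions, not the procedure applied to the ACE data: the signal is an AR(1) series with a known correlation time of 50 samples, the mean is removed interval by interval, and the correlation time is taken as the integral of the sample autocorrelation up to its first zero crossing. All names and parameter values are hypothetical.

```python
import numpy as np

def ar1_series(n, tau0, rng):
    """Synthetic series with autocorrelation exp(-lag / tau0), lags in samples."""
    phi = np.exp(-1.0 / tau0)
    kicks = rng.standard_normal(n) * np.sqrt(1.0 - phi**2)
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = phi * x[i - 1] + kicks[i]
    return x

def correlation_time(x, interval_len):
    """Mean over consecutive intervals of the autocorrelation integrated to its first zero."""
    taus = []
    for start in range(0, len(x) - interval_len + 1, interval_len):
        seg = x[start:start + interval_len]
        seg = seg - seg.mean()  # per-interval mean removal
        var = np.dot(seg, seg) / interval_len
        tau = -0.5  # approximate trapezoid rule: half weight at lag 0, where C = 1
        for lag in range(interval_len // 2):
            c = np.dot(seg[:interval_len - lag], seg[lag:]) / ((interval_len - lag) * var)
            if c <= 0.0:
                break
            tau += c
        taus.append(tau)
    return float(np.mean(taus))

rng = np.random.default_rng(0)
tau0 = 50.0
x = ar1_series(40000, tau0, rng)
tau_long = correlation_time(x, 8000)  # intervals of 160 correlation times
tau_short = correlation_time(x, 50)   # intervals of one correlation time

print(f"true: {tau0:.0f}, long intervals: {tau_long:.1f}, short intervals: {tau_short:.1f}")
```

In this toy setup the estimate from the long intervals stays close to the true correlation time, while intervals comparable to the correlation time return a much shorter value, in the same sense as the underestimate seen at short T in Figures 36 and 37.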
This V SW -dependence may explain some of the yearly variations in the (T = 224 hr) correlation times, with a clear drop in the year 2003 (by a factor of ≈1.7 from the year 2002, to a value of ≈1.9 hr), when months of high-speed SW streams severely affected the yearly distribution of wind speeds (compare the blue dotted line with the red solid line in Figure 38). In addition to the V SW effect, the variations in the correlation times may also correlate with the solar cycle (see Figure 39). It is apparent for T = 224 hr (red dots) but not T = 28 hr (orange dots). Due to lower statistics (only nine intervals per year), the solar-cycle effect is also much less clear for T = 37 days (not shown), even though the V SW effect is still strong. The variations with solar cycle (by a factor of ∼2; the 10th, 20th, 80th, and 90th percentiles fluctuate by a similar factor) observed for long T may contribute to the width of the distributions in Figures 12 and 13 (though not in the lower panel of Figure 12). The observation of a solar-cycle effect in the field-component correlation time computed with T = 224 hr ≈ 9.3 days, but not (or hardly) for T = 28 hr, is consistent with previous results by Wicks et al. (2010) and Engelbrecht & Wolmarans (2020).

Conclusion
Figure 39 (caption, partial). Black solid line: international sunspot number (monthly, smoothed on 13 months, from www.sidc.be/silso). Blue diamonds joined by solid lines: American sunspot number (yearly, from the NOAA website; see, e.g., Clette et al. 2016; Balasubramaniam & Henry 2016 for SN consistency analysis; ISN may be overestimated by 30% around solar maximum). Red dots joined by solid lines: yearly medians of the fields' correlation times, $\tau_{c,B_x}$, computed for T = 224 hr (plotted on the red right-hand-side y-axis). Orange dots joined by solid lines: same for T = 28 hr. Green squares joined by red dotted lines: yearly relative numbers of slow SW (V SW < 450 km s −1 ) 28 hr intervals (rescaled by a factor of 5 and plotted on the red right-hand-side y-axis). The red-dot median correlation times may show both solar-cycle and V SW dependences.
Estimates of the SW magnetic fields' parallel correlation length, be it from the correlation functions of the measured fields or from the fields' spectral power at "zero" frequency, have long pointed toward short values on the order of 0.01 au. However, evaluation of the MFL spreading or mean CFDs failed to show the decorrelation and resulting diffusion at the "expected" length scales, pointing instead toward values for the parallel correlation length along the Parker spiral on the order of 0.1 au or more. In an effort to understand this order-of-magnitude discrepancy and reconcile the approaches using correlation functions and diffusivity test of the mean CFDs, we have applied here both approaches, with renewed attention to the "details" as well as the broader sense of the calculations, to a large (20 years long) set of magnetic field and flow data from the ACE spacecraft.
Our first finding, and possibly the most impactful in solving or reducing the discrepancy, is that the correlation functions can be severely affected by the length L of the SW intervals chosen to compute them, if L does not exceed the correlation length λ by a large enough factor (see Figures 3-7, 14-17, 36, and 37). It appears that one day, the longest interval duration T = L/V SW of the correlation studies cited in the Introduction, causes the correlations to drop prematurely. It is too short to properly compute correlation lengths on the order of 0.1 au in the SW (Section 2.2.1). For T ∼ 2 hr, the computed correlation lengths can be underestimated by over an order of magnitude (Figures 4, 6, 7, 14, 15, 17, 36, and 37). These findings are confirmed by simulations and analytical estimates of the correlations (Section 2.3; Figures 14-17).
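To put numbers on the relation T = L/V SW , the short Python sketch below converts interval durations into the lengths they span. The wind speed of 400 km s −1 is an assumed, typical slow-wind value, and the function name is hypothetical.

```python
AU_KM = 1.495978707e8  # one astronomical unit in km

def interval_length_au(duration_hr, v_sw_km_s):
    """Length L = V_SW * T spanned by a solar wind interval of duration T."""
    return duration_hr * 3600.0 * v_sw_km_s / AU_KM

V_SW = 400.0  # km/s, assumed for illustration

length_one_day = interval_length_au(24.0, V_SW)
length_two_hr = interval_length_au(2.0, V_SW)

print(f"1 day -> {length_one_day:.3f} au")
print(f"2 hr  -> {length_two_hr:.3f} au")
```

At this assumed speed, a one-day interval spans about 0.23 au, only a little over twice a correlation length of 0.1 au, and a 2 hr interval spans about 0.02 au, well below it.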
The correlation length of a field B x is classically defined by the integral $\lambda_c = \int_0^\infty C_{B_x}(l)\,dl$ over the length scales l. It is customary among physicists to expect correlations to decrease exponentially with the scale l and to seek the characteristic length scale λ of that decrease, either by fitting an exponential to the correlation function, or by searching for the scale λ 1/e where the correlations reach 1/e, instead of integrating the correlations over l to obtain λ c . If the correlations fit $e^{-l/\lambda}$, then λ c = λ = λ 1/e . In the SW, however, the correlations do not drop exponentially (see Figures 5-7, 10, 11). So λ 1/e could be a poor substitute for λ c . Because much of the nonexponential behavior of the correlation functions occurs before they reach 1/e, however, the length λ 1/e appears to be a fairly good substitute for λ c , most of the time (Figure 12). In very low statistics, we suggest that it may sometimes even produce better results than the integral λ c , which could be affected by the large uncertainty of the values of $C_{B_x}(l)$ at larger l. However, large-scale nonexponential correlations could also affect the values of λ 1/e . This is supported by the greater slopes in Figures 31 and 33 relative to 30 and 32, respectively. For high enough resolution data (no scale averaging), the issues for the integral λ c are resolved by increasing T.
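The difference between the integral scale λ c and the 1/e scale λ 1/e can be checked on model correlation functions. The sketch below is an illustration with assumed shapes: an exponential, for which the two scales coincide, and a Gaussian, chosen here only as a simple nonexponential example and not as a model of the SW correlations. The function names are hypothetical.

```python
import numpy as np

def integral_length(l, c):
    """lambda_c: trapezoidal integral of the correlation function over l."""
    return float(np.sum(0.5 * (c[1:] + c[:-1]) * np.diff(l)))

def e_folding_length(l, c):
    """lambda_1/e: first scale where C drops to 1/e, by linear interpolation."""
    target = 1.0 / np.e
    idx = int(np.nonzero(c <= target)[0][0])
    l0, l1, c0, c1 = l[idx - 1], l[idx], c[idx - 1], c[idx]
    return float(l0 + (c0 - target) * (l1 - l0) / (c0 - c1))

l = np.linspace(0.0, 2.0, 20001)  # scales in au
lam = 0.05                        # assumed characteristic scale in au

c_exp = np.exp(-l / lam)           # exponential correlations
c_gauss = np.exp(-((l / lam) ** 2))  # nonexponential example (assumption)

for name, c in (("exponential", c_exp), ("gaussian", c_gauss)):
    print(name, round(integral_length(l, c), 4), round(e_folding_length(l, c), 4))
```

For the exponential both scales equal λ. For the Gaussian, λ 1/e is still λ while λ c is λ√π/2, about 11% shorter, which shows how the two definitions separate once the correlations are not exponential.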
Beyond the choice of interval duration and the difficulty in obtaining sufficient statistics for the correlations at high l to accurately estimate (or simply capture the main contribution to) the integral λ c , we also noted (and explored for a backup method or consistency check) the statistician's different approach to the use of correlations. Recalling in Section 2.1 the statistician's basic principle that a correlation coefficient r cannot be used directly to indicate the degree of correlation, but that the interpretation of r critically depends on the number N of data points or observations used to compute r, we proceeded in Section 2.2.4 to compute the length scales L α at which the fields are known to be correlated at a significance level α. The length scales L α are given in Figures 10 and 11 by the scales where the correlation functions $C_{B_x}(l)$ intersect the critical values r c (α, N = L/l) (see Figures 1 and 2) of the correlation coefficient. Their distributions are presented in Figure 13 for interval durations of 9 and 37 days, with and without preaveraging the magnetic fields on the scale l. The results for L α emphasize how strong the correlations are on the correlation length or timescale, and even up to one-and-a-half times or twice that scale. They emphasize that the correlation timescale is the timescale over which correlations are very strong, but beyond a few of which the correlations sharply drop. They otherwise mostly confirm/support our other results, with mostly the same drawback as λ 1/e when limited to one small value of α.
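The construction of L α can be sketched as follows. The critical value r c (α, N) is approximated here with Fisher's z-transform for a two-sided test, which is an assumption of this sketch; the critical values of Figures 1 and 2 may have been computed differently. The exponential correlation function, the total length L = 8.5 au, and all names are likewise hypothetical.

```python
import math
from statistics import NormalDist

def critical_r(alpha, n):
    """Approximate two-sided critical correlation coefficient via Fisher's z-transform."""
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.tanh(z / math.sqrt(n - 3))

def significance_scale(scales, corr, total_length, alpha=0.05):
    """First scale l at which C(l) falls below r_c(alpha, N = L / l)."""
    for l, c in zip(scales, corr):
        if c < critical_r(alpha, total_length / l):
            return l
    return None

lam = 0.05                                       # assumed correlation length in au
scales = [0.001 * i for i in range(1, 1001)]     # l from 0.001 to 1 au
corr = [math.exp(-l / lam) for l in scales]      # model correlation function
l_alpha = significance_scale(scales, corr, total_length=8.5)

print(f"r_c(0.05, N=100) = {critical_r(0.05, 100):.3f}")
print(f"L_alpha = {l_alpha:.3f} au")
```

In this toy case the correlations remain significant at the 5% level out to roughly 1.7 times the assumed λ, because the number of independent observations N = L/l is still large at that scale.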
In Section 3, we turned our attention to the other approach, which consists of computing the mean CFDs $\langle(\Delta x)^2 + (\Delta y)^2\rangle^{1/2}$ of the SW MFLs on intervals of duration T as functions of the background-field-aligned length scale Δz, and testing for the diffusivity of these CFDs to find, first, the diffusion scale, and then the correlation scale, $L_{c\parallel}$, which is a fraction of the diffusion scale. Taking advantage of the large ACE data set, we are able to choose T long enough (∼37 days) to actually observe the diffusion start and compute large statistics of diffusion and correlation lengths. Using the generalized quasilinear theory in a simplified form that focuses on the very short and long scales (Section 3.2), we identified in Section 3.3 the rescaled quantity $A_\perp \equiv \langle(\Delta x)^2 + (\Delta y)^2\rangle/(I_b\,\Delta z)$ as optimum to obtain the diffusion/correlation length from the in situ fields' measurements, with the simple asymptotes Δz and $\pi\beta L_{c\parallel}^{\rm QL}$ of Equations (28) and (29) at the very short and the very long scales, where β is the relative magnetic power at |k ∥ | < k 0 , the parallel wavenumber below which the power spectrum is flat, and I b is the total magnetic energy relative to background. By using the relation between the logarithmic slope s of A ⊥ versus Δz and the ratio γ of the length scale L γ to $L_{c\parallel}$ in the quasilinear regime (see Figure 19), we are able to make accurate determinations of $L_{c\parallel}$ by first determining L γ for s = 0.1 and then $L_{c\parallel} = L_\gamma/\gamma$ with γ = γ(s = 0.1) ≈ 5. We further translated the test for the nonlinearity of the turbulence regime (Ragot 2006b) in terms of the new quantity A ⊥ (see Equations (39)-(40)), and applied the test to the ACE data set (see Figures 20 and 21, Section 3.4). We find that if the ratio $\xi \equiv L_{c\perp}/L_{c\parallel}$ of the perpendicular to the parallel correlation length is small enough (typically <1/3), the regime is nonlinear rather than quasilinear. This changes the interpretation of the measured lengths.
The diffusion is nonlinear rather than quasilinear (with a lower diffusion coefficient), and the measured correlation length is ζ 0 < 1/k 0 rather than $L_{c\parallel}^{\rm QL} = 1/k_0$. However, by introducing the nonlinear scalings for the nonlinear correlation scale ζ 0 and diffusion coefficient A ⊥∞ , we find that if the nonlinear regime is reached, it should still be relatively marginal, with only small corrections to quasilinearity. The MFL decorrelation would be due to the nonlinear CFDs within the SW rather than, e.g., strong temporal variations of the magnetic fields emerging from the lower solar corona, but on a timescale only marginally shorter.
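The two-step determination of the parallel correlation length (find the scale L γ where the logarithmic slope of A ⊥ versus Δz drops to 0.1, then divide by γ ≈ 5) can be sketched on a tabulated curve. The saturating curve used below is a toy stand-in chosen for its known slope, not the quasilinear theory curve of Figure 19, and the plateau value and function name are hypothetical.

```python
import numpy as np

def diffusion_onset_scale(dz, a, s_lim=0.1):
    """Smallest dz at which the log-log slope of A(dz) has dropped to s_lim."""
    slope = np.gradient(np.log(a), np.log(dz))
    below = np.nonzero(slope <= s_lim)[0]
    if below.size == 0:
        raise ValueError("slope never reaches s_lim: interval too short to see diffusion")
    return float(dz[below[0]])

# Toy curve (assumption): A -> dz at short scales, A -> a_inf at long scales.
dz = np.logspace(-3, 1, 801)   # field-aligned scale in au
a_inf = 0.05                   # assumed long-scale plateau in au
a = dz * a_inf / (dz + a_inf)

gamma = 5.0                    # gamma(s = 0.1), value quoted in the text
l_gamma = diffusion_onset_scale(dz, a)
l_c_par = l_gamma / gamma

print(f"L_gamma = {l_gamma:.3f} au, L_c_parallel = {l_c_par:.3f} au")
```

For this toy curve the slope is a_inf/(Δz + a_inf), so the threshold s = 0.1 is crossed at Δz = 9 a_inf, which the function recovers to within the grid spacing. If the slope never reaches the threshold, the function raises an error, mirroring the requirement that T be long enough to observe the start of diffusion.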
We also compared the effects of MFL wandering and SW cross flow in Section 3.5. We find that MFL wandering usually dominates or, at the very least, contributes substantially to the mean CFDs (see Figures 22-24). There are a few cases, however, where MFL wandering fails to produce decorrelation and SW cross flow forces the decorrelation at much longer $L_{c\parallel}$, accounting for the high tail of the $L_{c\parallel}$ distribution in Figure 27.
Our complete data analysis for the correlation length $L_{c\parallel}$, over 20 yr of the ACE data set, shows values for $L_{c\parallel}$ ranging from about 0.037 to 0.3 au and, with much lower probability, extending past 1 au (see Figure 27, Section 3.6). The distribution of $L_{c\parallel}$ peaks at 0.075 au, with a secondary peak at 0.15 au, so overall, we find parallel correlation lengths consistent with but somewhat shorter than our earlier study of a couple of SW streams suggested. The first and second peaks are likely due to the nonlinear and quasilinear regimes of MFL wandering, respectively.
Having considered the two approaches for estimating the correlation lengths or times separately, focusing in Section 2 on the correlation-function method and in Section 3 on the CFD diffusivity-test method (and the closely related power-at-zero-frequency method), we noted in Section 4 that the results of the two methods generally are the correlation lengths and times of two different quantities. While the diffusivity-test method estimates the correlation length of the MFLs, which relate to the magnetic field integrated over a path, the correlation-function method is generally applied to the magnetic field itself, not its integral along a path. Rescaling the $\tau_{c,B_x}$ and other B x correlation-time estimates made with the correlation-function method by the ratio $C^{1/2}\pi\beta/(2\cos^{2}\psi)$ (Figure 29) of the (long-T) theoretical correlation time of B x over that of the MFLs ($\tau_c^{\rm MFL}$), we demonstrate that the correlation times obtained by the two methods are very much consistent (Figures 30-35; already without accounting for nonlinearities, i.e., for a constant C = 1), provided that the correlation-function method is applied on data intervals long enough (Figures 3-7, 14-17, 36, 37). When applied to scale-averaged B x fields, the correlation-function method produces, without a need for rescaling, correlation times consistent with those of the CFD diffusivity-test method, but which are highly imprecise due to the reduced statistics resulting from the scale averaging (Figures 32, 33).
The scales are consistent, but the methods are not quite equivalent. If the goal is to estimate the correlation length or time of MFLs, the CFD diffusivity-test method is much preferred because it is the one that also gives easy access to estimates of β (Equation (45)), needed for the conversion from $\tau_{c,B_x}$ to $\tau_c^{\rm MFL}$. If more local values of the correlation length or time are needed, then the correlation-function method is needed because it may be applied, with relatively good results, to data intervals that are a few times shorter than the CFD diffusivity-test method. Doing so allows one to study the correlation-time dependence on the SW speed and solar cycle (Figures 38 and 39).
Remembering that the purpose of knowing the correlation length is often to apply its value to the modeling of transport by diffusion, it would seem appropriate enough to make sure that diffusion does indeed occur on the scale of a few of the estimated correlation lengths. One sure advantage, in that regard, of defining the correlation length through the diffusivity test of Section 3 is that by the very definition of $L_{c\parallel}$, we know that diffusion starts at a few $L_{c\parallel}$ (≈5 for a 0.1 logarithmic slope, 2π for a stricter zero-slope diffusion). Whereas the use of $L_{c\parallel}$, as defined in Section 3, seems most appropriate in problems of transport, the scales of Section 2, in particular L α , would appear more useful when dealing with questions of fields' predictability or the predictive power of SW upstream measurements. Ultimately, $L_{c\parallel}$ and L α may be answers to two distinct questions, tools to be used in two distinct problems. One may also argue that energetic charged particles propagating in the SW feel the cumulative effect of the SW magnetic fields along their path, that their path to first order follows the MFLs, and that therefore the MFL correlation length $L_{c\parallel}$ is most relevant to the transport of energetic charged particles, whereas the usefulness of the correlation length $\lambda_{c,B_x}$ of the "instantaneous" field in helping understand/describe the transport of these particles is somewhat more removed.
Finally, the Parker spiral projected power spectra at low frequency, needed by the theories for the calculations of the mean CFDs and parallel correlation length $L_{c\parallel}$, can be inferred from our CFD data analysis (Section 3.7). They flatten at a lower frequency and have a higher power at "zero" than the spectra obtained through the direct spectral analysis of the fields (see Figure 28). The wavenumber k 0 and "zero frequency" power P ⊥ (k 0 ) cannot be obtained by spectral analysis of the data obtained at 1 au at a spacecraft that is practically stationary relative to the Sun. Any such spectral analysis on SW data intervals of sufficient duration to reach the relevant low frequencies is doomed by the solar rotation, which causes a spectral flattening by transverse decorrelation. This explains why the correlation length ∼10 −2 au obtained early on from the direct computation of the power at "zero" frequency (Jokipii & Coleman 1968; Jokipii & Parker 1969) is a serious underestimate of the MFL parallel correlation length $L_{c\parallel}$.