Asteroseismic modelling strategies in the PLATO era II. Automation of seismic inversions and quality assessment procedure

Context. In the framework of the PLATO mission, to be launched in late 2026, seismic inversion techniques will play a key role in determining the mission precision requirements in terms of stellar mass, radius, and age. It is therefore relevant to discuss the challenges of the automation of seismic inversions, which were originally developed for individual modelling. Aims. We tested the performance of our newly developed quality assessment procedure of seismic inversions, which was designed for pipeline implementation. Methods. We applied our assessment procedure to a testing set composed of 26 reference models. We divided our testing set into two categories: calibrator targets whose inversion behaviour is well known from the literature and targets for which we assessed the quality of the inversion manually. We then compared the results of our assessment procedure with our expectations as a human modeller for three types of inversions: the mean density inversion, the acoustic radius inversion, and the central entropy inversion. Results. We find that our quality assessment procedure performs as well as a human modeller. The mean density inversion and the acoustic radius inversion are suited to large-scale applications, but not the central entropy inversion, at least in its current form. Conclusions. Our assessment procedure shows promising results for a pipeline implementation. It is based on the by-products of the inversion and therefore requires few numerical resources to quickly assess the quality of an inversion result.


Introduction
Convective motions in the upper layers of solar-type stars generate a wide range of stellar oscillations. By studying these oscillations, asteroseismology enables us to probe the stellar interior and characterise the stellar parameters with a precision and accuracy that is difficult to match with other standard techniques for non-binary stars. Asteroseismology experienced a rapid development in the past two decades. The launch of space-based photometry missions such as CoRoT (Baglin et al. 2009), Kepler (Borucki et al. 2010), and TESS (Ricker et al. 2015) initiated the so-called photometry revolution. The unprecedented data quality from these missions allows us to use cutting edge techniques, the so-called seismic inversions (see e.g. Reese et al. 2012; Buldgen et al. 2015b,a, 2018; Bétrisey & Buldgen 2022), that were until then restricted to helioseismology (see e.g. Basu & Antia 2008; Kosovichev 2011; Buldgen et al. 2019c; Christensen-Dalsgaard 2021; Buldgen et al. 2022a, for reviews). Such seismic inversions were applied to various asteroseismic targets (see e.g. di Mauro 2004; Buldgen et al. 2016a,b, 2017a; Bellinger et al. 2017; Buldgen et al. 2019b; Bellinger et al. 2019; Buldgen et al. 2019a; Kosovichev & Kitiashvili 2020; Salmon et al. 2021; Bellinger et al. 2021; Bétrisey et al. 2022; Buldgen et al. 2022b; Bétrisey et al. 2023b,a). In the near future, asteroseismic modelling will play a key role in the PLATO mission (Rauer et al. 2014) for a determination of the stellar mass, radius, and age meeting the mission precision requirements (1-2% in radius, 15% in mass, and 10% in age for a Sun-like star). It is therefore relevant to confront the current modelling strategies and discuss the remaining challenges for PLATO such as the choice of the physical ingredients (see e.g. Buldgen et al. 2019a; Bétrisey et al. 2022), the so-called surface effects (see e.g. Basu et al. 1996; Kjeldsen et al. 2008; Ball & Gizon 2014; Sonoi et al. 2015; Ball & Gizon 2017; Nsamba et al. 2018; Jørgensen et al. 2020, 2021; Cunha et al. 2021; Bétrisey et al. 2023a), and stellar activity (see e.g. Broomhall et al. 2011; Santos et al. 2018, 2019a,b; Howe et al. 2020; Thomas et al. 2021; Santos et al. 2021).
In the first article of this series of papers, we presented a modelling strategy efficiently damping the surface effects and providing precise and accurate stellar parameters, based on the combination of a mean density inversion and a fit of frequency separation ratios (Bétrisey et al. 2023a, hereafter JB23). Stellar seismic inversions were originally developed for solar modelling, and methods to assess the quality of the inversions were naturally investigated (see e.g. Pijpers & Thompson 1992, 1994; Rabello-Soares et al. 1999; Reese et al. 2012). However, such methods cannot be applied in their current form to asteroseismic targets, because some of the simplifying hypotheses are only verified for the solar case. The quality of an asteroseismic inversion is therefore assessed manually, by checking diagnostic plots and relying on the experience of the modeller. The results of JB23 strengthened our confidence that a mean density inversion would be compatible with a pipeline approach. However, we also encountered a few lower quality inversion results that should be used with caution. In this study, we therefore propose a quality assessment procedure of seismic inversions that can be implemented in a pipeline. We considered three different types of inversion: the mean density inversion (Reese et al. 2012), the acoustic radius inversion (Buldgen et al. 2015b), and the central entropy inversion (Buldgen et al. 2018). We tested our assessment procedure on six calibrator models that are intensively studied targets in the literature: the Sun (see e.g. Reese et al. 2012), Kepler-93 (see e.g. Bétrisey et al. 2022), 16 Cyg A and B (see e.g. Buldgen et al. 2016a, 2022b), and α Cen A and B (see e.g. Reese et al. 2012; Salmon et al. 2021). This procedure was then applied to 20 additional reference models for which we manually checked the diagnostic plots.
In Sec. 2 we describe the different inversions that we investigated and our testing set. In Sec. 3 we present our quality assessment procedure, which is then applied to our testing set in Sec. 4. We discuss in Sec. 5 best practices to consider for a large-scale application, and in Sec. 6 we draw the conclusions of our study.

Seismic inversions
In the paper, we use the following terminology. A seismic inversion takes as input a 'reference' model, which is typically the optimal model from a local or global modelling strategy. In our study, most of the reference models come from a Markov chain Monte Carlo (MCMC) fitting the individual frequencies and the classical constraints (e.g. effective temperature, metallicity, luminosity), and for the α Cen binary system, a Levenberg-Marquardt approach (see e.g. Frandsen et al. 2002; Teixeira et al. 2003; Miglio & Montalbán 2005) was employed. We note that the interferometric radius can also serve as a classical constraint. However, except for specific cases (e.g. Pijpers et al. 2003; Huber et al. 2012; White et al. 2013), such a measurement is rarely available. The inversion tries to recover the properties of an actual observed star or of a synthetic stellar model, which we call the 'target' or 'observed' model. In our case, we considered both real observations, from the Kepler LEGACY sample (Lund et al. 2017) or from binary systems (Kjeldsen et al. 2005; de Meulenaer et al. 2010; Salmon et al. 2021; Buldgen et al. 2016a, 2022b), and synthetic observations from Sonoi et al. (2015), where the surface effects are emulated as realistically as possible with 3D hydrodynamic simulations of the upper stellar layers patched on a 1D structure. Based on the differences between the reference and observed frequencies, the inversion provides a small correction to a quantity of interest, in our case the mean density, the acoustic radius, and a central entropy indicator. We refer to the quantity of interest including the small correction from the inversion as the 'inverted' quantity.
In this section, we provide a brief overview of the seismic inversion concepts that are pertinent to our study. We refer the reader to Gough & Thompson (1991), Gough (1993), Pijpers (2006), and Buldgen et al. (2022a) for a more comprehensive discussion. The seismic inversions are based on the so-called structure inversion equation. By studying perturbations of the stellar oscillations at linear order, Lynden-Bell & Ostriker (1967) and precursor studies (Chandrasekhar 1964; Chandrasekhar & Lebovitz 1964; Clement 1964) demonstrated that the equation of motion fulfils a variational principle. Using this finding for the individual frequencies, Dziembowski et al. (1990) showed that at first order, the frequency perturbation could be directly related to the structural perturbation through the structure inversion equation

$$\frac{\delta \nu^{n,l}}{\nu^{n,l}} = \int_0^R K^{n,l}_{a,b}\,\frac{\delta a}{a}\,dr + \int_0^R K^{n,l}_{b,a}\,\frac{\delta b}{b}\,dr, \qquad (1)$$

where $a$ and $b$ are two structural variables, $n$ is the radial order, $l$ is the harmonic degree, $\nu$ is the oscillation frequency, and $R$ is the stellar radius. $K^{n,l}_{a,b}$ and $K^{n,l}_{b,a}$ are the structural kernels, and the relative differences are computed with

$$\frac{\delta x}{x} = \frac{x_{\rm obs} - x_{\rm ref}}{x_{\rm ref}}. \qquad (2)$$

The indices 'ref' and 'obs' stand for reference and observed, respectively. We note that Dziembowski et al. (1990) originally derived Eq. (1) for the $(\rho, c^2)$ structural pair, $\rho$ being the density and $c$ being the sound speed, and that the structure inversion equation can be adapted for any combination of physical variables that appear in the adiabatic oscillation equations (e.g. Gough & Thompson 1991; Gough 1993; Elliott 1996; Basu & Christensen-Dalsgaard 1997; Lin & Däppen 2005; Kosovichev 2011; Buldgen et al. 2017c, 2018). Then, based on the relative differences between the observed and reference frequencies, the relations of Eq. (1) for the individual modes can be combined to compute a small correction to the reference model. Due to the limited number of modes in asteroseismology, the goal is to define a quantity of interest, a so-called seismic indicator $t$, which concentrates all the information of the frequency spectrum. It typically takes the form

$$t = \int_0^R f(r)\,g(a)\,dr,$$

where $f$ is a weight function depending on the radius. The function $g$ is a function of the first structural variable $a$ that typically takes a simple form such as $g(a) = a$ or $g(a) = 1/a$ (see e.g. Buldgen et al. 2022a, for a review). We note that a more general definition of the indicator can be used in specific cases (see e.g. Buldgen et al. 2015b).
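As an illustration of the input to the structure inversion equation, the relative frequency differences, defined as (observed minus reference) divided by reference, can be computed as in the minimal sketch below. The function name and the treatment of uncertainties are assumptions made for this example: the reference frequencies are taken as exact, so that only the observational uncertainty is propagated.

```python
def relative_differences(nu_obs, nu_ref, sigma_obs):
    """Return the relative frequency differences and their 1-sigma errors.

    The three lists must be matched mode by mode (same n and l).
    Assumption of this sketch: reference frequencies carry no uncertainty.
    """
    if not (len(nu_obs) == len(nu_ref) == len(sigma_obs)):
        raise ValueError("inputs must have the same length")
    # (obs - ref) / ref for each mode
    rel = [(o - r) / r for o, r in zip(nu_obs, nu_ref)]
    # Uncertainty of the relative difference, scaled by the reference value
    sig = [s / r for s, r in zip(sigma_obs, nu_ref)]
    return rel, sig


# Hypothetical frequencies, for illustration only
rel, sig = relative_differences([101.0, 198.0], [100.0, 200.0], [0.1, 0.2])
```

With these hypothetical values, the first mode has a relative difference of +1% and the second of -1%.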
Compared to an approach solving directly the oscillation equations, as an MCMC would do, solving the structure inversion equation has the great advantage of not relying on the physics of the stellar evolution model. Indeed, the reference model is only a starting point for the inversion. In addition, the inversion result does not depend on the exact starting point. Using a different starting point in the parameter space, the inversion would still correct towards the exact value, provided that the starting point is in the linear regime, that is, where Eq. (1) is valid. The inversion can therefore provide a quasi-model independent correction. Several methods were developed to solve Eq. (1). Most of them rely on the optimally localised averages approach from Backus & Gilbert (1968, 1970) or on the regularised least-squares technique from Tikhonov (1963) (see e.g. Gough 1985; Christensen-Dalsgaard et al. 1990; Sekii 1997; Buldgen et al. 2022a). In our study, we used the subtractive optimally localised averages (SOLA) method (Pijpers & Thompson 1992, 1994), which minimises the following cost function

$$J_t(c_i) = \int_0^1 \left(K_{\rm avg} - T_t\right)^2 dx + \beta \int_0^1 K_{\rm cross}^2\,dx + \lambda\left[k - \sum_i c_i\right] + \tan\theta\,\frac{\sum_i \left(c_i \sigma_i\right)^2}{\langle\sigma^2\rangle} + F_{\rm Surf}(\nu),$$

where $x = r/R$ and $k$ is a normalisation constant which depends on the indicator's properties (see e.g. Buldgen et al. 2022a, for a review). The averaging kernel $K_{\rm avg}$ and the cross-term kernel $K_{\rm cross}$ are related to the structural kernels by

$$K_{\rm avg} = \sum_i c_i K^{i}_{a,b}, \qquad K_{\rm cross} = \sum_i c_i K^{i}_{b,a}.$$

The goal of the SOLA approach is to provide a good fit of the target function $T_t$ while minimising the contribution of the cross-term and of the observational uncertainties. The variables $\beta$ and $\theta$ are trade-off parameters to adjust the balance between the different terms during the minimisation, and $\lambda$ is a Lagrange multiplier. The inversion coefficients are denoted by $c_i$, where $i \equiv (n, l)$ is the identification pair of an oscillation frequency. We defined $\langle\sigma^2\rangle = \sum_i^N \sigma_i^2$, where $\sigma_i$ is the 1σ uncertainty of the relative frequency difference and $N$ is the number of observed frequencies. The last term in the cost function, denoted by $F_{\rm Surf}(\nu)$, is an empirical description of the surface effects. It introduces additional free parameters in the minimisation, in our case one, two, or six depending on the surface effect prescription. These additional parameters come at the expense of the fit of the target function.
In this study, we considered three different indicators: $\bar{\rho}$, $\tau$, and $S_{\rm core}$. The indicator $\bar{\rho}$ is the mean density, and the target function of a mean density inversion is given by (Reese et al. 2012)

$$T_{\bar{\rho}}(x) = 4\pi x^2 \frac{\rho}{\rho_R},$$

where $\rho_R = M/R^3$ and $M$ is the stellar mass. The trade-off parameters are fixed to $\beta = 10^{-6}$ and $\theta = 10^{-2}$, and we use the $(\rho, \Gamma_1)$ structural pair, where $\Gamma_1$ is the first adiabatic exponent.
The indicator $\tau$ is the acoustic radius, and the target function of the inversion, $T_\tau$, is given in Buldgen et al. (2015b). As for the mean density inversion, we use $\beta = 10^{-6}$ and $\theta = 10^{-2}$, and the $(\rho, \Gamma_1)$ structural pair.
The central entropy indicator $S_{\rm core}$ is defined in Buldgen et al. (2018) from the entropy proxy $S_{5/3} = P/\rho^{5/3}$, where $P$ is the pressure, and a weight function $f(r)$. This complicated weight function is designed to probe the core regions of the entropy proxy profile, while damping the contribution of the upper layers, where $S_{5/3}$ follows a plateau in the outer convective zone and takes on high values close to the outer boundary of the model. This region must therefore be efficiently damped in the cost function. The explicit expressions of the indicator, of the weight function, and of the corresponding target function are given in Buldgen et al. (2018). This inversion is based on the $(S_{5/3}, Y)$ structural pair, where $Y$ is the helium mass fraction, and we use $\beta = \theta = 10^{-4}$.

Testing set
Our testing set is composed of 26 reference models that we divided into two categories. The first category is composed of six calibrator targets. For these calibrators, advanced and extensive modelling was conducted in the literature. The behaviour of the seismic inversions that were carried out on these targets was therefore thoroughly investigated. We considered the following calibrator targets: the Sun (see e.g. Reese et al. 2012), Kepler-93 (see e.g. Bétrisey et al. 2022), 16 Cyg A and B (see e.g. Buldgen et al. 2016a, 2022b), and α Cen A and B (see e.g. Reese et al. 2012; Salmon et al. 2021). The second category is composed of 18 targets that we selected either from the Kepler LEGACY sample (Lund et al. 2017) or from Sonoi et al. (2015). These targets were less studied in the literature than the targets from the first category and they cannot be considered as calibrators. To assess the inversion quality of these targets independently from the quality assessment procedure of Sec. 3, we manually checked how well the target function is reproduced by the averaging kernel. We note that this check allows us to robustly discard the most problematic inversion results, but that there is a grey zone that depends on the experience of the modeller and where it is unclear whether the inversion result is robust. This uncertainty can be lifted by conducting a more extensive analysis, namely by generating a set of models representative of the target and investigating the behaviour of the inversion on the set, as was done for the calibrator targets (see e.g. Bétrisey et al. 2022; Buldgen et al. 2022b). The second category is however relevant in the sense that we can check whether the automatic assessment procedure of Sec. 3 performs equivalently to a human modeller.
We summarised in Table 1 the observational constraints of our testing set. The reference model of Kepler-93 is Model 1 from Bétrisey et al. (2022). For α Cen A and B, we considered two sets of reference models, including overshooting in α Cen A or not (see Table 4 in Salmon et al. 2021). We note that the reference models of α Cen B are different, because Salmon et al. (2021) evolved both stars of the binary system simultaneously in the minimisation. For the rest of the targets, the reference models were obtained with an MCMC fitting the individual frequencies and the classical constraints. The detailed modelling procedure is described in JB23, as well as the grid of models used for the MCMC. The solar model, model G, Baloo, Punto, and Tinky were added for this study and we proceeded exactly as for the targets from JB23. For the Sun, we selected the frequencies of the measurement n°01 from Salabert et al. (2015) and the effective temperature from Prša et al. (2016). The observational uncertainties of the classical constraints were adapted to match the precision of a target observed by Kepler: 85 K for the effective temperature, 0.1 dex for the metallicity, and 0.03 L⊙ for the luminosity. The data of model G were taken from Sonoi et al. (2015), and the data of Baloo, Punto, and Tinky from Lund et al. (2017).

Notes to Table 1. [...] (2012). Models A to G are synthetic models whose classical constraints are known exactly. The uncertainty was chosen to match the expectation from a Kepler observation. The other targets are actual observations for which the luminosity was estimated using the spectroscopic parameters and Eq. (13) in JB23. The RUWE indicator of Gaia flags the parallax measurement of Arthur as unreliable, and the $K_s$-magnitude measurement of Pinocha is unreliable as well.

Quality assessment procedure
Before we introduce the assessment procedure, we would like to clarify some terminological aspects. Synthetic models with known structures have been extensively employed to validate and establish the reliability of inversions (in particular in Reese et al. 2012; Buldgen et al. 2015b, 2018, for the inversions of this study). In a concrete application on observed data, it is not possible to verify the accuracy of an inversion. However, it is essential to verify the numerical stability of inversions, which can be compromised by factors such as data quality or unaccounted non-linearities (as observed in the case of α Cen A; Salmon et al. 2021). Numerical instability can indeed jeopardise the reliability of the inversion results. Previously, manual scrutiny of diagnostic plots was the norm for assessing stability, but this article introduces an automated procedure for this purpose. Therefore, when we label an inversion as stable or successful, it indicates that the inversion was numerically stable. Conversely, if an inversion is labelled as a failure, it means that it was numerically unstable. In this case, the inversion result should be treated with caution.
Our quality assessment procedure is based on two tests, which were specially designed to be compatible with a pipeline and to replace the manual verifications that were until then required to assess the quality of an inversion. These tests are based on so-called quantifiers, whose value corresponds to one of three flags: reject the inversion result; check the inversion result manually, by generating a set of models representative of the target and studying the behaviour of the inversion on the set (see e.g. Bétrisey et al. 2022; Buldgen et al. 2022b); or accept the inversion result. The first test measures the quality of the fit of the target function by the averaging kernel. We call the outcome of this first test the 'K-flag'. The second test quantifies the randomness of the inversion coefficients. Indeed, we noted in JB23 that smooth structures appear in successful inversions. If the inversion becomes unstable, these structures break down, and the inversion coefficients tend to be randomly distributed. We call the outcome of this second test the 'R-flag'. We recommend using our assessment procedure as follows: the K-flag is computed first, and the R-flag is then evaluated only if the inversion was not rejected by the K-flag. Indeed, the goal of the first test is only to remove the inversion results that are clearly wrong prior to the second test, which is the core of our assessment procedure.
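The recommended order of the two tests can be sketched as follows. This is a minimal illustration rather than the pipeline implementation: the function name, the flag strings, and the use of a callable for the second quantifier are assumptions of this example, while the default R-flag boundaries (0.4 and 0.65) are the values quoted later in this section.

```python
def assess_inversion(chi_t, chi_threshold, compute_r_t,
                     r_reject=0.4, r_accept=0.65):
    """Return 'reject', 'check', or 'accept' for one inversion result.

    chi_t         : first quantifier (fit of the target function)
    chi_threshold : rejection threshold for this type of inversion
    compute_r_t   : zero-argument callable returning the second quantifier
    """
    # First test (K-flag): filter out the clearly wrong results.
    if chi_t > chi_threshold:
        return "reject"
    # Second test (R-flag): evaluated only if the K-flag did not reject.
    r_t = compute_r_t()
    if r_t < r_reject:
        return "reject"
    if r_t > r_accept:
        return "accept"
    # In-between regime: further manual investigation is recommended.
    return "check"
```

Passing the second quantifier as a callable reflects the fact that it only needs to be computed when the first test is passed.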

K-flag
The K-flag assesses the quality of the fit of the target function by the averaging kernel, and is a binary flag which takes the following values: accept or reject. The quality of the fit of the target function is an important aspect of a seismic inversion because a poor fit may induce a non-physical inversion result. The question of the quality of the fit of the target function was raised at the same time as the seismic inversions were developed, and it was proposed to compute the square of the $L_2$-norm of the difference between the averaging kernel and the target function (see e.g. Pijpers & Thompson 1992, 1994; Rabello-Soares et al. 1999; Reese et al. 2012)

$$\chi_t = \int_0^1 \left(K_{\rm avg}(x) - T_t(x)\right)^2 dx. \qquad (14)$$

This quantifier was however introduced for solar inversions and implicitly assumes that the target functions of the different reference models have a comparable amplitude, which is valid in solar modelling. Additionally, we point out that we are working with a scaled radius in the formulation of the kernels so that the domain of the kernels in all cases is [0, 1], and that the averaging kernels are always normalised to have an integral of 1 over this domain. It is therefore possible to compare the inversions by looking at the absolute value of $\chi_t$. In the top panel of Fig. 1, we illustrate the averaging kernels of the solar model by considering several prescriptions for the surface effects. In these conditions, the target function does not change. In the context of space-based photometry missions such as Kepler or PLATO, the solar-type stars which are observed cover a mass range between 0.8 M⊙ and 1.6 M⊙. The amplitude of the target functions varies significantly between the different targets, as shown in the bottom panel of Fig. 1, and it is less meaningful to compare them directly with $\chi_t$. However, $\chi_t$ can still be used to filter the most problematic inversion results. Indeed, if the averaging kernel is unable to reproduce the target function (see examples in Fig. A.1), $\chi_t$ takes a large value. By defining a rejection threshold large enough not to be sensitive to the specific amplitude of the target function, outlying inversion results with an extreme value of $\chi_t$ can still be sorted out. We note that this threshold should not be interpreted as an exact threshold because of the limitations that we mentioned earlier, but rather as a filter in preparation for the second test. Based on our testing set of main-sequence solar-type stars, we defined in Table 2 a rejection threshold for each of the inversions considered in this study. The form of the target function is specific to each type of inversion. The rejection threshold therefore depends on the type of inversion, but it is always possible to identify such a threshold.
For the reasons given above, we have opted for a pragmatic way of determining the rejection threshold based on our testing set.
From a theoretical standpoint however, it would be possible to obtain a more objective estimate of this threshold by considering the following idea. Let us denote the $\chi_t$ obtained using Eq. (14) as $\chi_t^{\rm avg}$. We construct a substantial number of pairs of models that we are able to distinguish asteroseismically (for example by looking on the edges of uncertainty boxes in HR-like diagrams). The models in these pairs have target functions $T_t^j$ and $T_t^k$, respectively. Then, we calculate $\chi_t$ using the difference between those two target functions and take the supremum

$$\chi_t^{\rm sup} = \sup_{j,k} \int_0^1 \left(T_t^j(x) - T_t^k(x)\right)^2 dx.$$

Assuming that we have a reference model with $\chi_t^{\rm avg} > \chi_t^{\rm sup}$, it would imply that the averaging kernel of this reference model fits the target function less efficiently than a model that can be rejected based on the asteroseismic constraints alone. To generate the substantial number of model pairs, we could use the MCMC steps which are on the edges of uncertainty boxes. In practice however, the current version of the MCMC interpolates within the parameter space, but does not provide an interpolated structure. Accurately interpolating this structure would be quite challenging and could lead to a notable slowdown in the minimisation process, which is already quite expensive. Another option would be to use the grid models that are on the border of a 1σ or 2σ uncertainty ball around the MCMC solution. Further investigations are needed to determine the level of grid density required for generating a sufficient number of model pairs. Additionally, we anticipate challenges with the grid model structures from actual missions such as PLATO. Indeed, the grids utilised for these missions cover the entire parameter space of interest, taking a very large amount of storage space. Therefore, only reduced or minimal structures are saved and additional computations are needed to restore complete structures. In any case, the determination of $\chi_t^{\rm sup}$ is probably too expensive to be employed on each and every target in a pipeline, but it may be useful to apply this procedure to benchmarks in the future to improve the estimate of the rejection thresholds adopted in this study.

Table 2.
Inversion              Rejection criterion
$\bar{\rho}$ inversion        $\chi_{\bar{\rho}} > 4$
$\tau$ inversion              $\chi_{\tau} > 2$
$S_{\rm core}$ inversion      $\chi_{S_{\rm core}} > 1$
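The first test can be sketched numerically as follows, assuming that the averaging kernel and the target function are sampled on a common grid of the scaled radius x between 0 and 1. The trapezoidal rule, the function names, and the dictionary keys are choices made for this example; the threshold values are the rejection criteria quoted in the text for main-sequence solar-type stars.

```python
# Rejection thresholds quoted in the text (keys are hypothetical labels).
REJECTION_THRESHOLDS = {
    "mean_density": 4.0,
    "acoustic_radius": 2.0,
    "central_entropy": 1.0,
}


def chi_t(x, k_avg, target):
    """Trapezoidal estimate of the integral of (K_avg - T_t)^2 over x."""
    if not (len(x) == len(k_avg) == len(target)) or len(x) < 2:
        raise ValueError("need matching samples on at least two grid points")
    sq = [(k - t) ** 2 for k, t in zip(k_avg, target)]
    return sum(0.5 * (sq[i] + sq[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def k_flag(chi, inversion):
    """Binary K-flag: 'reject' above the threshold, 'accept' otherwise."""
    return "reject" if chi > REJECTION_THRESHOLDS[inversion] else "accept"
```

A kernel identical to its target function gives a quantifier of zero, and the flag only filters out the extreme values, in line with its role as a preparation for the second test.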

R-flag
In a seismic inversion, we assume that the relative frequency differences are independent measurements, but under simplifying hypotheses, one can show that the acoustic frequencies follow an asymptotic relation (Shibahashi 1979; Tassoul 1980):

$$\nu_{n,l} \simeq \left(n + \frac{l}{2} + \epsilon\right)\Delta\nu,$$

where $\epsilon$ is a phase and $\Delta\nu$ is the large separation. In our previous study (Appendix A of JB23), we noted that the inversion coefficients of a stable inversion tend to show smooth structures. Because of the asymptotic behaviour of the frequencies, the same seismic information can be shared by multiple frequencies and it is therefore not unexpected to find smooth structures in the inversion coefficients, as illustrated in Fig. 2 for the solar model.
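The redundancy implied by the asymptotic behaviour can be made concrete with a short sketch, assuming the first-order form in which the frequency equals (n + l/2 + ε) times the large separation. The numerical values below are illustrative assumptions only: a large separation of 135 μHz, roughly solar, and an arbitrary phase of 1.4.

```python
def asymptotic_frequency(n, l, delta_nu, epsilon):
    """First-order asymptotic frequency: (n + l/2 + epsilon) * delta_nu."""
    return (n + l / 2.0 + epsilon) * delta_nu


# Illustrative values (assumptions of this sketch)
delta_nu = 135.0  # large separation in microhertz
epsilon = 1.4     # phase term

nu_radial = asymptotic_frequency(20, 0, delta_nu, epsilon)
nu_quadrupole = asymptotic_frequency(19, 2, delta_nu, epsilon)
```

At this order, the mode (n, l) = (20, 0) and the mode (19, 2) have the same frequency, which illustrates why neighbouring modes carry partly redundant information.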
If the target function is less well reproduced by the averaging kernel, these smooth structures break down and the inversion coefficients appear to be more randomly distributed, as illustrated in Fig. 2 for α Cen A. In JB23, we proposed to quantify this observation by looking at the lag plot (see e.g. Heckert et al. 2002, for a reference handbook) of the inversion coefficients. Indeed, as shown in the right column of Fig. 2, the inversion coefficients of a stable inversion tend to be positively correlated in the lag plot, assuming a lag of one. In our previous study, we suggested quantifying this correlation with the Pearson correlation coefficient (Pearson 1895). However, we found in this study that this measure is too sensitive to extreme values and is therefore not robust enough for a pipeline implementation. Indeed, one outlier can result in a Pearson coefficient close to zero, even though all the other points are linearly correlated. We therefore propose the following modifications. We compute the standard deviation of the inversion coefficients and discard the coefficients that are not in the 3-sigma interval around zero. We chose to centre our interval around zero because it worked well with our testing set by discarding the coefficients that we would have discarded manually. Alternatively, the interval can be centred around the mean of the coefficients, although with a smaller tolerance. We note that if the number of modes gets low (below 15) or if the inversion is based on the modes of one harmonic degree only, it is preferable to use the second option. Indeed, in such extreme conditions, the first criterion is unreliable and may discard a large fraction of the modes. We also note that up to two coefficients are typically discarded. In general, they correspond to the lowest radial order modes of the harmonic degrees. The correlation of the lag plot is then evaluated with the Spearman correlation coefficient (Spearman 1904). This coefficient focuses on the rank variables $R(X)$ and $R(Y)$ instead of the random variables $X$ and $Y$ themselves. In that regard, the Spearman correlation coefficient is the Pearson correlation coefficient of the rank variables,

$$R_t = \frac{{\rm cov}\left(R(X), R(Y)\right)}{\sigma_{R(X)}\,\sigma_{R(Y)}}.$$

This approach is more general: the Spearman coefficient detects a monotonic correlation between the random variables, and has the great advantage of being significantly less sensitive to outliers. We note that if two random variables are linearly correlated, the Pearson and Spearman coefficients are equivalent. Because of these advantages, the Spearman coefficient is more robust and better suited for a pipeline implementation. As in JB23, we identified three regimes, which are summarised in Table 3. We consider that below $R_t = 0.4$, the inversion coefficients show too much randomness for a meaningful inversion. In that case, we reject the inversion result. If $R_t > 0.65$, we consider that the inversion coefficients form smooth structures and we accept the inversion result. The in-between regime is more uncertain and we recommend carrying out further investigations. Because this test is based on the inversion coefficients, the boundaries of the different regimes do not depend on the type of inversion that is considered.

Mean density and acoustic radius inversions
The mean density inversion and the acoustic radius inversion are based on the same structural kernels and share the same trade-off parameters. The form of their seismic indicator is simple compared to more ambitious inversions such as the central entropy inversion. Due to these similarities, the results of our quality assessment procedure are very similar. In that regard, if the mean density inversion is flagged as accepted, the corresponding acoustic radius inversion is typically flagged as accepted too, and vice versa. In this section, we therefore focus our discussion on the results of the mean density inversions, which are displayed in the top row of Fig. 3 and in Table 4. The results of the acoustic radius inversions can be found in Appendix B.
The results of the calibrator targets are consistent with our expectations. For the solar model, introducing additional free parameters to describe the surface effects increases the value of the $\chi_{\bar{\rho}}$ quantifier. The opposite behaviour is observed with the second quantifier, the Spearman correlation coefficient $R_{\bar{\rho}}$. In addition, all these inversions correct towards the expected mean density range. The results of 16 Cyg A and B behave similarly to the results of the Sun. The inversions of both binary components correct towards the measurements of Buldgen et al. (2022b), and all these inversions are correctly flagged as accepted. In addition, with the high data quality of these targets, both quality quantifiers correctly reflect that the Ball & Gizon (2014) prescription is slightly more stable than the Sonoi et al. (2015) prescription, and that the inversions that neglect surface effects are the most stable.
We note that neglecting surface effects gives the most stable inversions because it imposes fewer free variables in the minimisation, but it does not mean that the outcome of this inversion is the best physical result. Indeed, as pointed out by JB23 and by many other studies, the Ball & Gizon (2014) prescription is the best default choice. For the α Cen binary system, we expected poor quality inversion results. Indeed, the data quality of these targets is lower than the data quality of the other calibrator targets. For these targets, we investigated two types of reference models, with or without overshooting in α Cen A.
From the literature, we know that the relative frequency differences of α Cen B are too large for a robust inversion based on individual frequencies, and that the inversion results of the model of α Cen A including overshooting are significantly affected by the choice of the mode set, suggesting that some of the modes have a non-linear character. As expected, the inversions using the sixth order polynomial to describe the surface effects are rejected. The rest of the inversions are either flagged as rejected or as requiring a manual and thorough investigation. The results for these targets show the relevance of using both quality flags. Indeed, due to the amplitude differences of the target function of models over a large mass range, the fine-tuning of the rejection threshold of the K-flag is limited, and for these models in particular, it would benefit from a lower tolerance. Although the K-flag has a non-negligible false positive rate, most of these problematic inversions are detected by the second quality flag. In addition, we note that for the model of α Cen A without overshooting and with the Sonoi et al. (2015) surface effects prescription, the inversion result is rejected by the K-flag but not by the R-flag, which illustrates a limitation of the R-flag. If the target function is not reproduced at all by the averaging kernel, the correction proposed by the inversion is non-physical and is the result of the poor fit of the target function. However, it may still create structures in the inversion coefficients that are detected by the R-flag. This is not an issue for our assessment procedure, where the K-flag is computed first. Indeed, if the inversion is rejected [...] stable inversion results (e.g. Pinocha). However, the inversion result of Dushera is flagged as unreliable. As in the case of α Cen A, it is possible that one (or more) of the modes is affected by non-linearities, which could explain this unstable behaviour. Alternatively, it is also possible that there was an issue with the peak bagging. Indeed, Roxburgh (2017) observed anomalies in some of the LEGACY data. Although this study did not analyse Dushera's data, it is possible that it was impacted by the same issues, which could also explain the unstable behaviour of the inversion. Both possibilities could be investigated by a comprehensive analysis using local minimisations, which is beyond the scope of this study. Nevertheless, this result is promising as it indicates that our assessment procedure is able to highlight problematic inversion results that are usually difficult to detect manually. Additionally, the stable inversion results are correctly flagged as stable. Hence, the combination of both flags performs satisfactorily for the mean density inversion and the acoustic radius inversion with respect to a human modeller.
The Ball & Gizon (2014) prescription is the preferred surface effect prescription for the PLATO pipeline and it is therefore relevant to look at the flag distribution of the inversion results with this prescription. We note that our testing set is not bias-free: we only considered medium to high data-quality targets (more than 30 observed modes) and we included several poor quality inversion results to verify that they could be spotted by our assessment procedure. The percentages that we quote below should therefore be interpreted with caution, and further investigations with larger statistics and including lower data-quality targets are required. With our testing set, about 20% of the results are flagged as rejected, about 20% as to be checked manually, and the remaining 60% as accepted. Although it is difficult to draw robust conclusions based on these numbers, we note that few inversion results are rejected and also that few results require further investigations. This is an important aspect because such investigations cannot be carried out within the pipeline. Hence, these results support the idea that the mean density inversion is suited to a large-scale application.

Central entropy inversion
The results of the central entropy inversion are shown in the bottom line of Fig. 3 and in Table 5. For all the models, we found that the inversion fails if surface effects are included. Indeed, the inversions using the Ball & Gizon (2014) and Sonoi et al. (2015) prescriptions have averaging kernels that completely miss the central stellar features of the target function. Hence, all these inversion results can be discarded because these central layers are the region of interest of the inversion. The situation is even worse with the sixth order polynomial. In this configuration, the number of degrees of freedom is insufficient to carry out the SOLA inversion. As expected, the K-flag rejects all these inversions. Although we recommend not computing the R-flag of inversions that were rejected by the K-flag, we provide in Table 5 the R-flag of such models to illustrate the reason for this recommendation. As shown in Table 5, the R-flag is not reliable in such conditions. Regarding the results of the inversions that neglect surface effects, our quality assessment procedure performs equivalently to a human modeller. The models with a lower inversion quality are indeed correctly spotted by the R-flag.
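The ordering of the two tests described above (K-flag first, R-flag only if the K-flag does not reject) can be sketched as follows. This is a minimal illustration and not the pipeline implementation: the function name `assess_inversion`, the threshold arguments, and the numerical values in the usage lines are hypothetical placeholders and are not the values of Tables 2 and 3. The sketch also assumes, from the behaviour reported in the text, that a larger χ indicates a poorer fit of the target function and that a lower Spearman coefficient indicates a less stable inversion.

```python
def assess_inversion(chi, compute_spearman, chi_max, r_unstable, r_stable):
    """Two-step quality assessment sketch.

    chi              : misfit between averaging kernel and target function
                       (assumed: larger means a poorer fit)
    compute_spearman : callable returning the Spearman coefficient of the
                       inversion coefficients; only called if the K-flag passes
    chi_max, r_unstable, r_stable : hypothetical thresholds (placeholders)
    """
    # First test (K-flag): is the target function reproduced at all?
    if chi > chi_max:
        # The R-flag is deliberately not computed for rejected inversions,
        # since it is not reliable in such conditions.
        return "rejected"

    # Second test (R-flag): do the inversion coefficients form smooth structures?
    r = compute_spearman()
    if r < r_unstable:
        return "rejected"
    if r < r_stable:
        return "check manually"
    return "accepted"


# Usage with placeholder numbers (not taken from the paper's tables).
print(assess_inversion(0.01, lambda: 0.8, chi_max=0.1, r_unstable=0.3, r_stable=0.6))
print(assess_inversion(0.50, lambda: 0.8, chi_max=0.1, r_unstable=0.3, r_stable=0.6))
```

Passing the second quantifier as a callable makes the ordering explicit: an inversion rejected by the first test never reaches the second one.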
These results, however, question the relevance of including this inversion in a pipeline, at least in its current form. This indicator is indeed designed to probe the central stellar layers, but it is at the same time very sensitive to the surface regions because it is based on the S_{5/3} profile, which is strongly affected by these regions. Hence, to robustly interpret the results of this type of inversion, it is necessary to generate a set of models representative of the observed target and study the behaviour of the inversion on the set, as is done in Buldgen et al. (2017b), Salmon et al. (2021), and Buldgen et al. (2022b), for example. We note that using a similar indicator but based on frequency separation ratios (Bétrisey & Buldgen 2022) would also be incompatible with a pipeline approach. Even though such an indicator is significantly less affected by surface effects, it is based on ratios that might take very small values, and therefore result in singular relative ratio differences. This inversion therefore requires some caution in the data processing and in the interpretation of the results. For this inversion too, it is necessary to generate a set of models and study the inversion on the set.

Preconditioning of the inversion
The variational inversions are based on a linear formalism. During the derivation of the structure inversion equation at the basis of the variational inversions, this linearity assumption allows us to neglect a lot of higher order terms, notably surface terms arising from partial integration, and end up with a simple equation directly relating frequency differences to structure differences. Non-linearities may therefore significantly affect the inversion by inducing unwanted compensations, and they are usually difficult to spot. In this section, we discuss two types of common non-linearities, the mode non-linearity and the non-linear regime of the reference model.
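For reference, the linear relation mentioned above is commonly written in the literature in the following form, given here for the (ρ, Γ_1) structural pair. The choice of this pair and the notation (K for the structural kernels, R for the stellar radius, δ for a difference between the observed target and the reference model) are assumptions made for this illustration, and the surface term is omitted.

```latex
\frac{\delta \nu^{n,\ell}}{\nu^{n,\ell}}
  = \int_0^R K^{n,\ell}_{\rho,\Gamma_1}(r)\, \frac{\delta \rho}{\rho}(r)\, \mathrm{d}r
  + \int_0^R K^{n,\ell}_{\Gamma_1,\rho}(r)\, \frac{\delta \Gamma_1}{\Gamma_1}(r)\, \mathrm{d}r
  + \mathcal{O}(\delta^2)
```

The neglected second-order terms are the ones that become significant in the two non-linear scenarios discussed in this section.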
In the first scenario, the mode itself exhibits a non-linear behaviour. A mixed mode mistaken for a pressure mode fits within this category. It is often tricky to detect and in our testing set, we suspect that the model of α Cen A including overshooting is affected by such a non-linearity. For the rest of the targets, there is a priori no sign of mode non-linearity. We note that it would require a thorough investigation of each target to robustly disprove the presence of such non-linearities, but based on the current inversion results, it is reasonable to assume that it only affects a minority of targets. It is therefore unlikely to be an issue in a pipeline.
The second scenario is related to the reference model. If it is too far away from the observed target in the parameter space, structural differences may be too large for the linear assumption and might induce compensations in the inversion. In Fig. 4, we show the structural density differences for a model within the linear regime ('target2NuOv000'), fitting the individual frequencies, and for a model outside the linear regime ('target2R01Ov000'), fitting the r_01 ratios alone. We took these models from Bétrisey & Buldgen (2022). In the illustration, 'target2R01Ov000' has large differences in the upper layers which are magnified by the structural kernels and their large amplitude in these regions. It induces unwanted compensations and the inversion is unsuccessful. The boundaries of the linear regime are often unclear and may change from target to target. However, good preconditioning can ensure that the reference model is in the linear regime. Hence, a fit of the individual frequencies and the classical constraints with an MCMC typically ensures that this assumption holds. This type of non-linearity is therefore not an issue for the modelling strategy that was proposed in JB23, which starts with such a fit and then corrects for the surface effects by combining a mean density inversion and a fit of frequency separation ratios.

Fig. 4: Density differences of two reference models of target 2 from Bétrisey & Buldgen (2022). The reference model 'target2NuOv000' is within the linear regime, while the model 'target2R01Ov000' is outside of the linear regime.

Limited mode sets
In Sec. 3, we tested our quality assessment procedure on targets with a data quality going from medium to high. It corresponds to mode sets that are composed of more than 30 individual modes. This number of modes allowed us to use a statistical tool at the basis of the R-flag. PLATO will however detect many targets with fewer pulsation frequencies. Hence, we tested how our assessment procedure behaves in such cases. We investigated three calibrator targets, the Sun, Kepler-93, and α Cen A, that are representative of good, medium, and poor quality inversions, respectively.
We summarised the results of our assessment procedure in Table 6. Due to the lower number of modes, we did not discard the inversion coefficients above the 3σ threshold used to remove the outliers before the computation of the Spearman correlation coefficient. For the Sun, the inversion is robust if only the l = 0 modes are used (18 modes in total), except for the sixth order surface effect prescription, as expected. We also tested carrying out the inversion based on ten l = 0 modes around ν_max only and on four modes around ν_max of each harmonic degree (12 modes in total). In such conditions, the fit of the target function by the averaging kernel is insufficient and the inversion is rejected by our assessment procedure. For Kepler-93, the quality of the fit of the target function by the averaging kernel with the Ball & Gizon (2014) and Sonoi et al. (2015) prescriptions is in a grey zone. Just based on this fit, we would have rejected the inversion results. However, based on the inverted mean densities that are consistent with Bétrisey et al. (2022), and based on the inversion coefficients that form smooth structures, it seems that these inversions were successful. As expected, these inversions are flagged as accepted by our assessment procedure. Unsurprisingly, α Cen A, which had poor quality inversion results with all the modes, has even worse quality inversion results if only the l = 0 modes are used.
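The second quantifier can be illustrated with a short sketch that computes the Spearman correlation coefficient of a lag plot of the inversion coefficients, with an optional 3σ clipping that can be switched off for limited mode sets. This is a sketch under stated assumptions and not the authors' implementation: the lag of one (pairs of consecutive coefficients), the definition of the clipping around the mean, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lag_spearman(coefficients, clip_sigma=3.0):
    """Spearman coefficient of the lag-1 plot (c_i, c_{i+1}).

    clip_sigma=None keeps every coefficient, as done for limited mode sets.
    """
    coeffs = list(coefficients)
    if clip_sigma is not None:
        mu, sigma = mean(coeffs), pstdev(coeffs)
        coeffs = [c for c in coeffs if abs(c - mu) <= clip_sigma * sigma]
    x, y = coeffs[:-1], coeffs[1:]
    # Spearman correlation = Pearson correlation of the ranks
    return pearson(average_ranks(x), average_ranks(y))
```

In this sketch, coefficients that vary smoothly from one mode to the next give a value close to 1, while coefficients that oscillate from mode to mode give a negative value. Coefficients that are all identical are not handled (the variance of the ranks is then zero).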
[...] to high quality targets, with at least 30 observed modes. Dealing with lower statistics may be an issue for the second test of our procedure, but not for the first test. In that regard, we believe that our procedure is still applicable to limited mode sets, although this aspect would benefit from further investigations. However, a limited mode set of a dozen frequencies could be an issue for the inversion itself. Indeed, the kernels of such mode sets may be insufficient for the averaging kernel to reproduce the target function. In these conditions, the success of an inversion becomes unpredictable and is sensitive to the mode set that is used.
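Limited mode sets of the kind tested above (a fixed number of modes around ν_max for each harmonic degree) can be built with a few lines of code. The sketch below is only an illustration: the function name, the representation of a mode as a `(l, frequency)` tuple, and the synthetic frequencies in the usage lines are assumptions and do not correspond to any observed target.

```python
def select_modes_near_numax(modes, nu_max, n_per_degree, degrees=(0, 1, 2)):
    """Keep, for each harmonic degree, the n_per_degree modes closest to nu_max.

    modes : iterable of (l, frequency) tuples, frequencies in the units of nu_max
    """
    modes = list(modes)
    selected = []
    for degree in degrees:
        same_degree = [m for m in modes if m[0] == degree]
        # Sort by distance to nu_max and keep the closest ones
        closest = sorted(same_degree, key=lambda m: abs(m[1] - nu_max))[:n_per_degree]
        # Return them in order of increasing frequency
        selected.extend(sorted(closest, key=lambda m: m[1]))
    return selected


# Synthetic, purely illustrative mode set: 15 radial orders for l = 0, 1, 2.
synthetic = [(l, 2000.0 + 135.0 * n + 60.0 * l) for l in (0, 1, 2) for n in range(15)]

twelve_modes = select_modes_near_numax(synthetic, nu_max=3090.0, n_per_degree=4)
ten_radial = select_modes_near_numax(synthetic, nu_max=3090.0, n_per_degree=10, degrees=(0,))
print(len(twelve_modes), len(ten_radial))
```

Comparing the assessment of the full mode set with that of such reduced sets is one way to gauge how sensitive an inversion is to the modes that are available.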
Putting these results in the context of the PLATO mission, our quality assessment procedure of seismic inversions showed promising results. It is indeed based on by-products of the inversion, and the two quality tests that are performed require few numerical resources. Hence, our assessment procedure can assess the quality of an inversion quickly and inexpensively, while still performing as well as a human modeller.

Fig. 1: Averaging kernels of the solar model and variation of the target function for a selection of models from our testing set. Top panel: Averaging kernels of the solar model by considering different surface effect prescriptions. Bottom panel: Variation of the target function of a mean density inversion for a selection of models from our testing set.

Fig. 2: Diagnostic plots of the solar model and of the α Cen A model. From top to bottom: Diagnostic plots of the solar model by neglecting the surface effects, by using the Ball & Gizon (2014) surface effect prescription, and by using a sixth order polynomial for the surface effects, and of α Cen A by using the Ball & Gizon (2014) prescription. Left column: Fit of the target function by the averaging kernel. Central column: Inversion coefficients. Right column: Lag plot of the inversion coefficients. The points in red are the values that were excluded.

Table 1: Observational constraints of the targets from our testing set.

Table 2: Rejection threshold of the K-flag.

Table 3: Instability regimes of the R-flag.