Empirical relations for the accurate estimation of stellar masses and radii

In this work, we have taken advantage of the most recent accurate stellar characterizations carried out using asteroseismology, eclipsing binaries and interferometry to evaluate a comprehensive set of empirical relations for the estimation of stellar masses and radii. We have gathered a total of 934 stars -- of which around two-thirds are on the Main Sequence -- that are characterized with different levels of precision, most of them having estimates of M, R, Teff, L, g, density, and [Fe/H]. We have deliberately used a heterogeneous sample (in terms of characterizing techniques and spectroscopic types) to reduce the influence of possible biases coming from the observation, reduction, and analysis methods used to obtain the stellar parameters. We have studied a total of 576 linear combinations of Teff, L, g, density, and [Fe/H] (and their logarithms) to be used as independent variables to estimate M or R. We have used an error-in-variables linear regression algorithm to extract the relations and to ensure the fair treatment of the uncertainties. We present a total of 38 new or revised relations that have an adj-R2 regression statistic higher than 0.85, and a relative accuracy and precision better than 10% for almost all the cases. The relations cover almost all the possible combinations of observables, ensuring that, whatever list of observables is available, there is at least one relation for estimating the stellar mass and radius.


INTRODUCTION
The existence of empirical relations among some observable stellar characteristics has been well known since the initial works of Hertzsprung (1923), Russell et al. (1923), and Eddington (1926). Improvements in the observational data, data analysis techniques, and/or physical models have led to updates and revisions of these empirical relations (see Demircan & Kahraman 1991, for example).
In recent years, a number of revisions of these empirical relations have been developed (Torres et al. 2010; Eker et al. 2014; Gafeira et al. 2012; Benedict et al. 2016). One common point of all these works is that they have used eclipsing binaries as observational targets.
Although some derived relations have been extensively used in the literature (Torres et al. 2010, for example), the Mass-Luminosity relation and the Mass-Radius relation, two of the most conspicuous, have two main weak points: (i) The luminosity is, in general, known with great uncertainty, and this uncertainty is translated to the mass determination; and (ii) the radius is usually unknown.
Recent improvements in the observational data quality and quantity have opened new opportunities for re-evaluating these relations:
• The first Gaia data release (Gaia Collaboration et al. 2016) has offered a new framework, providing accurate stellar luminosities for a significantly increased sample of stars. This has allowed a revision of the characteristics of some eclipsing binaries (Stassun & Torres 2016).
• Hundreds of isolated stars have been characterized using asteroseismology, yielding unprecedented precision, mainly thanks to Kepler (Gilliland et al. 2010) and CoRoT (Baglin et al. 2006) data.
All these points together have opened a door to a complete revision of empirical relations for the accurate determination of stellar masses and radii.
In this paper we study all the possible empirical relations using the effective temperature (T eff), luminosity (L), surface gravity (g), mean density (ρ), and/or stellar metallicity ([Fe/H]) as independent variables, and the stellar mass (M) or radius (R) as the dependent variable. For this revision, we have gathered together data on all the stars in the literature that have been accurately characterized using asteroseismology, eclipses in detached binary systems, or interferometry.
As a result, 38 new or revised relations (18 for M and 20 for R) are obtained with an adj-R 2 statistic larger than 0.85 (in fact, 89% of them have an adj-R 2 > 0.9), an accuracy better than 10% (except in three cases), and a precision better than 7.5% (except in one case), depending on the observables available.
It is important to bear in mind that these relations are no substitute for the techniques that have been used to provide our source data. Our main aim is to condense the information provided by them into simple linear relations to estimate the stellar mass and radius, for use when source data are not available.
• Eclipsing binaries: the components of close binary systems are distorted towards each other because of mutual gravity; therefore, it is preferable to study only detached binaries, where such effects are negligible (see Eker et al. 2014).
• Interferometry: Currently it is not possible to resolve the angular diameter of stars with conventional telescopes, since this requires angular resolutions of the order of a milliarcsecond (Boyajian et al. 2013; Maestro et al. 2013). However, optical interferometers offer spatial resolutions several orders of magnitude better than conventional telescopes. The concept of interferometry is based on combining signals from an array of telescopes to obtain a unique interference pattern, equivalent to the signal received by a single telescope with an aperture diameter equal to the maximum baseline of the array. The interference pattern can be used to measure the angular diameter directly with remarkable accuracy which, combined with the distance, yields the radius.
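The angular-diameter-to-radius conversion described above can be made concrete with a short sketch. The constants are standard (IAU nominal values); the example input values are purely illustrative, not taken from the catalogs discussed here:

```python
import math

MAS_TO_RAD = math.pi / (180.0 * 3600.0 * 1000.0)  # milliarcsec -> radians
PC_M = 3.0857e16          # metres per parsec
RSUN_M = 6.957e8          # IAU nominal solar radius, metres

def radius_rsun(theta_mas, dist_pc):
    """Linear radius R = theta * d / 2, with theta in mas and d in pc."""
    theta_rad = theta_mas * MAS_TO_RAD
    return theta_rad * dist_pc * PC_M / (2.0 * RSUN_M)

# A 1 mas angular diameter at 10 pc corresponds to roughly a solar radius.
print(round(radius_rsun(1.0, 10.0), 3))  # 1.075
```

The propagated uncertainty on R then follows from the fractional errors on the angular diameter and the distance added in quadrature.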
A thorough survey of recently published studies, based on the techniques previously described, produced an initial collection of close to 2000 entries. For each entry, the sample contains the following astronomical parameters: M, R, T eff , L, [Fe/H], g, and ρ, and their respective uncertainties. M, R, g, and ρ were derived directly or indirectly from one of these three techniques. Stellar properties that were calculated, by definition, from already determined parameters (e.g. g and ρ obtained from the derived M and R, and not from the observational data) were, where possible, not taken into account.
[Fe/H] was obtained mainly from spectroscopy. T eff, depending on the case, comes from spectroscopic or other determinations. When a star appears in more than one source, values were adopted in an intended order (Serenelli et al. 2017; Silva Aguirre et al. 2017, etc.), following intra-technique heterogeneity and reliability criteria.
(Fig. 1 caption: each panel accounts for the techniques used for studying the star: A = Asteroseismology, EB = Eclipsing Binaries, I = Interferometry.)
Not surprisingly, we found that not all catalogs provided data for all parameters. [Fe/H] is not available for all observations in the sample, and the same is true for g or ρ. In Table 1 we summarize the contributing stellar parameters by catalog, and identify the corresponding units of measurement.
The final calibration sample consists of 934 stars, of which 726 are on the Main Sequence (MS) and 208 are post-Main-Sequence (post-MS) subgiants or giants. The most significant contributions come from Eker et al. (2014), with 222 stars, and Serenelli et al. (2017), with 397 stars. The MS/post-MS classification was done using the evolutionary tracks described in Rodrigues et al. (2017), with solar metallicity. The impact of this classification on our results, when tracks with other characteristics are used, is analyzed in Section 5.3. The sample contains stars from a wide range of spectral types, but the vast majority (more than 700) are of types F or G. In Fig. 1 we show the location of the MS/post-MS stars in the HR diagram, together with some theoretical model tracks obtained using PARSEC as a reference.
What follows is a brief overview of the articles/catalogs that were used as input to build the final calibration sample (see Table 2 for reference). Chaplin et al. (2014), using asteroseismic analysis based on Kepler photometry of the first 10 months of science operations, determined M and R for more than 500 stars. The study can be divided into two subsets, one of them comprising 87 stars with atmospheric properties (T eff and [Fe/H]) obtained by high-resolution spectroscopy (see Bruntt et al. 2012). In another of the source catalogs, the authors used different model pipelines (ASTEC, BaSTI, Padova, Yonsei-Yale, among others) to compute a likelihood function to determine the best-fitting model, with which they estimated M and R of 77 stars (all confirmed or candidate planet-hosting stars). Luminosities come from VOSA, and ρ comes from scaling relations. They did not provide g. [Fe/H], used as a constraint, was compiled from the literature. All stars in that sample display errors in L of less than 10%, and are also potential exoplanet host stars. Karovicova et al. 
(2018) found that the angular diameters they derived for three metal-poor benchmark stars are smaller than those derived by other interferometric studies of the same stars (Creevey et al. 2015). They claim that comparative data between photometric and interferometric T eff suggest that diameters of less than 1 milliarcsec appear to be systematically larger than expected. They argue that the difference is due to calibration errors, and that the discrepancy tends to increase as the angular diameter decreases. All but three stars of Ligi et al. (2016) have angular diameters of less than 1 milliarcsec, so Karovicova et al. (2018) suggest that the Ligi et al. (2016) catalog could be overestimating R. In any case, this subsample is always a small percentage of the total sample.
The Malkov (2007) catalog is based on a set of detached main-sequence double-lined eclipsing binaries. The catalog is a collection of studies found in the literature, the vast majority from the 1990s and early 2000s (Malkov 1993, and others), and compiles M, R, T eff, and L for 215 stars. We chose a subset of stars that are mainly of spectral types A, F, and G, with a mean error in M, R, and T eff of about 3%, and of 12% in the case of L.

DATA ANALYSIS
To analyze the data, we followed a three-step procedure. We first defined the combinations of variables to be tested, then we selected the best subset of stars for analyzing each particular combination, and finally we applied a Generalized Least Squares with Measurement Error (GLSME; see Section 3.3) algorithm to obtain the regression coefficients, their errors, and some statistics to analyze the quality of the regression: the adj-R-squared statistic (from now on R 2 for simplicity), the mean accuracy (Acc), and the mean precision (Prec), of which more below (see, for example, Fuller 2008).

Combinations of variables
One of the main aims of this work is to study all the possible empirical relations for estimating stellar masses and radii, selecting those providing a better description of the data. We have searched for every possible combination describing the information contained in the data, regardless of which variables are combined with which.
In addition, we have also allowed combinations where variables are substituted by their logarithms.
That is, we have studied all possible combinations of the form: M, R, logM, or logR = f (T eff or logT eff, L or logL, g or logg, ρ or logρ, and/or [Fe/H]). Combinations of one single variable, two, three, four, and five variables are allowed. This means a total of 576 possible combinations.
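The enumeration of such combinations can be sketched programmatically. The counting convention below (each of T eff, L, g, and ρ may be absent, linear, or logarithmic; [Fe/H] has a single form; four dependent forms) is an illustrative assumption and yields 644 relations, not the 576 of the text, whose exact bookkeeping of log/linear alternatives we do not reproduce here:

```python
from itertools import product

# Independent variables; [Fe/H] is kept in a single form since it is
# already a logarithmic quantity (an assumption of this sketch).
two_form = ["Teff", "L", "g", "rho"]   # may enter linearly or as log

relations = []
for dep in ["M", "logM", "R", "logR"]:
    # each two-form variable: absent, linear, or logarithmic
    for choice in product(["absent", "lin", "log"], repeat=len(two_form)):
        for feh in ["absent", "present"]:
            rhs = [f"log{v}" if c == "log" else v
                   for v, c in zip(two_form, choice) if c != "absent"]
            if feh == "present":
                rhs.append("[Fe/H]")
            if rhs:                    # at least one independent variable
                relations.append((dep, tuple(rhs)))

print(len(relations))  # 644 under this particular counting convention
```

Each entry pairs one dependent form with one right-hand-side variable set, which is the structure the regression loop described below iterates over.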
There are combinations of variables that add little or no new information over a single variable.
In Fig. 2 we show the Kendall-τ correlation coefficient of all the possible pairs of observables. We find that there is only one strong correlation (larger than 0.75), i.e., there is only one obvious case of redundant variables. Gravity is highly correlated with density, as expected. Luminosity is also anti-correlated with density (τ =-0.73), close to our threshold. In Appendix A we show the scatter plots of these cross-correlations.
Therefore, we proceed to study all the variables as if they were independent except gravity-density.
We have removed every relation where these two variables appear at the same time, since they provide redundant information. We have decided to keep those relations with luminosity and density at the same time since, although they are correlated, we estimate, looking at the scatter plot shown in the Appendix, that each can provide some independent and complementary information. We conclude this section by noting again that we are not focused on extracting physical clues from the data, but rather on obtaining relations that capture the source information provided by the methods described in Section 2. When source data or information are available from those methods, we suggest using them to estimate masses and radii. If such data are not available, the relations we present can offer similar but less precise estimations.
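The redundancy screening described above can be sketched with `scipy.stats.kendalltau` on synthetic data; the observables and the 0.75 threshold follow the text, but the data below are fabricated for illustration (with logg and logρ built to be strongly correlated, mimicking the pair flagged in the sample):

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
n = 300
# Synthetic observables (illustrative only).
logg = rng.normal(4.0, 0.5, n)
logrho = 0.9 * logg + rng.normal(0.0, 0.1, n)   # strongly tied to logg
logTeff = rng.normal(3.76, 0.05, n)
obs = {"logTeff": logTeff, "logg": logg, "logrho": logrho}

redundant = []
names = list(obs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        tau, _ = kendalltau(obs[a], obs[b])
        if abs(tau) > 0.75:          # redundancy threshold used in the text
            redundant.append((a, b, round(tau, 2)))

print(redundant)  # only the (logg, logrho) pair should be flagged
```

Pairs exceeding the threshold would then have their joint relations dropped from the candidate list, exactly as done for gravity and density.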

Selection of the best subset
For a given relation, we select for the regression analysis a subset of stars that fulfill certain criteria. The remaining stars are then used as the control group for studying the accuracy and precision of the relation.
The idea behind this selection is to balance the accuracy obtained when only the most precisely measured variables are used against the precision obtained when the number of stars in the subsample is increased.
We have found that a good balance between accuracy and precision in our results is reached when we select for the regression those stars with an uncertainty in M, R, T eff , logg, and/or ρ ≤ 7%, and an uncertainty in L ≤ 10%. For example, if we are going to test the relation M = f (T eff , L), we first select the subset for the regression, which includes those stars fulfilling the requirements that ∆M and ∆T eff ≤ 7%, and ∆L ≤ 10%, leaving the rest of the stars as the control group. If the relation is R = f (ρ), we select those stars fulfilling ∆R and ∆ρ ≤ 7%, with the remaining stars again left as the controls.
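The split described above is simple to express in code. The thresholds follow the text (≤7% relative uncertainty for M, R, T eff, logg, and ρ; ≤10% for L); the star records and their values are hypothetical:

```python
# Relative-uncertainty thresholds from the text.
THRESHOLDS = {"M": 0.07, "R": 0.07, "Teff": 0.07, "logg": 0.07,
              "rho": 0.07, "L": 0.10}

def split_sample(stars, variables):
    """Split `stars` (dicts of (value, uncertainty) pairs) into the
    regression subset and the control group for a given relation."""
    regression, control = [], []
    for star in stars:
        ok = all(
            var in star
            and star[var][1] / abs(star[var][0]) <= THRESHOLDS[var]
            for var in variables
        )
        (regression if ok else control).append(star)
    return regression, control

# Hypothetical stars: {parameter: (value, uncertainty)}
stars = [
    {"M": (1.0, 0.05), "Teff": (5777, 100), "L": (1.0, 0.08)},  # passes
    {"M": (1.2, 0.20), "Teff": (6000, 100), "L": (1.5, 0.10)},  # dM too large
]
reg, ctl = split_sample(stars, ["M", "Teff", "L"])
print(len(reg), len(ctl))  # 1 1
```

Because the variable list changes from relation to relation, so does the size of each group, as noted in the text.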
This selection implies that the number of stars in the regression and control groups changes from one relation to another. At this point, we recall again that one of the features of this study is that we mix different techniques, trying to balance any possible bias of one technique with the unbiased determinations of the others. For every relation we present the percentage of stars characterized by the different techniques and with different spectral types (tables 7 and 8). The percentages of the complete sample are displayed in Tables 3 and 4 (see section 5.1).

Analysis method
The use of an error-in-variables linear regression algorithm ensures a robust treatment of the measured uncertainties, and more reliable results compared with using only the central observed values, as is the case for the standard linear regressions.
We use the error-in-variables model GLSME (Generalized Least Squares with Measurement Error):

y = Dβ + N(0, V),

where y is a vector with the central values of the observed dependent variable, D is a matrix with the central values of the observed independent variables, β is a vector with the regression coefficients to be estimated, and N(0, V) represents the normal distribution centered at zero with variance V.
In the most general case, V comprises the measurement uncertainties and the possible random effects of the model itself. V e is a matrix with the measurement errors of the dependent variable, and σ 2 T is a matrix with the residuals of the true dependent variable, that is, the impact of these possible random effects on the dependent variable. Finally, Var[Uβ|D] is a matrix accounting for the uncertainties of the independent variables; it contains V U , the measurement errors of the independent variables, and V D , the possible effects of a random term on the independent variables. In our case, we assume that, if there is a physical relation combining several variables, its application is deterministic; that is, there is no additional random term. Therefore, σ 2 T = 0 and V D = 0, and only the measurement errors must be included in the study. Assuming that the published uncertainties of the different measurements correspond to σ (unless explicitly stated otherwise), V e is an n × n diagonal matrix (with n the number of stars used for obtaining the regression) containing the σ 2 measurement uncertainties of the dependent variable. On the other hand, V U is a collection of m n × n diagonal matrices (with m the number of independent variables) containing the σ 2 measurement uncertainties of the independent variables. For a more detailed analysis of the different components of the GLSME model, we refer the reader to the Appendix of the original GLSME reference.
For every combination of variables (e.g. M = f (T eff , L)), we construct all the possible alternatives, including those with their logarithms (e.g. logM = f (logT eff , logL)). We then perform the error-in-variables linear regression, using the GLSME model, to obtain estimates of the regression coefficients β and their uncertainties ∆β. For each best-fitting relation we then extract the following summary statistics:
• The well-known R 2 statistic: this measures the percentage of the dependent-variable variance explained by the linear regression, for the regression sample used to obtain the regression coefficients.
• The relative accuracy (Acc): for a given relation and control group (i.e., a sample different from the regression sample used to obtain the linear relation), we have the values predicted by the relation for the dependent variable (y fitted ) and their "real" values (ŷ). We may therefore define the global relative accuracy of the linear regression as:

Acc = (100/N) Σ i |y i,fitted − ŷ i | / ŷ i ,

with N the number of stars in the control group.
• The relative precision (Prec): as per the above, we may also define the global relative precision of the linear regression as:

Prec = (100/N) Σ i σ i,fitted / y i,fitted ,

where σ i,fitted is the standard deviation obtained when evaluating the relation for every element of the control group.
The standard deviation is obtained via error propagation. To estimate the relative precision of the dependent variable (M or R) for the control group, only the central values of the independent variables are used. Therefore, the standard deviation (σ i,fitted ) reflects only the coefficient errors.
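Our reading of the two statistics above can be sketched as mean relative deviations over the control group (this is an interpretation of the definitions, with fabricated input numbers):

```python
import numpy as np

def relative_accuracy(y_fitted, y_real):
    """Mean absolute relative deviation of the fitted values from the
    'real' control-group values, in percent (our reading of Acc)."""
    y_fitted, y_real = np.asarray(y_fitted), np.asarray(y_real)
    return 100.0 * np.mean(np.abs(y_fitted - y_real) / np.abs(y_real))

def relative_precision(sigma_fitted, y_fitted):
    """Mean relative standard deviation of the fitted values, in percent,
    where sigma propagates only the coefficient errors (our reading of Prec)."""
    sigma_fitted, y_fitted = np.asarray(sigma_fitted), np.asarray(y_fitted)
    return 100.0 * np.mean(sigma_fitted / np.abs(y_fitted))

acc = relative_accuracy([1.05, 0.95, 1.10], [1.00, 1.00, 1.00])
prec = relative_precision([0.02, 0.03, 0.01], [1.05, 0.95, 1.10])
print(round(acc, 2), round(prec, 2))  # 6.67 1.99
```

Because Prec carries only the coefficient-error contribution, the input-variable uncertainties must be added when quoting a realistic error on an individual mass or radius estimate, as the text emphasizes.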
The combination selected for a given group of dependent and independent variables is that providing as high an R 2 , and as low an Acc and Prec, as possible. Finally, only those relations with R 2 > 0.85 have been selected for further scrutiny.
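The generalized-least-squares step that underlies models of this kind can be illustrated with a minimal numpy sketch using a known diagonal variance matrix. Note the hedge: this is plain GLS on synthetic data with errors only in the dependent variable, not the full GLSME treatment of errors in the independent variables; the relation and all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical relation logM = b0 + b1*logTeff + b2*logL, with known
# per-star measurement variances (diagonal V_e, as assumed in the text).
X = np.column_stack([np.ones(n), rng.normal(3.76, 0.05, n),
                     rng.normal(0.0, 0.4, n)])
beta_true = np.array([0.1, 0.05, 0.25])
sigma = rng.uniform(0.01, 0.03, n)            # dependent-variable errors
y = X @ beta_true + rng.normal(0.0, sigma)

V_inv = np.diag(1.0 / sigma**2)               # inverse of the diagonal V_e
# GLS estimate: beta = (D' V^-1 D)^-1 D' V^-1 y
beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
cov_beta = np.linalg.inv(X.T @ V_inv @ X)     # coefficient covariance
print(np.round(beta_hat, 2))
```

The diagonal of `cov_beta` provides the coefficient uncertainties ∆β that feed the Prec statistic; the full GLSME additionally corrects the bias introduced by measurement errors in the columns of D.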
In Table 5 we present all the statistical characteristics of the selected relations. In terms of R 2 , in Fig. 3 we show a histogram of the values obtained. We see that most of the relations explain more than 95% of the variance of the dependent variable, while 89% of them have an R 2 > 0.9. If we look at the control groups, we can see that in most cases the number of stars in these groups is in the range [81, 228]. The statistical tests performed on these groups should, as such, be reliable. There are two exceptions: the relations logR logg and logR T eff + logg have been tested with only 8 stars. Therefore, the Acc and Prec shown in these cases must be taken with caution.
In Fig. 4 we show the histogram of the relative accuracies. All are better than 10% except for three cases: logM logL, R logL, and logR logg. In general, the relative accuracy is poorer (larger Acc) for relations using only one independent variable, as expected. Most of the relative accuracies better than 5% are related to the estimation of the radius. In general, the relations estimating the radius are more accurate than those estimating the mass (a mean value over all the relations of 5.3% for R versus 7.98% for M).
In Fig. 5 we show the histogram of the relative precisions. Here we also find that most of the relations provide relative precisions better than 7.5%; in fact, 84% of the relations have a Prec < 3%. Note that these relative precisions take into account only the contribution of the errors in the regression coefficients. To obtain a realistic standard deviation for an estimate of a mass or radius, we must add the uncertainty coming from the input variables. Therefore, the tight relative precisions shown in Fig. 5 are good news. The relations for R again provide better precisions than those for M.
In Table 10 we show the best-fitting coefficients of the selected relations and their errors, in the format X(Y ) ≡ X × 10 Y . The first column of the table describes the relation selected (e.g. Z = f (X + Y ) ≡ a ± e a + (β X ± e β X )X + (β Y ± e β Y )Y ). The coefficients shown are those multiplying the independent variables in the relations, regardless of whether a variable enters as a logarithm or not.
In Table 11 we show the ranges of validity of each relation. These ranges are set by the maximum and minimum values of each independent variable used in the relation (i.e., from the input data in the regression group used to obtain the relation). We see that, in general, the larger the number of independent variables involved, the narrower the range of validity of the relation.
Finally, in the light of the high correlation found between gravity and density, we have obtained an error-in-variables regression model relating these two variables with the existing data sample. In this case, we have used the relation logg logρ. A summary of the parameters of this relation can be found in Table 6.

Ensuring the heterogeneity
As noted previously, one of the features of this work is that we have used a heterogeneous data set, in terms of the techniques used, since this can in principle reduce the influence of possible biases inherent in the observation, reduction, and analysis methods. As described in Section 4, to extract the different relations we use a subset of stars fulfilling certain criteria. Here, we test whether these selections affect the heterogeeity of each regression sample. In Table 7 we show the percentages of stars characterized using asteroseismology (A), eclipsing binaries (EB), and interferometry (I) in the regression sample used to obtain each relation. We see that there are two groups of relations: those with a balance of techniques similar to that of the complete sample (see Table 3) and those where most of the stars (or even 100%) come from the asteroseismic subsample. The reason for this difference is the presence or absence of ρ as an independent variable.
Asteroseismology provides a strong constraint on the density directly from observations. Therefore, those relations including the density may be affected by any possible bias coming from this technique. The rest of the relations are well balanced. The number of stars coming from interferometry is small, and their presence in the subsamples does not have a significant impact on the statistical balance.
We have also looked carefully at the impact of the stellar spectral type. In Table 8 we present the percentage of stars of different spectral type that feature in the regression samples for each relation.
We see that the main contribution comes from F stars, followed by G stars (with percentages similar to the global sample; see Table 4). The rest of the spectral types have smaller contributions depending on the relation studied, but the balance and the contribution of different spectral types is generally similar throughout. That said, we note two small biases: (i) cool stars (K stars and the only M star of the sample) have, in general, a small presence in the subsamples; and (ii) when the density enters the relation, there is a larger contribution of F and G stars, since asteroseismology provides most of its data for these spectral types.

Linear regressions consistency
In addition to using R 2 , relative accuracy and relative precision as main statistics for studying the quality of the regressions, we have also developed additional consistency tests to ensure that the linear regressions are representative of the observational data.
In Fig. 7 we show one of these consistency tests; the outlying values contain only a small percentage of the observational set. The impact of these values on the regression coefficients is analyzed below.
In Fig. 9 we present a final and more complex consistency test. Here we analyze the influence of every observational point on the regression coefficients. This influence is calculated using Cook's distance (D i ; Sheather 2009), a combination of the residual and the leverage (a measure of how isolated a value is) for every point. The plots of Fig. 9 show the standardized residuals as a function of the leverage, with Cook's distance represented by the size of the points. According to Weisberg (2005), "... if the largest value of D i is substantially less than one, deletion of a case will not change the estimate ... by much". Following this interpretation, only in four cases do we have some points with D i > 1, and in another two cases some points with D i close to 1.
In all cases, these points have large leverages; that is, they have a large influence on the estimates because they are extreme points isolated from the rest. This means that in these cases there are zones of the parameter space that are poorly sampled by our set, pointing to where we must focus on improving our sampling.
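The diagnostic just described combines residuals and leverages exactly as in the standard Cook's distance formula, D i = (r i ²/p)·h i /(1−h i ). A compact numpy sketch for an ordinary least-squares fit (the data are synthetic, with one isolated, discrepant point injected by hand):

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for an ordinary least-squares fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    h = np.diag(H)                            # leverages
    resid = y - H @ y
    s2 = resid @ resid / (n - p)              # residual variance
    r = resid / np.sqrt(s2 * (1.0 - h))      # standardized residuals
    return r**2 * h / (p * (1.0 - h))

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(40), rng.normal(0, 1, 40)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(0, 0.2, 40)
X[0, 1], y[0] = 8.0, -5.0                     # an isolated, discrepant point
D = cooks_distance(X, y)
print(D.argmax())  # the injected point dominates: 0
```

Points with D i well above one, like the injected one here, flag regions where the fit is being driven by a single poorly sampled observation, matching the interpretation quoted above.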

Influence of the definition of the Post-MS
In section 2, when we described the data sampling, we mentioned the number of stars labeled as Main-Sequence (MS). There we explained that we used the evolutionary tracks with solar metallicity described in Rodrigues et al. (2017) for this classification. The observational classification of a star as MS or post-MS is not a trivial task. Therefore, we have analyzed the impact on our results of using different tracks and physics to make this selection.
We have used tracks described in Rodrigues et al. (2017) with the same physics but different metallicities, in the range Z = [0.00176, 0.0553]. In addition, we have also used tracks that include diffusion and cover a wider range of metallicities, Z = [0.00002, 0.06215]. In every case, the free parameters were calibrated so that a 1 M⊙ model describes the Sun at the solar age.
For each track, we select the position in the T eff − logg diagram where the star leaves the MS. The spread given by the different adopted model grids enables us to construct a probability distribution for the classification. Using a Monte Carlo method, we have constructed up to 100 possible classifications of our 934 stars, resulting in 100 different subsets of stars classified as MS, and tested the impact of these different possible classifications on our results. Here, we show the impact for one of the relations of Table 5. The results obtained are shown in Table 9, where we list the values obtained for the coefficients, their errors, and the statistics used for characterizing the goodness of fit. "Mean" is the mean of each element over the 100 realizations; "S.D." is the standard deviation of these 100 realizations; and "Real" is the value we have obtained with our reference classification.
It is evident that the impact on the results of changes to the classification is small.
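The Monte Carlo procedure above can be sketched as follows. Each star is assigned a probability of being on the MS (here invented), a classification is drawn per realization, and the fit statistic of interest is recorded over the realizations; the relation and all numbers are hypothetical stand-ins for the Table 9 exercise:

```python
import numpy as np

rng = np.random.default_rng(5)
n_stars, n_real = 300, 100
# Hypothetical per-star probability of being on the MS, standing in for
# the spread of the turn-off position over the different model grids.
p_ms = np.clip(rng.normal(0.8, 0.2, n_stars), 0.0, 1.0)

x = rng.normal(0.0, 0.4, n_stars)                      # e.g. logL
y = 0.2 + 0.25 * x + rng.normal(0.0, 0.03, n_stars)    # e.g. logM

slopes = []
for _ in range(n_real):
    is_ms = rng.random(n_stars) < p_ms    # one random MS classification
    slope, intercept = np.polyfit(x[is_ms], y[is_ms], 1)
    slopes.append(slope)

# Mean and scatter of the coefficient over the 100 realizations,
# analogous to the "Mean" and "S.D." columns of Table 9.
print(round(np.mean(slopes), 3), np.std(slopes) < 0.05)
```

A small standard deviation across realizations relative to the coefficient itself is what supports the conclusion that the classification choice barely affects the results.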

Results obtained using other methods
We have compared our results with those coming from the use of standard linear regression (SR), the most common algorithm for fitting a model to a group of data, and from a Random Forest model.
We have repeated our analyses using standard linear regressions for the 38 selected relations. The comparisons are displayed in Figs. 10 to 12. In all cases, a value >0 means that the GLSME result is larger than the SR one (and conversely for values <0). In Fig. 10 we show the difference between the R 2 obtained with the GLSME algorithm (see Table 5) and the R 2 obtained with standard linear regression (denoted here by R 2 SR ). The differences are small, with a mean offset of 0.04 and a maximum value of 0.157. Therefore, both algorithms provide models explaining almost the same dependent-variable variance, with almost all R 2 > R 2 SR ; that is, GLSME explains more variance of the dependent variable than the standard regression. In Fig. 11 we compare the relative accuracies coming from both algorithms. The differences are again small, with a mean difference of 0.80% and a maximum difference of 3.23%, with an outlier of -6.07% for the relation R logL. Therefore, both algorithms provide similar relative accuracies, especially when describing the radius. Finally, in Fig. 12 we compare the relative precisions. Here we find the largest differences, always in favor of the GLSME results, as expected. No clear trends can be identified at this point.
We have also tested machine learning techniques for obtaining the best-fitting regressions. Using the complete sample for training a Random Forest model (Ho 1995), we obtain an Out-Of-Bag (OOB) mean of squared residuals of 0.0043 for estimating M and 0.003 for estimating R, and a percentage of the variance explained by the model of 85.58% and 98.29% for M and R, respectively.
In Figs. 13 and 14 we show the relative importances of the independent variables in the RF regression model for the mass and radius respectively. "%IncMSE" is the increase in "MSE" (Mean Squared Error) of the OOB predictions as a result of variable j being permuted (values randomly shuffled).
The higher the number, the more important the independent variable. On the other hand, IncNodePurity relates to the variables for which the best splits can be chosen in terms of the MSE function. More useful variables achieve higher increases in node purity, that is, those for which a split can be found with a high inter-node variance and a small intra-node variance. In fact, both plots provide similar but complementary information. In Fig. 13 we can see that the three variables with the largest importance for the estimation of the mass are L, T eff , and ρ. On the other hand, Fig. 14 shows that the three variables with the largest importance for the radius are ρ, logg, and L. In both cases these three variables are somewhat clustered and clearly separated from the other two. Stellar metallicity is always the least important independent variable.
In addition, and to illustrate the application of this RF model for estimating masses and radii, we have trained a new Random Forest model using all the independent variables available for 70% of the MS stars in our sample, using the remaining 30% as the control group. This split into training and control groups is different from that used for the regressions in the previous sections: in the case of the regressions, the split depends on the uncertainties of the variables involved, whereas in this Random Forest test, as uncertainties do not play any role, we simply split the complete sample randomly. The comparison of the estimated and "real" values for the mass and radius of the testing sample is shown in Fig. 15 (where "real" means the values provided by the techniques described in Section 2, that is, asteroseismology, eclipsing binaries, and interferometry). The implied accuracy is remarkable. Histograms of the residuals of these estimates are shown in Fig. 16. The mean squared residuals of both distributions on the control group are 0.0036 and 0.0026 for M and R, respectively, similar to those obtained for the RF model trained with the complete sample, and the relative accuracies obtained (following the definition in Eq. 3) are 4.7% for the mass and 3.3% for the radius. The Random Forest model evidently provides a very efficient and accurate way of obtaining regression models to estimate the mass and/or the radius. The accuracies reached with this model are similar to or better than those obtained with our GLSME models.
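A Random Forest regression of this kind can be sketched with scikit-learn (an assumption; the text does not name the implementation used). The data below are synthetic, built so that metallicity contributes least, mimicking the importance ranking discussed above:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 400
# Synthetic stand-in for the sample: mass driven mostly by L and Teff,
# only weakly by [Fe/H] (illustrative coefficients, not fitted values).
logTeff = rng.normal(3.76, 0.04, n)
logL = rng.normal(0.0, 0.4, n)
feh = rng.normal(0.0, 0.2, n)
logM = (0.25 * logL + 1.5 * (logTeff - 3.76) + 0.02 * feh
        + rng.normal(0.0, 0.02, n))

X = np.column_stack([logTeff, logL, feh])
rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=0)
rf.fit(X, logM)

print(round(rf.oob_score_, 2))        # OOB fraction of variance explained
order = np.argsort(rf.feature_importances_)[::-1]
print([["logTeff", "logL", "feh"][i] for i in order])
```

`oob_score_` plays the role of the OOB variance-explained figure quoted above, and `feature_importances_` is the permutation-free counterpart of the importance measures shown in Figs. 13 and 14.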

Comparison with other relations in the literature
We have compared our results with some of the most recent and popular relations in the literature. Some of these works use the luminosity as the dependent variable and the mass as the independent variable, making it impossible to obtain a reliable comparison with our results. Torres et al. (2010) provided one relation for the stellar mass and another for the stellar radius, of the form f (X, X 2 , X 3 , log 2 g, log 3 g, [Fe/H]), where X = logT eff − 4.1. These relations are comparable to those we present in the form logM or logR T eff + logg + [Fe/H]. Using the control group of these relations to estimate the relative accuracy and precision obtained with Torres' equations, we reach, for the mass, an Acc of 7.37% and a Prec of 52.86%. Compared with our overall Acc of 7.54% and Prec of 3.43% in Table 5, we find that both relations estimate the stellar mass with a good (and similar) accuracy, but the precision of Torres' formula is much poorer, mainly due to the large number of dimensions. In the case of the radius, Torres' equations give an Acc of 3.64% and a Prec of 36.02%, to be compared with our overall Acc of 2.97% and Prec of 2.73%. Again, similar accuracies and very different precisions. Therefore, the main difference between Torres' relations and ours is the number of independent variables. The precision achieved, taking into account only the coefficient errors, favors the expression with the lower number of dimensions. And in practice the final precision (when the uncertainties of the inputs are taken into account) worsens as the number of dimensions of the relation increases. That is, since Torres' relations involve six variables and ours only three, in terms of precision our relations are preferred for obtaining similar accuracies. Gafeira et al. (2012) provided three relations for the stellar mass.
One is a function of logL, log^2 L, and log^3 L; another adds [Fe/H], [Fe/H]^2, and [Fe/H]^3 to the first; and a third adds the stellar age to the second. This third relation is not really useful in practice, since the stellar age is generally not known with good precision (and its accuracy is likewise unknown). Therefore, we have compared the estimates of the first two relations with ours.
The first relation must be compared with our logM ~ logL relation. Their relation, when applied to our control group, provides an Acc of 18.45% and a Prec of 12.90%. These values must be compared with our overall Acc of 10.80% and Prec of 0.13%. The second relation provides an Acc of 10.43% and a Prec of 9.87%; this must be compared with our relation logM ~ logL + [Fe/H], which gives an overall Acc of 9.91% and Prec of 0.88%. The main differences can be understood by the fact that Gafeira's expressions, again, have a larger number of dimensions than ours, with the precision deterioration this implies, and that their relations were obtained using only 26 stars.
Finally, we have also compared the M = f(logL, log^2 L) and M = f(log Teff, log^2 Teff, log^3 Teff, log^4 Teff) relations of Malkov (2007) with our logM ~ logL and M ~ Teff relations, respectively. The first relation of Malkov (2007) provides an Acc of 11.24%, which compares with our overall Acc of 8.29%. Malkov (2007) does not provide errors for the coefficients, so we cannot estimate the relative precision of these expressions. The second relation gives an unexpectedly large Acc of 426.91% (compared with our Acc of 10.08%). We have tried to reproduce both of Malkov's relations with our data: in the case of M = f(logL, log^2 L) we find similar coefficients, but in the case of M = f(log Teff, log^2 Teff, log^3 Teff, log^4 Teff) we cannot reproduce their results.
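For concreteness, a relation of the Torres functional form can be evaluated as in the sketch below. The coefficient values are placeholders for illustration only (the actual calibration must be taken from Torres et al. 2010), and the mean absolute relative residual used for Acc is our assumed reading of the Eq. 3 definition, not a quoted formula.

```python
import numpy as np

# Placeholder coefficients for a Torres-style relation
#   logM = a0 + a1*X + a2*X^2 + a3*X^3 + a4*logg^2 + a5*logg^3 + a6*[Fe/H],
# with X = log10(Teff) - 4.1.  These values are illustrative only; use the
# published calibration from Torres et al. (2010) for real work.
a = np.array([1.57, 1.38, 0.42, 1.14, -0.14, 0.020, 0.10])

def torres_style_log_mass(teff, logg, feh, coeff=a):
    """Evaluate a six-dimensional Torres-style relation for logM."""
    X = np.log10(teff) - 4.1
    terms = np.array([1.0, X, X**2, X**3, logg**2, logg**3, feh])
    return float(coeff @ terms)

# Example: solar-like inputs (Teff in K, logg in dex, [Fe/H] in dex)
logM = torres_style_log_mass(5777.0, 4.44, 0.0)
M = 10.0 ** logM

def relative_accuracy(m_est, m_real):
    """Mean absolute relative residual, in percent, over a control group
    (our assumed Acc definition)."""
    return 100.0 * np.mean(np.abs((m_est - m_real) / m_real))
```

Note how the six-dimensional form multiplies the number of coefficient uncertainties that propagate into the final estimate, which is the precision penalty discussed above.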

EXOPLANET HOST STARS
Owing to the observational techniques used to discover exoplanets, their characterisation is tied to an accurate knowledge of the mass and/or radius of the host star. At present, only a comparatively small number of planet-hosting stars have been characterised with one of the three source techniques considered here. Therefore, stellar masses and radii must sometimes be estimated using alternative methods.
To illustrate the impact of using our derived relations, we have applied them to a subset of our stellar sample comprising 61 planet-hosting stars. In Table 5 we display two additional columns, "Acc. plan" and "Prec. plan", giving the relative accuracy and precision obtained using only the stars harboring planets. As expected, these accuracies and precisions are similar to those obtained for the control group.

SUMMARY AND CONCLUSIONS
We have studied a total of 576 linear combinations of Teff, L, g, density, and [Fe/H] (and their logarithms) to be used as independent variables to estimate M or R. We have used an error-in-variables regression algorithm (Generalized Least Squares with Measurement Error, GLSME) for a realistic estimation of the uncertainties of the regression coefficients. For every combination, we have selected the subset of stars with the lowest uncertainties and applied the GLSME algorithm to them, using the remaining stars as controls. We have used the R^2 statistic and the relative accuracy and precision over different control groups to select the best relations among these 576 combinations.
We present a total of 38 new or revised relations, all of which have R^2 > 0.85 (84% have R^2 > 0.9), a relative accuracy better than 10% (aside from three cases), and a relative precision better than 7.5% (aside from one case). In general, adding more dimensions to a relation improves R^2 and the accuracy, while the precision deteriorates. Expressions with two or three dimensions offer the best balance among R^2, accuracy, and precision; in any case, the choice of a particular relation must be evaluated case by case. A subsample of 61 planet-hosting stars in our sample returns results with precision and accuracy similar to the bulk sample.
We have verified that standard linear regression provides similar results, but with generally worse precision than the error-in-variables model. We have also compared the accuracy and precision obtained using our relations with those given by similar relations in the literature. The various relations provide very similar results, with our relations sometimes returning better accuracies and precisions. Finally, we have trained a Random Forest model, a machine-learning technique, to estimate M and R. This model provides slightly better accuracies when all the variables are taken into account.
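The advantage of an error-in-variables estimator over ordinary least squares can be illustrated with a one-dimensional sketch. This is not the GLSME algorithm used in this work, but the simpler Deming regression, which shares the key property of accounting for errors in the independent variable; the data, the toy logM ~ logL slope, and the error levels below are synthetic assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic one-dimensional example: true relation logM = 0.25 * logL
true_slope = 0.25
logL_true = rng.uniform(-1.0, 2.0, size=200)
logM_true = true_slope * logL_true

# Both variables carry measurement error (the situation GLSME is designed for)
sx, sy = 0.15, 0.05
x = logL_true + rng.normal(0.0, sx, 200)
y = logM_true + rng.normal(0.0, sy, 200)

# Ordinary least squares: the slope is biased toward zero (attenuation)
# when the independent variable is noisy
sxx = np.var(x)
syy = np.var(y)
sxy = np.cov(x, y, bias=True)[0, 1]
ols_slope = sxy / sxx

# Deming regression: a closed-form error-in-variables estimator with known
# error-variance ratio delta = sy^2 / sx^2
delta = sy**2 / sx**2
deming_slope = (syy - delta * sxx
                + np.sqrt((syy - delta * sxx) ** 2 + 4.0 * delta * sxy**2)
                ) / (2.0 * sxy)
```

With these settings the OLS slope is systematically attenuated below the true value, while the Deming estimate recovers it; the same effect, in higher dimensions and with per-point uncertainties, motivates the GLSME treatment.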
In the near future we will focus on completing the sample where it has statistical weaknesses, and on obtaining relations suitable for a physical interpretation through comparison with stellar structure and evolution theories and models.
In summary, this paper provides a revision and extension of empirical relations for the estimation of stellar masses and radii. Finally, we have developed an R package for estimating stellar masses and radii using all the tools presented in this work.

A. CROSS-CORRELATION BETWEEN THE INDEPENDENT VARIABLES
In Section 3.1 we analyzed the cross-correlations between the independent variables of our study.
As a complement to Fig. 2, in Fig. 17 we show the scatter plots of the different pairs of variables.
Here we can verify the information provided by the Kendall τ coefficient. In general, most of the stars are located in a certain zone or along a line, something we can regard as "Main-Sequence" behavior. In any case, all cross-correlations except g vs. ρ and L vs. ρ show a dispersion large enough to regard each variable as providing independent and complementary information. L vs. ρ shows a non-linear, function-like behavior, with a large spread at the elbow; this spread justifies using both variables at the same time, since each can provide some complementary information.
Finally, g and ρ are clearly correlated.
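The kind of pairwise check summarized here can be reproduced with scipy's Kendall τ on synthetic data; the mock relation between log g and log ρ below is an assumption built only to illustrate a tightly correlated pair next to an uncorrelated one.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(2)

# Mock "Main-Sequence-like" sample: log g and log rho move closely together,
# while [Fe/H] is independent of both by construction
logg = rng.uniform(3.5, 4.7, size=300)
logrho = 1.2 * logg - 5.0 + rng.normal(0.0, 0.05, 300)  # tight, near-linear pair
feh = rng.normal(0.0, 0.2, size=300)                     # uncorrelated variable

tau_g_rho, p_g_rho = kendalltau(logg, logrho)  # expected close to 1
tau_g_feh, p_g_feh = kendalltau(logg, feh)     # expected close to 0
```

A τ near 1 (as for g vs. ρ here) flags a pair carrying largely redundant information, while a τ near 0 indicates complementary variables, which is exactly the criterion used to decide whether two observables can enter the same relation.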