Why α s cannot be determined from hadronic processes without simultaneously determining the parton distributions

We show that any determination of the strong coupling α s from a process which depends on parton distributions, such as hadronic processes or deep-inelastic scattering, generally does not lead to a correct result unless the parton distributions (PDFs) are determined simultaneously along with α s . We establish the result by first showing an explicit example, and then arguing that the example is representative of a generic situation which we explain using models for the shape of equal χ 2 contours in the joint space of α s and the PDF parameters.


The determination of α s in hadronic processes
The value of the strong coupling α s has been routinely determined from a variety of processes which involve hadrons in the initial state, both in electroproduction and hadroproduction. The current PDG average [1] includes two different classes of such determinations. One is from "DIS and PDF fits": in these determinations the value of α s is determined together with a set of parton distributions (PDFs) from a more or less wide set of data and processes, ranging from deep-inelastic scattering (DIS) to hadron collider processes (such as Drell-Yan, top, and jet production).
The other is from single hadronic processes: specifically top pair production [2–4], and jet electroproduction [5]. Several more determinations of α s from one process have been presented, such as for instance jet production [6–13], multijets [7, 14–21] and W and Z production [22]. In these determinations, PDFs are taken from a pre-existing set, rather than being determined along with α s . The value of α s is then found by determining the likelihood of the new data as a function of α s : crudely speaking, by computing the χ 2 of the new data with respect to the theoretical prediction for a variety of values of α s , and determining the minimum of the resulting parabola (though in practice, when various parametric uncertainties have to be properly taken into account, the procedure is rather more elaborate, see e.g. Ref. [3]). The theoretical prediction is in turn obtained for each value of α s by combining the matrix element computed with the given α s value with the PDF set that corresponds to that α s value. This is of course necessary because PDFs strongly depend on α s , so a consistent calculation requires the use of PDFs corresponding to that value. All major PDF sets are available for a variety of α s values, and thus this poses no difficulty in practice.
a e-mail: stefano.forte@mi.infn.it (corresponding author)
Here we will show that this apparently straightforward and standard procedure may lead to an incorrect determination of α s , and we will argue that this is in fact a generic situation. The difference between the value thus obtained and the true best-fit α s can be very substantial, and specifically much larger than the statistical accuracy of the α s determination: as we shall see, this in fact reflects a conceptual flaw in the procedure.
The reason for this can be understood by viewing the χ 2 as a simultaneous function of α s and the PDF parameters. Any given existing PDF set then traces a line in such space (the "best-fit line", henceforth): for each value of α s there is a set of best-fit PDF parameters, which corresponds to a point in PDF space. The standard procedure seeks the minimum of the χ 2 of the new data in this one-dimensional subspace. This disregards the fact that the true minimum generally corresponds to a different point in (PDF, α s ) space, which also accommodates the new data [23].
One could naively argue that the standard procedure is correct, because what one is really doing is determining the best α s value for the new process subject to the constraint that PDFs describe well the (typically very large) set of data used to determine them. And surely, the naive argument goes, any α s value found anywhere other than on the best-fit line must correspond to a worse description of the world data? It actually turns out that this is incorrect: there exist points in (PDF, α s ) space for which the value of the χ 2 for the new process is lower than any value along the best-fit line, yet, somewhat counter-intuitively, the value of the χ 2 for the world data is also lower than at the minimum found along the best-fit line.
Moreover, the value of α s corresponding to these configurations may, and in general will, differ substantially from the one obtained using the standard procedure, and in particular it will be closer to the value obtained by simultaneously fitting α s and PDFs to a global dataset. Therefore, the standard procedure leads to a distorted answer, and it inflates artificially the dispersion of α s values obtained from different processes.
We establish this result by first providing an explicit example in which this happens. Namely, we consider the dataset used for the NNPDF3.1 [24] PDF determination. We then study the χ 2 for the subset of data corresponding to the Z transverse-momentum ( p T ) distribution, and determine the best-fit value of α s from this Z p T distribution along the best-fit line corresponding to the global fit dataset. We then exhibit a specific set of PDFs corresponding to a rather different value of α s , and such that the χ 2 is better both for the Z p T distribution, and for the rest of the dataset. This means that there exists at least one point in (PDF, α s ) space such that the value of the χ 2 for the Z p T distribution is better than any value along the best-fit line, and that there is no reason not to consider this as a better fit than the result at the best-fit α s along the best-fit line, because the agreement with the world data is also better than that at the minimum on the best-fit line.
We will understand the reason for this result by providing models for the shape of the χ 2 contours both for the world data and the new experiment in the joint (PDF, α s ) space. Specifically, we explain that this situation may arise both in the case in which the new data may provide an independent determination of α s and the PDFs of its own, and in the case in which the new data do not determine α s and the PDFs independently. This then covers the typical realistic scenarios in which the new data only constrain (or determine) a subset of PDFs: e.g. in the case of the Z p T distribution considered above, the gluon. In this latter, common case we will see that the value of α s obtained through the standard procedure leads to an artificially large dispersion of α s values: better-fit points in (PDF, α s ) generally lead to α s values which are closer to the global best fit.

An explicit example: the Z transverse momentum distribution
We provide an explicit example of the situation we described in the introduction. We consider the χ 2 values for both a global "world" dataset, and the dataset for a particular process P, as a function of α s . Given a fixed value of α s , the value of χ 2 also depends on the PDF set which is being used. As α s is varied, there is for each value a PDF set which corresponds to the global best fit: these PDF sets define a line in (PDF, α s ) space which we call the best-fit line. We call χ 2 g (α s ) the value of the χ 2 for the global dataset, as a function of α s , along this best-fit line.
We now consider the χ 2 for process P. We denote by χ r P 2 (α s ) the value of the χ 2 for process P as a function of α s , along this same best-fit line in (PDF, α s ) space, and call it the restricted χ 2 for process P. The restricted χ r P 2 (α s ) is thus found using the value α s of the strong coupling and the PDF set which corresponds to the global best fit for that α s , so that χ 2 g (α s ) and χ r P 2 (α s ) are determined using the same α s and the same PDF set. Note that, for any value of α s , the restricted χ r P 2 (α s ) is not in general the lowest χ 2 value for process P that can be found with the given value of α s , because the PDFs are optimized for the global dataset, not for process P. This is unlike the global χ 2 g (α s ), for which (by definition) the PDF set is always the global best-fit PDF set for each α s .
Now, the standard procedure determines α s from process P as the value which minimizes the restricted χ r P 2 (α s ), i.e. the χ 2 for process P evaluated along the best-fit line. We call this value, determined using the standard procedure, the restricted best-fit value of α s , and denote it α r 0 P ; we call the corresponding PDF set the restricted best-fit PDF set for process P.
We now show that this restricted α r 0 P cannot be viewed as the value of α s determined by process P. We do this by exhibiting a point in (PDF, α s ) space which does not lie along the best-fit line, i.e. such that the PDFs do not correspond to the global best fit, such that α s ≠ α r 0 P , and such that both the χ 2 for the individual dataset, and for the global dataset, are respectively better than the restricted χ r P 2 (α r 0 P ) and χ 2 g (α r 0 P ). This is thus a better fit to both process P and the global dataset than the restricted best fit, so there is no sense in which the restricted best-fit α r 0 P , which would be the "standard" answer, can be considered the α s value determined by process P.
Our construction is based on a previously published determination of α s by the NNPDF collaboration [25], in which the strong coupling is determined together with a set of parton distributions based on a global dataset which is very close to that used for the NNPDF3.1 [24] PDF determination. This α s determination, which we now briefly summarize for completeness, builds upon the previous NNPDF methodology for PDF determination, in which PDFs are determined as a Monte Carlo set of PDF replicas, each of which is fitted to a replica of the underlying data. Note that, in this α s determination, the PDFs and α s are fitted simultaneously. This is unlike the case of previous determinations [26] in which PDFs were determined for a variety of α s values, and then the best fit was sought by looking at the likelihood profile of the best fit as a function of α s . Whereas the two methodologies lead (if correctly implemented) to the same best-fit α s value, simultaneous minimization ensures a more accurate determination of the uncertainty involved, as explained in Ref. [25], essentially because it determines the likelihood contours in (PDF, α s ) space, rather than just the likelihood line corresponding to the best-fit PDF for each α s value.
The way this is accomplished in Ref. [25] within the NNPDF methodology is by fitting each data replica several times for a number of different values of α s , thereby providing a correlated ensemble of PDF replicas, in which to each data replica corresponds a PDF replica for each value of α s . Namely, for the kth data replica D (k) , a PDF replica is found by determining the set of PDF parameters θ (k) which minimize the χ 2 :

$$\theta^{(k)}(\alpha_s) = \operatorname{argmin}_{\theta}\, \chi^2\!\left(\theta, \alpha_s, D^{(k)}\right), \qquad (1)$$

where by argmin θ we mean that the minimization is performed with respect to θ for fixed D (k) and α s . It is then possible to compute the χ 2 for the kth data replica as α s is varied:

$$\chi^{2(k)}(\alpha_s) = \chi^2\!\left(\theta^{(k)}(\alpha_s), \alpha_s, D^{(k)}\right). \qquad (2)$$

We thus find an ensemble of parabolas χ 2(k) (α s ), one for each data replica. The best-fit α s for the kth data replica corresponds to the minimum along the kth parabola:

$$\alpha_s^{(k)} = \operatorname{argmin}_{\alpha_s}\, \chi^{2(k)}(\alpha_s). \qquad (3)$$

In the NNPDF approach, the best-fit PDF value is the average of the PDF replica sample; similarly the best-fit α s is determined by averaging the α (k) s values. We refer to Ref. [25] for further details, specifically on the dataset. Here we will use the NNLO PDF replicas determined in that reference as our baseline.
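The replica-by-replica procedure just described can be illustrated with a minimal Python sketch. It is not the NNPDF code: the χ 2 profiles below are synthetic, the curvature, offset, central value and spread are arbitrary assumptions chosen only for illustration, and the function name `best_fit_alphas` is hypothetical. Each profile is fitted with a parabola in α s , the vertex is taken as the best-fit value for that replica, and the central value and uncertainty are the mean and standard deviation over the replica sample.

```python
import numpy as np

def best_fit_alphas(alphas, chi2_profiles):
    """Vertex of a parabola fitted to each replica's chi2(alpha_s) profile."""
    x0 = alphas.mean()  # centre the abscissa to keep the fit well conditioned
    minima = []
    for chi2 in chi2_profiles:
        a, b, _ = np.polyfit(alphas - x0, chi2, 2)  # chi2 ~ a*x^2 + b*x + c
        if a <= 0:
            continue  # profile has no minimum; discard this replica
        minima.append(x0 - b / (2.0 * a))
    return np.array(minima)

# Synthetic stand-in for the replica profiles (all numbers are assumptions).
rng = np.random.default_rng(0)
alphas = np.linspace(0.106, 0.130, 13)            # grid of alpha_s values
true_minima = rng.normal(0.118, 0.0005, size=200)  # one minimum per replica
profiles = [4.0e5 * (alphas - m) ** 2 + 4000.0 for m in true_minima]

minima = best_fit_alphas(alphas, profiles)
central = minima.mean()        # best-fit alpha_s: average over replicas
spread = minima.std(ddof=1)    # uncertainty: standard deviation over replicas
print(f"alpha_s = {central:.4f} +- {spread:.4f}")
```

The same function applied to profiles computed with only the data for a single process, but with the globally fitted PDF replicas, would give the restricted best-fit values discussed next.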
We can now consider any particular process P entering this global PDF determination, and ask ourselves what is the α s value corresponding to process P. The "standard" answer would be to simply consider the ensemble of best-fit PDFs determined in the global fit, and compute again χ 2(k) (α s ) but now only including process P in the computation of the χ 2 . We then get another set of parabolas

$$\chi^{2(k)}_P(\alpha_s) = \chi^2_P\!\left(\theta^{(k)}(\alpha_s), \alpha_s, D^{(k)}_P\right), \qquad (4)$$

where only the data D P for process P have been used. Note that these are restricted χ 2 parabolas, because the PDF parameters θ (k) (α s ) are those found in Eq. (1), by minimizing the global χ 2 . The minima

$$\alpha^{r\,(k)}_{P} = \operatorname{argmin}_{\alpha_s}\, \chi^{2(k)}_P(\alpha_s) \qquad (5)$$

now give an ensemble of restricted best-fit α s values for process P. Their average is then the restricted best fit for this process.
In Fig. 1 we show the parabolas corresponding both to the global fit (left) and to the Z p T distribution (right). The corresponding ensemble of values of α s is shown in Fig. 2. From these we find the global best-fit value of α s (M Z ), Eq. (6), and the restricted best fit for the Z p T distribution, Eq. (7); the latter is α s (M Z ) ≈ 0.124. In both cases, the central value and uncertainty are respectively the mean and standard deviation computed over the replica sample, in the first case for the global best fit Eq. (3) and in the latter case for the restricted best fit Eq. (5) for each replica.
We now show that the naive conclusion that the value Eq. (7) of α s is the value of the strong coupling determined by the Z p T distribution rests on shaky ground. To show it, we perform a new PDF determination in which the Z p T data are now given a large weight in the χ 2 , and which is otherwise identical to the default determination. This PDF determination is performed for a single value of α s (M Z ) = 0.120, a value intermediate between the restricted best-fit α r 0 Z p T Eq. (7) and the global best-fit α g 0 Eq. (6). Specifically, the contribution of the Z p T data to the total χ 2 has been multiplied by a factor w = 32. This factor is chosen so that the contribution of the Z p T data is roughly equal to that of all the other data. The gluon and total quark singlet PDFs obtained in this way are compared in Fig. 3 to the default PDFs for the same value of α s (M Z ) = 0.120; χ 2 values for the global dataset are collected in Table 1, while in Table 2 χ 2 values for the Z p T data and the global dataset are compared. The gluon is shown because it is the PDF which is most affected by the Z p T data, and the singlet is also shown because it mixes with the gluon upon perturbative evolution.
The logic behind this procedure is that by giving more weight to these data we obtain a set of PDFs which provide a better fit to them: so we expect the value of χ 2 for the Z p T data to be better than that which would be obtained by taking the default best-fit PDF set for the same α s value. In fact it turns out that the value of the χ 2 thus obtained for the Z p T data is also better than the value χ r P 2 (0.124) which corresponds to the best fit along the global best-fit line (see Table 2). This means that the value α s (M Z ) = 0.120 is a better fit to the Z p T data than the value Eq. (7) corresponding to the best fit along the best-fit line.
As discussed in the introduction, one might object to the conclusion that α s (M Z ) = 0.120 might be a better α s from Z p T on the grounds that the PDFs which we obtained thus are not compatible with the rest of the global dataset, given that they do not correspond to the global best fit. However (see again Table 2) the value of χ 2 for the global dataset obtained using these PDFs is also better than the value of χ 2 g (0.124): hence with α s (M Z ) = 0.120 and these PDFs one gets a better fit to the Z p T data than with α s (M Z ) = 0.124, while also better fitting the world data. As is clear from Fig. 3, the PDFs that best reproduce the Z p T data, though compatible within uncertainties with the global fit, differ from them by an amount which is sufficient to considerably improve the description of the Z p T data. Indeed, they lead to an improvement of their χ 2 value by almost 10% in comparison to that of the global fit with the same α s (M Z ) = 0.120 value, at the cost of only a small deterioration of the χ 2 of the global fit, by about 2%.
The conclusion that the restricted best-fit value α r 0 Z p T Eq. (7) is the value of the strong coupling determined by the Z p T distribution is thus difficult to defend: with α s (M Z ) = 0.120 we can fit better both the Z p T and the global dataset, provided the PDFs are suitably readjusted. It is perhaps worth stressing that the effect that we are demonstrating is large in comparison to uncertainties. Indeed, the global best fit Eq. (6) differs by almost four standard deviations from the restricted best fit Eq. (7) in units of the large uncertainty on the latter. Assuming the same uncertainty, the better-fit value α s (M Z ) = 0.120 would instead be compatible with the global best fit within uncertainties. This result is at first surprising, as one might expect that the best fit to the world data must be along the best-fit line. However, as we shall show shortly, it can be understood both at a qualitative, and also more quantitative level.
Fig. 1 The χ 2 profiles for each of the data replicas used for the NNLO determination of α s (M Z ) of Ref. [25]. Both the profiles for the total dataset (left), and for the Z p T distribution (right) are shown
Fig. 3 The gluon and total quark singlet PDFs for α s (M Z ) = 0.120 in the default global fit and in a PDF determination in which the Z p T data receive a large weight (green, higher band at low x), shown as a ratio to the former
Table 1 The values of χ 2 /N dat for the experiments included in the best global fit with α s = 0.120, compared to results obtained when α s = 0.124, or when the Z p T data are given a large weight and α s = 0.120. The number of datapoints is also given in each case. The full description of the datasets, including data selection, cuts, and references is given in Ref. [24] where the same data coding is used
Note that the dataset for the global fit that we are considering actually does include the Z p T data of Table 2. Hence, the example presented here differs somewhat from a standard "real-life" situation such as in Refs. [3–5]: there, PDFs obtained from a fit to a global dataset are used for an α s determination from some new process which was not among those which were used to determine the PDFs. In practice, in our case, this makes essentially no difference because the inclusion of the Z p T data has almost no effect on the global fit, due to the relatively small number of data (about a hundred vs. about 4000, see Table 2), and because the Z p T data are quite consistent with other data which determine the same PDFs (essentially the large x gluon) [27]. This is demonstrated explicitly in Fig. 4, where PDFs in the global fit with or without Z p T data are compared, and seen to be essentially identical. Also, χ 2 values for a global fit in which the Z p T data are not included are shown in Table 2, and are seen to be extremely close to those for the default global fit which includes these data: even the χ 2 for the Z p T data themselves is almost unchanged when these data are included in the fit. We have checked that all χ 2 values for the other datasets of Table 1 change at or below the permille level upon exclusion of the Z p T data. As we will discuss in Sect. 3 below, whether or not the data for process P are included in the global fit also makes no difference of principle, though this is beside the point now, given the negligible impact of the Z p T data on the global fit. The reason why we choose to use for process P a dataset which is part of the global dataset is that it enables us to use the very large set of 8400 correlated replicas produced for Ref. [25] in order to construct the profiles shown in Fig. 1, thereby ensuring high statistical accuracy.
Fig. 4 Same as Fig. 3, but now comparing the global fit (same as shown in Fig. 3) to a global fit from which the Z p T data have been removed, shown as a ratio to the former
We conclude that we have presented an explicit example showing that using an existing PDF set to determine α s from a particular process P, by looking for the minimum of the χ 2 for the process along the best-fit line of the global fit, can lead to a substantially distorted result. The reason is that there exist values of α s for which (for a suitable PDF configuration) the χ 2 for process P is lower than the minimum along the best-fit line, but, surprisingly, the χ 2 of the global dataset is also lower than the value it has at the minimum along the best-fit line.
This apparently puzzling result can be qualitatively understood by noting that the value of α s which optimizes the χ 2 of the chosen process is actually closer to the global minimum for α s than the value which corresponds to the minimum along the best-fit line. Due to having given large weight to some process, the χ 2 for the global dataset deteriorates somewhat, because it is now optimized for that process, rather than for the global dataset. But that deterioration is more than compensated by the fact that the α s value is now closer to the global minimum. This is a consequence of the fact that the PDF space is higher-dimensional (perhaps even infinite-dimensional) so a small distortion of the PDFs is sufficient to accommodate the highly weighted process, and consequently the global χ 2 only increases by a small amount due to the reweighting. In the next section we cast this qualitative argument in a more quantitative form.
Fig. 5 Likelihood (χ 2 ) contours in (PDF, α s ) space for toy models in which a given process P is sufficient to determine PDFs; the parameter b (y axis) schematically represents the PDF parameters. The minimum of the global χ 2 g is the orange circle while the minimum of χ 2 P for process P is the green triangle. The line is the locus of the best-fit PDF ("best-fit line"): the stationary value Eq. (9) of b for the global χ 2 for fixed α s . The red square is the restricted best-fit α r 0 P : the value of α s corresponding to lowest restricted χ r P 2 , i.e. the point with lowest χ 2 P along the best-fit line. The ellipses are fixed χ 2 P and χ 2 g contours. The shaded area denotes the region in which both χ 2 g < χ 2 g (α r 0 P ) and χ 2 P < χ 2 P (α r 0 P ). The two plots correspond to two possible scenarios (see text)

The likelihood in (PDF, α s ) space
We now discuss some models for the dependence of the likelihood profiles on α s and the PDFs which explain the results which we found in the previous section, and show under which conditions the situation we encountered can be reproduced. Namely, we explicitly exhibit likelihood patterns for both a global dataset and a specific process P, such that there exist points in (PDF, α s ) space which have a higher likelihood (lower χ 2 ) than the restricted best fit -the point along the global best-fit line in (PDF, α s ) space which maximizes the likelihood for process P. As in the previous section, we refer to (minus) the log-likelihood for the global dataset as χ 2 g , and that for process P as χ 2 P . We assume that the global dataset determines simultaneously the PDFs and α s , so that χ 2 g has a single minimum value in (PDF, α s ) space, with fixed-χ 2 g ellipses about it. We then consider a particular subset of data, corresponding to a process P: the case of the Z p T data discussed in the previous section is an explicit example, but one may consider both wider datasets (e.g., all LHC data), or smaller datasets (e.g., one particular measurement of some cross-section performed by one experiment).
We further distinguish two broad classes of cases. The first, which is more common, is that process P does not fully determine the PDFs. This is the case of the Z p T data of the previous section, which constrain the gluon distribution in the medium-large x range but otherwise have a limited impact (see in particular Sect. 4.2 of Ref. [24]). In this case, likelihood contours for process P in (PDF, α s ) space have flat directions, along which PDFs and α s change but the value of χ 2 P does not. The second is that in which process P alone is sufficient to provide a determination of the PDFs, so that χ 2 P also has a minimum in (PDF, α s ) space, with fixed-χ 2 P ellipses about it. An explicit example of this would be if process P were the full set of deep-inelastic scattering data, which do determine fully the PDFs, albeit with larger uncertainties than a global dataset [28]. This case is relatively less common, but we discuss it first because the former case can be viewed as a special case of the latter.

Datasets which determine simultaneously α s and PDFs
In order to simplify the discussion, we consider a toy model in which the whole of PDF space is represented by a single parameter b so that (PDF, α s ) space is just the two-dimensional (b, α s ) plane. In a realistic situation, this can be viewed as a two-dimensional cross-section of the full space. In the vicinity of the minimum, where the χ 2 behaves quadratically, likelihood contours are just ellipses (see Fig. 5): each χ 2 i , Eq. (8), is a quadratic form in the displacements b − b i 0 and α s − α i 0 from its minimum, characterized by two widths σ i 1 , σ i 2 along the principal axes of the ellipse, where i = g, P according to whether one is considering the global dataset, or the dataset for process P. In our toy model we neglect the higher-order cubic and quartic terms that would arise far from the minimum. The point (b g 0 , α g 0 ) (denoted by an orange circle in Fig. 5) corresponds to the maximum likelihood for the global dataset, and the point (b P 0 , α P 0 ) to that for process P.
The best-fit line defined in Sect. 2 is the locus of points such that

$$\left.\frac{\partial \chi^2_g(b, \alpha_s)}{\partial b}\right|_{\alpha_s} = 0, \qquad (9)$$

shown in Fig. 5 as a (blue solid) line. The condition Eq. (9) means that at each point along this line the tangent to the fixed-χ 2 g contour is vertical, i.e. along the b direction. Hence, the line is not a principal axis of the ellipse, unless the principal axes are along the b and α s directions. The restricted best-fit point is shown as a red square. This point, (b r , α r 0 P ), minimizes the restricted χ r P 2 along the best-fit line, so at this point the best-fit line is tangent to a fixed χ 2 P contour. This is the value of α s from process P that would be determined using the "standard" procedure. The value of χ 2 for process P at this point is the value discussed in Sect. 2:

$$\chi^{r\,2}_{P}\!\left(\alpha^{r}_{0P}\right) = \chi^2_P\!\left(b_r, \alpha^{r}_{0P}\right). \qquad (10)$$

The fixed χ 2 g and χ 2 P contours through the restricted best-fit point are also shown in the figure. It is clear that, whenever they intersect, the whole area bounded by them (shown as shaded in the figure) has both χ 2 g < χ 2 g (b r , α r 0 P ) and χ 2 P < χ 2 P (b r , α r 0 P ). Any point in this region provides a better fit to both the global dataset and to process P. Whereas it is debatable which α s value in this region (if any) should be considered as the best-fit value of α s , it seems very difficult to argue that the restricted best-fit α r 0 P is the α s value preferred by process P, given that it gives a worse fit to both process P and the global dataset than any point in the highlighted region.
The two toy examples shown in Fig. 5 demonstrate different cases in which this may happen. Clearly, for some choices of parameters the value of the restricted best-fit α r 0 P might considerably differ from either of the values α P 0 or α g 0 that respectively minimize χ 2 P or χ 2 g . In fact, one can exhibit situations, such as shown in the right plot of Fig. 5, in which α P 0 ≈ α g 0 , yet the restricted best-fit α r 0 P is quite different. So not only does the restricted best fit provide a worse fit, but it cannot even be viewed as some kind of average or interpolation between the global value α g 0 and the process P value α P 0 . This demonstrates that taking α r 0 P as the value of α s determined by process P leads to an incorrect result.
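The construction of this subsection can be checked numerically with a short Python sketch. Everything in it is an assumption made for illustration: the rotated-ellipse parametrization of the two quadratic χ 2 functions, the numerical values of centres, widths and angles (dimensionless toy numbers, not related to any fit), and the helper names `quad_form` and `chi2`. The sketch builds the best-fit line from the stationarity condition in b, finds the restricted best fit along it, and then searches for a nearby point at which both χ 2 g and χ 2 P are lower.

```python
import numpy as np

def quad_form(angle, s1, s2):
    """Matrix H of chi2 = d^T H d, with d = (alpha_s, b) - centre, for an
    ellipse rotated by `angle` with widths s1, s2 along its principal axes."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, s], [-s, c]])
    return rot.T @ np.diag([1.0 / s1**2, 1.0 / s2**2]) @ rot

def chi2(point, centre, H):
    d = np.asarray(point) - centre
    return float(d @ H @ d)

# Toy parameters: arbitrary, dimensionless, chosen only for illustration.
centre_g, H_g = np.array([0.0, 0.0]), quad_form(0.6, 1.0, 0.4)    # global data
centre_P, H_P = np.array([2.0, 0.5]), quad_form(-0.5, 1.5, 0.8)   # process P

# Best-fit line: d(chi2_g)/db = 0 at fixed alpha_s, a straight line through
# the global minimum with direction v = (1, slope) in the (alpha_s, b) plane.
slope = -H_g[0, 1] / H_g[1, 1]
v = np.array([1.0, slope])

# Restricted best fit: minimum of chi2_P along the line centre_g + t * v.
t = -(v @ H_P @ (centre_g - centre_P)) / (v @ H_P @ v)
restricted = centre_g + t * v

# Minus the sum of the unit gradients is a descent direction for both chi2,
# unless the two gradients are exactly antiparallel.
grad_g = 2.0 * H_g @ (restricted - centre_g)
grad_P = 2.0 * H_P @ (restricted - centre_P)
direction = -(grad_g / np.linalg.norm(grad_g) + grad_P / np.linalg.norm(grad_P))

better = None
for step in 0.5 ** np.arange(1, 40):   # shrink the step until both improve
    trial = restricted + step * direction
    if (chi2(trial, centre_g, H_g) < chi2(restricted, centre_g, H_g)
            and chi2(trial, centre_P, H_P) < chi2(restricted, centre_P, H_P)):
        better = trial
        break

print("restricted best fit (alpha_s, b):", restricted)
print("point with both chi2 lower      :", better)
```

With these toy numbers the improved point lies off the best-fit line and at a value of α s closer to the global minimum than the restricted best fit, which is the behaviour of the shaded region described above; other parameter choices give other configurations.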

Datasets which do not fully determine the PDFs
We now turn to the case in which process P does not fully determine the PDFs, so that there are flat directions for χ 2 P in (PDF, α s ) space. This means that, whereas the likelihood profile for the global dataset still has the form of Eq. (8), for process P there exists a hypersurface in (PDF, α s ) space (i.e. in our toy model a curve in the (b, α s ) plane) along which χ 2 P is at a minimum. This can be viewed as a limiting case of Eq. (8), when the fixed χ 2 P ellipses become infinitely thin, i.e., when either of σ P i goes to zero. Of course, just as the fixed-χ 2 contours will no longer be ellipses far enough from the minimum, the flat direction will only be locally straight. This situation is depicted in Fig. 6, where the minimum curve for χ 2 P is shown as a (dashed, green) straight line. In this case, in the generic situation in which this minimum curve and the best-fit line Eq. (9) intersect, the intersection point is the restricted best fit (b r , α r 0 P ), which would provide the "standard" α s determination.
Fig. 6 Same as Fig. 5, but now for a toy model in which process P does not fully determine the PDFs. The minimum of the global χ 2 g is the orange circle while the minimum of the χ 2 P for process P is the dashed green line. The solid blue line is the best-fit line as in Fig. 5. The red square is the "standard" value α r 0 P : the value of α s corresponding to lowest restricted χ r P 2 , i.e. the point with lowest χ 2 P along the best-fit line. The ellipse is a fixed χ 2 g contour
However, it is clear that if one now considers the fixed χ 2 g contour through this point (shown as the ellipse in Fig. 6) in a generic case, i.e. unless the minimum curve (the dashed green curve of Fig. 6) is tangent to this ellipse, the contour intercepts a segment of the minimum curve, and any point along this segment provides a better fit to the global dataset than the restricted best-fit (b r , α r 0 P ). The minimum of the global χ 2 g along this segment is shown as a purple triangle in Fig. 6. Clearly, this is the point that is selected by minimizing the weighted

$$\chi^2_w = \chi^2_g + w\,\chi^2_P \qquad (11)$$

in the limit of very large w. Indeed, in the limit in which w is very large, so that w χ 2 P ≫ χ 2 g , the minimum of χ 2 w is along the line of degenerate minima of χ 2 P , but for any finite w the absolute minimum of χ 2 w is at the point along this line at which χ 2 g is also minimal.
Arguably, the value of α s at this large-weight minimum can be viewed as the best-fit value α P 0 of α s as determined from process P, subject to the constraint of also fitting the global dataset. Be that as it may, the best-fit value of α s as determined from process P is surely not the restricted best-fit α r 0 P , which leads to a worse fit to the global dataset than any value of α s along the intercept segment. This is then representative of the case that we discussed in Sect. 2. On the one hand, the value α r 0 P does not generically provide the best simultaneous fit of process P and the global dataset. On the other hand, the value that minimizes the weighted χ 2 for large w, which provides a better fit to the global dataset while giving a fit of the same quality to process P, is generally closer to the global best-fit α g 0 , as is clear from Fig. 6. Note that in this simple example, in which PDF space is one-dimensional, the large-w minimum leads to the same fit quality for process P as the restricted minimum. In a realistic situation both flat and non-flat directions will be present, and the weighting will also change the position of the minimum along the non-flat direction, thereby leading to a lower χ 2 for process P than the restricted minimum, as we observed in Sect. 2.
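The large-weight limit can be made concrete with a small Python sketch of the one-parameter toy model. All ingredients are assumptions introduced for illustration: the weighted function is taken to be χ 2 g + w χ 2 P , χ 2 P is modelled as the squared distance from a straight line (so that it is exactly degenerate along that line), and the matrix, the line, and the names `weighted_minimum`, `restricted` and `on_line_min` are hypothetical toy choices unrelated to any actual fit.

```python
import numpy as np

# Coordinates are (alpha_s, b). Toy global chi2: d^T H_g d about centre_g.
centre_g = np.array([0.0, 0.0])
H_g = np.array([[2.0, -1.5],
                [-1.5, 3.0]])          # positive definite (toy choice)

# Toy chi2_P = (n_hat . (p - q))^2 / sigma^2: flat along the line through q
# orthogonal to n_hat, so process P alone fixes only one combination.
q = np.array([2.0, 0.5])
n_hat = np.array([1.0, 0.5]) / np.linalg.norm([1.0, 0.5])
sigma = 1.0

def chi2_g(p):
    d = np.asarray(p) - centre_g
    return float(d @ H_g @ d)

def weighted_minimum(w):
    """Minimum of chi2_g + w * chi2_P, from the linear stationarity condition."""
    N = np.outer(n_hat, n_hat) / sigma**2
    return np.linalg.solve(H_g + w * N, H_g @ centre_g + w * N @ q)

# Restricted best fit: intersection of the flat line with the best-fit line
# (the line through centre_g along v on which d(chi2_g)/db = 0).
v = np.array([1.0, -H_g[0, 1] / H_g[1, 1]])
restricted = centre_g + (n_hat @ (q - centre_g)) / (n_hat @ v) * v

# Minimum of chi2_g along the flat line q + t * tau.
tau = np.array([-n_hat[1], n_hat[0]])
t = -(tau @ H_g @ (q - centre_g)) / (tau @ H_g @ tau)
on_line_min = q + t * tau

for w in (1.0, 32.0, 1.0e3, 1.0e8):
    print(w, weighted_minimum(w))      # approaches on_line_min as w grows
print("restricted:", restricted, "chi2_g =", chi2_g(restricted))
print("on-line minimum:", on_line_min, "chi2_g =", chi2_g(on_line_min))
```

In this toy both the restricted point and the on-line minimum have vanishing χ 2 P , but the latter has a lower χ 2 g and, for these parameter values, an α s closer to the global minimum.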
We conclude that the situation we encountered in Sect. 2 is generic. Whenever process P does not fully determine the PDFs, χ 2 P in (PDF, α s ) space has a subspace of degenerate minima. The value of α s obtained by minimizing the restricted χ r P 2 then leads to an incorrect result, generally further away from the global best-fit α g 0 than the value that would be obtained by looking for the minimum of the global χ 2 g in this subspace of degenerate minima of χ 2 P . It is important to note that this effect can be quite large, as was the case in the explicit example of the previous section. In general, the size of the deviation of the infinite-weight minimum from the restricted minimum will depend on the numerical values of the parameters that characterize χ 2 g and χ 2 P in Eq. (8). Note however that whenever the restricted best fit differs considerably from the global best fit in units of the standard deviation of the global best fit, then the χ 2 g parabola will vary rapidly in the vicinity of the restricted best fit, and thus the infinite-weight minimum will generically have a rather different value. This is the case of the example of Sect. 2, in which the restricted minimum Eq. (7) is eleven standard deviations away from the global minimum Eq. (6). It is interesting to observe that in the recent determination of α s [25] many of the restricted minima from individual datasets indeed differ considerably from the global minimum.
As a final observation, we note that the argument presented here, and thus its conclusion, hold regardless of whether process P is included in the global dataset. This has the interesting implication that in a global simultaneous determination of α s and the PDFs, such as that performed in Ref. [25], the minimum of χ 2 from each dataset entering the global determination cannot be interpreted as the α s value corresponding to that dataset. Hence, there is no reason to expect that the global best-fit α s is the mean of the restricted best-fit values determined from each subset of the data entering the global fit.

The value of α s from a single process
The main conclusion of this paper is that it is generally not possible to reliably determine α s from a given physical process which depends on parton distributions while relying on a pre-existing PDF set. The reason can be simply stated: the existing PDF sets only sample a line in PDF space as α s is varied. Hence, when using them, one is determining a constrained likelihood of the physical process under investigation along this line. This biases the results of the determination, in that the true maximum likelihood α s generally corresponds to a PDF configuration which is not along this line. The bias is especially severe since PDF space is high-dimensional. We have proven our point by showing that there exist PDFs which provide a better fit to both the given process and the global dataset, and correspond to a different α s value. This has been shown both in an explicit example and in toy models. Interestingly, when the physical process under investigation does not fully determine the PDFs, we have shown that this bias will generically pull the value of α s away from the best fit, in comparison to values of α s which provide a better fit to both the given process and the global dataset. Hence, determining α s from individual processes in this way artificially inflates the dispersion of the α s values which are found.
It is important to stress that the problem that we are pointing out cannot be viewed as an extra source of PDF uncertainty in a determination which uses a pre-existing PDF set: rather, it exposes a conceptual flaw. Indeed, the value of α s found by not fitting the PDFs simultaneously does not correspond to a maximum likelihood point in (PDF, α s ) space, and as such it can differ from the true maximum likelihood point by an amount which is potentially large (as we have shown in explicit examples), and impossible to quantify without knowledge of the PDF dependence of the results.
One may then ask: what is the value of α s determined by process P? Does it exist at all? Clearly, in the case in which the dataset for process P is wide enough that it can be used to simultaneously determine both α s and the PDFs, it is this value of α s which must be interpreted as the value preferred by process P. In this case, the main import of our analysis is to show that minimizing along the line of global best-fit PDFs may lead to a value of α s which not only provides a poor fit to both process P and the global dataset, but cannot even be viewed as some kind of average of the value α P 0 from process P and the global value α g 0 ; rather, it will randomly differ from them in a way which depends on the χ 2 profiles in (PDF, α s ) space (see the right plot in Fig. 5).
On the other hand, it is very common that the process P is insufficient to simultaneously determine α s and the PDFs, and hence that χ 2 P has a set of degenerate minima in (PDF, α s ) space. In this case it is debatable whether it makes sense to speak of a value of α s determined by process P. One may take the purist attitude that such a value does not exist, or, alternatively, consider defining the best-fit value of α s as the result of the weighting procedure discussed in Sect. 3.2, i.e., as the best fit to the global dataset within the set of degenerate minima of χ 2 P . In such a case, the uncertainty on this α s value is determined by conventional one-σ contours of the global χ 2 in the degenerate subspace (i.e., in the example of Fig. 6, along the dashed green line).
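As an illustration of how such a best-fit value and its one-σ uncertainty could be extracted, the sketch below scans a global χ 2 along a straight degenerate line and reads off the minimum and the Δχ 2 = 1 half-width from a parabolic fit. It is a toy under stated assumptions: one PDF parameter, a quadratic global χ 2 with correlation rho, and a degenerate line x = x0 + k y; the function names and all numerical values are hypothetical.

```python
import numpy as np

def chi2_global(v, rho=0.8):
    """Toy global chi^2 (assumption): x is the alpha_s shift in units of the global
    standard deviation, y is the shift of a single PDF parameter."""
    x, y = v
    return (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho ** 2)

def one_sigma_along_line(chi2, point, direction, half_width=5.0, n=2001):
    """Scan chi2 along point + t*direction and fit a parabola in t.

    Returns (t_min, sigma_t), where sigma_t is the Delta chi^2 = 1 half-width.
    Assumes chi2 is well approximated by a parabola over the scanned range."""
    t = np.linspace(-half_width, half_width, n)
    values = np.array([chi2(point + ti * direction) for ti in t])
    a, b, _ = np.polyfit(t, values, 2)  # values ~ a*t^2 + b*t + const
    return -b / (2.0 * a), 1.0 / np.sqrt(a)

# Hypothetical degenerate line of the process chi^2: x = x0 + k*y, parametrized by t = y.
x0, k = 3.0, 0.5
point = np.array([x0, 0.0])
direction = np.array([k, 1.0])

t_min, sigma_t = one_sigma_along_line(chi2_global, point, direction)
alpha_shift = point[0] + t_min * direction[0]   # best-fit alpha_s shift on the line
alpha_sigma = abs(direction[0]) * sigma_t       # its one-sigma uncertainty
print(alpha_shift, alpha_sigma)
```

The uncertainty obtained in this way refers to α s only; it is the projection on the α s axis of the Δχ 2 = 1 interval along the line. In a realistic case the degenerate subspace need not be a straight line and the global χ 2 need not be quadratic, so a direct scan for the Δχ 2 = 1 crossing would replace the parabolic fit.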
The important observation in this case is that the value found by minimizing along the best-fit line will generally be further away from the global best fit, while providing a worse fit to both process P and the global dataset. So, in particular, if one wishes to assess the spread of values of α s which are favored by each of the individual processes which enter a global simultaneous determination of PDFs and α s (such as that of Ref. [25]), a realistic estimate is found by weighting each of the individual datasets in turn, while the spread of the restricted minima will suggest an artificially inflated dispersion of values.
The upshot of this whole discussion is that we do not envisage a shortcut: a determination of α s from a single process always requires a simultaneous determination of PDFs. In the simplest case, that of a process (such as deep-inelastic scattering) which is sufficient to determine the PDFs, one must perform a simultaneous fit of the PDFs and α s to the dataset for that process. In the more common case of a process which does not fully determine the PDFs, one may determine a value of α s for this process (if deemed interesting) through the weighting method discussed above, but this of course requires performing a global PDF fit anyway: so it is no easier than simply including process P in the dataset and repeating the global simultaneous determination of the PDFs and α s .
In this latter case, that of performing a global fit of PDFs and α s , it might at least in principle be possible to include the new dataset, without refitting, by Bayesian reweighting [29,30]. Indeed, there is no difficulty in principle in reweighting correlated replicas: each replica will then correspond not only to a different set of PDFs, but also to a different α s value (that given by Eq. (3)). The reweighted replica ensemble then also gives a posterior distribution of α s values. Whether and how the procedure would work when the new dataset is given a large weight is however not immediately clear. Also, whether this is feasible in practice of course remains to be seen: specifically, it might well be that in concrete cases an unrealistically large number of replicas in the prior set is necessary in order to get a reliable answer after reweighting.
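The reweighting of correlated replicas can be sketched as follows. This is a minimal illustration, not the procedure of Refs. [29,30] as such: it assumes replica weights of the form w_k ∝ (χ 2 k )^((n-1)/2) exp(-χ 2 k /2), with n the number of new data points, and an effective number of replicas defined through the Shannon entropy of the weights. If a different weight is adopted, only the function reweight changes. The function names and the input arrays (the χ 2 of each replica to the new data and the α s value of each replica) are hypothetical.

```python
import numpy as np

def reweight(chi2, n_data):
    """Replica weights, normalized so that they sum to the number of replicas.

    Assumed form: w_k proportional to chi2_k^((n_data-1)/2) * exp(-chi2_k/2).
    All chi2 values are assumed to be strictly positive."""
    chi2 = np.asarray(chi2, dtype=float)
    log_w = 0.5 * (n_data - 1) * np.log(chi2) - 0.5 * chi2
    log_w -= log_w.max()  # guard against overflow before exponentiating
    w = np.exp(log_w)
    return w * len(w) / w.sum()

def effective_replicas(w):
    """Effective number of replicas from the Shannon entropy of the weights."""
    n = len(w)
    nz = w[w > 0]  # replicas with vanishing weight carry no information
    return np.exp(np.sum(nz * np.log(n / nz)) / n)

def posterior_alpha_s(alpha_s, w):
    """Weighted mean and standard deviation of the per-replica alpha_s values."""
    alpha_s = np.asarray(alpha_s, dtype=float)
    mean = np.average(alpha_s, weights=w)
    var = np.average((alpha_s - mean) ** 2, weights=w)
    return mean, np.sqrt(var)

# Hypothetical inputs: chi^2 of four replicas to 10 new data points, and their alpha_s values.
chi2 = [9.0, 9.0, 200.0, 200.0]
alpha_s = [0.116, 0.118, 0.120, 0.122]
w = reweight(chi2, n_data=10)
print(effective_replicas(w), posterior_alpha_s(alpha_s, w))
```

In this example two of the four replicas are effectively discarded, so the effective number of replicas drops to about two. An effective number of replicas much smaller than the size of the prior set is the practical signal that the prior ensemble is too small for the reweighted α s distribution to be reliable.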
Our results have two wider sets of implications. On the one hand, they provide a strong indication that looking at the χ 2 profile for any given process in the subspace of global fits as one parameter is varied can be very misleading. This is true not only for α s but for any parameter entering the global fit, including the parameters which govern the shape of the PDFs themselves. Specifically, the dispersion of best-fit minima for individual processes as a feature of the PDFs is varied (such as, say, the rate at which the gluon grows at small x) does not appear to be a good proxy for the actual dispersion of the results favored by each process. This may have some relevance in the benchmarking of parton distributions (see e.g. Refs. [31,32]).
On the other hand, they suggest caution in the determination of any standard model parameter from hadronic processes. Indeed, while the case of the determination of α s is particularly relevant because of the very strong correlation of α s and the PDFs, similar considerations apply to the simultaneous determination of any physical parameter in PDF-dependent processes, such as the determination of the top quark mass [33], or of electroweak parameters, such as the W mass [34]. In the latter case, the correlation of PDFs and the parameter is in principle weaker than in the case of the strong coupling, but the very high accuracy which is sought suggests that currently available results, specifically in W mass determination, should be reconsidered with care.