Unbiased determination of polarized parton distributions and their uncertainties

We present a determination of a set of polarized parton distributions (PDFs) of the nucleon, at next-to-leading order, from a global set of longitudinally polarized deep-inelastic scattering data: NNPDFpol1.0. The determination is based on the NNPDF methodology: a Monte Carlo approach, with neural networks used as unbiased interpolants, previously applied to the determination of unpolarized parton distributions, and designed to provide a faithful and statistically sound representation of PDF uncertainties. We present our dataset, its statistical features, and its Monte Carlo representation. We summarize the technique used to solve the polarized evolution equations and its benchmarking, and the method used to compute physical observables. We review the NNPDF methodology for parametrization and fitting of neural networks, the algorithm used to determine the optimal fit, and its adaptation to the polarized case. We finally present our set of polarized parton distributions. We discuss its statistical properties, test its stability upon various modifications of the fitting procedure, compare it to other recent polarized parton sets, and obtain predictions for the polarized first moments of the PDFs based on it. We find that the uncertainties on the gluon, and to a lesser extent the strange PDF, were substantially underestimated in previous determinations.


Introduction
The interest in the determination of polarized parton distributions (PDFs) of the nucleon is largely related to the experimental discovery in the late 1980s that the singlet axial charge of the proton is anomalously small [1,2], soon followed by the theoretical realization [3,4] that the perturbative behavior of polarized PDFs deviates from parton model expectations, according to which gluons decouple in the asymptotic limit. The theoretical interpretation of these results has spawned a huge literature, while at the same time experimental information on polarized PDFs, from deep-inelastic scattering as well as from a variety of other processes, has been accumulating over the years (see e.g. [5] and references therein).
First studies of the polarized structure of the nucleon were aimed at an accurate determination of polarized first moments (including detailed uncertainty estimates) [6][7][8], but did not attempt a determination of a full PDF set, which was first proposed in Ref. [9], though without uncertainty estimation. More recently, polarized PDF sets with uncertainties have been constructed by at least four groups (BB [10,11], AAC [12], LSS [13,14] and DSSV [15,68]). These PDF sets differ slightly in the choice of datasets, the form of the PDF parametrization, and several details of the QCD analysis (such as the treatment of higher twist corrections), but they are all based on the standard Hessian methodology for PDF fitting and uncertainty determination, which has been widely used in the unpolarized case (see [16,17] and references therein). This methodology is known [16] to run into difficulties, especially when experimental information is scarce, because of the bias intrinsic to the Hessian method with a fixed parton parametrization. This is likely to be particularly the case for polarized PDFs, which rely on data both less abundant and less accurate than their unpolarized counterparts.
In order to overcome these difficulties, the NNPDF collaboration has proposed and developed a new methodology for PDF determination [18][19][20][21][22][23][24][25][26][27][28][29]. The NNPDF technique uses a robust set of statistical tools, which include Monte Carlo methods for error propagation, neural networks for PDF parametrization, and genetic algorithms for their training. The NNPDF sets are now routinely used by the Tevatron and LHC collaborations in their data analysis and for data-theory comparisons. In this work we extend the application of the NNPDF methodology to the determination of polarized parton distributions of the nucleon. As we will see, some PDF uncertainties will turn out to be underestimated in existing PDF determinations: in particular those of the polarized gluon distribution, but also those of the strange distribution.
The outline of this paper is as follows. In Sect. 2 we present the data set used to determine polarized PDFs, and we review the relationship between measured asymmetries and structure functions. In Sect. 3 we discuss the parametrization of polarized PDFs in terms of neural networks, and the construction of polarized structure functions. Then in Sect. 4 we discuss the minimization strategy. The results for the NNPDFpol1.0 polarized partons are presented in Sect. 5, and in Sect. 6 we discuss the phenomenological implications for the spin content of the proton and the test of the Bjorken sum rule. Finally in Sect. 7 we summarize our results and outline future developments. Some details on the benchmarking of polarized PDF evolution are given in the Appendix.

Experimental data
The bulk of the experimental information on (longitudinally) polarized proton structure comes from inclusive polarized deep-inelastic scattering with charged lepton beams. Deep-inelastic scattering with longitudinally polarized beams and targets allows a determination of the longitudinal structure function g_1(x,Q²), which in turn admits a factorized expression in terms of polarized PDFs. Neutral-current deep-inelastic scattering does not, however, allow us to disentangle the contributions of quarks and antiquarks.
Using both proton and neutron (deuteron or 3 He) targets it is possible to separate the isospin singlet and triplet quark contributions to structure functions, with the gluon determined from scaling violations. A weak control on the separation of the isospin singlet quark contribution into its SU(3) octet and singlet component is possible using baryon decays to fix the respective normalization of these contributions, with in principle their different scale dependence providing some constraint on their shape.
Only charged-current deep-inelastic scattering would allow for full flavor separation [30]: this could be feasible with neutrino beams (such as those available at a neutrino factory [31]), or perhaps with very high-energy polarized charged lepton beams (such as those available at an electron-ion collider [32]). Therefore, current constraints on flavor separation are only provided by semi-inclusive deep-inelastic scattering data or by polarized hadron collider processes, such as polarized Drell-Yan production in fixed-target collisions and polarized W production at the Relativistic Heavy Ion Collider (RHIC). Likewise, direct constraints on the medium- and large-x polarized gluon require hadron and jet production either in fixed-target experiments or at RHIC, while the small-x gluon can only be probed by going to higher energy, such as at a polarized Electron-Ion Collider.
In this paper we will concentrate on inclusive longitudinally polarized DIS data, and thus we will only determine a subset of PDF combinations. This first polarized PDF set based on NNPDF methodology will then be available for inclusion of other datasets through the reweighting technique of Refs. [24,28].
We will first review the experimental observables which we use for the determination of polarized structure functions, and the information which various experiments provide on them. Then, we will summarize the features of the data we use, and finally the construction and validation of the Monte Carlo pseudodata sample from the input experimental data.

Experimental observables and longitudinal polarized structure functions
Standard perturbative factorization provides predictions for the polarized structure function g_1(x,Q²). Experiments, however, measure cross section asymmetries, defined by considering longitudinally polarized leptons scattering off a hadronic target polarized either longitudinally or transversely with respect to the collision axis; from these, the longitudinal (A_∥) and transverse (A_⊥) asymmetries are determined. The hadronic tensor for polarized, parity-conserving deep-inelastic scattering can be parametrized in terms of four structure functions: two of them, F_1(x,Q²) and F_2(x,Q²), characterize spin-averaged deep-inelastic scattering, while g_1(x,Q²) and g_2(x,Q²) appear when both the lepton beam and the nucleon target are in definite polarization states. For the conventional definition of the hadronic tensor in terms of structure functions, see e.g. [33].
Here y is the standard lepton scaling variable, given by

y = (p·q)/(p·k) = Q²/(2xmE)   (10)

in terms of the nucleon, lepton and virtual photon momenta p, k and q, or, in the target rest frame, in terms of the energy E of the incoming lepton beam. The unpolarized structure function F_1 and the unpolarized structure function ratio R which enter the definitions Eqs. (2)-(3) of the asymmetries may be expressed in terms of F_2 and F_L as

F_1(x,Q²) = [(1 + γ²) F_2(x,Q²) − F_L(x,Q²)] / (2x),   (11)

R(x,Q²) = F_L(x,Q²) / [(1 + γ²) F_2(x,Q²) − F_L(x,Q²)],   (12)

with γ² = 4x²m²/Q².
The longitudinal and transverse asymmetries are sometimes expressed in terms of the virtual photoabsorption asymmetries A_1 and A_2, according to the relations Eqs. (13)-(14). Recall that σ^T_{1/2} and σ^T_{3/2} are cross sections for the scattering of transversely polarized virtual photons (corresponding to longitudinal lepton polarization) with total helicity of the photon-nucleon system equal to 1/2 and 3/2 respectively, while σ^{TL} denotes the interference term between the transverse and longitudinal photon-nucleon amplitudes. In the limit m² ≪ Q², Eqs. (13) reduce to D = A_∥/A_1 and d = A_⊥/A_2, thereby providing a physical interpretation of D and d as depolarization factors.
Using Eqs. (13) in Eqs. (2)-(3) we may express the structure functions in terms of A_1 and A_2 instead. We are interested in the structure function g_1(x,Q²), whose moments are proportional to nucleon matrix elements of twist-two longitudinally polarized quark and gluon operators, and which can therefore be expressed in terms of longitudinally polarized quark and gluon distributions. Using Eqs. (2)-(3) we may obtain an expression for it in terms of the two asymmetries A_∥, A_⊥, or, using Eqs. (15)-(16), in terms of the two asymmetries A_1, A_2. Clearly, up to corrections of O(m/Q), g_1 is fully determined by A_∥, which coincides with A_1 up to O(m/Q) terms, while g_2 is determined by A_⊥ or A_2. It follows that, even though in principle a measurement of both asymmetries is necessary for the determination of g_1, in practice most of the information comes from A_∥ or A_1, with the other asymmetry only providing a relatively small correction unless Q² is very small.
It may thus be convenient to express g_1 in terms of A_∥ and g_2, Eq. (17), or, equivalently, in terms of A_1 and g_2, Eq. (18). It is then possible to use Eq. (17) or Eq. (18) to determine g_1(x,Q²) from a dedicated measurement of the longitudinal asymmetry, together with an independent determination of g_2(x,Q²). In practice, experimental information on the transverse asymmetry and on the structure function g_2 is scarce [34][35][36]. However, the Wilson expansion for polarized DIS implies that the structure function g_2 can be written as the sum of a twist-two and a twist-three contribution [37]:

g_2(x,Q²) = g_2^{tw2}(x,Q²) + g_2^{tw3}(x,Q²).   (19)

The twist-two contribution to g_2 is simply related to g_1. One finds

g_2^{tw2}(x,Q²) = −g_1(x,Q²) + ∫_x^1 (dy/y) g_1(y,Q²),   (20)

which in Mellin space becomes

g_2^{tw2}(N,Q²) = −[(N−1)/N] g_1(N,Q²).   (21)

It is important to note that g_2^{tw3} is not suppressed by a power of m/Q in comparison to g_2^{tw2}, because in the polarized case the availability of the spin vector allows the construction of an extra scalar invariant. Nevertheless, experimental evidence suggests that g_2^{tw3} is compatible with zero at low scale Q² ∼ m². Fits to g_2^{tw3} [38,39], as well as theoretical estimates of it [38,40], support the conclusion that

g_2(x,Q²) ≈ g_2^{tw2}(x,Q²),   (22)

which is known as the Wandzura-Wilczek relation [37]. We will thus determine g_1, using Eq. (17) or Eq. (18), from an experimental determination of the longitudinal asymmetry, using the approximate Wandzura-Wilczek form Eq. (22) of g_2. In order to test the dependence of our results on this approximation, we will also consider the opposite assumption that g_2 = 0 identically.
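As a numerical illustration of the Wandzura-Wilczek construction, the following minimal Python sketch evaluates the twist-two g_2 from a given g_1 by quadrature; the input g_1(x) = x(1−x) is a purely illustrative toy function, not a fitted structure function.

```python
def g2_ww(g1, x, n=2000):
    """Twist-2 (Wandzura-Wilczek) g2 from g1:
    g2_WW(x) = -g1(x) + int_x^1 dy g1(y)/y  (trapezoidal quadrature)."""
    h = (1.0 - x) / n
    ys = [x + i * h for i in range(n + 1)]
    integrand = [g1(y) / y for y in ys]
    integral = h * (0.5 * integrand[0] + sum(integrand[1:-1]) + 0.5 * integrand[-1])
    return -g1(x) + integral

# Toy input standing in for a fitted structure function.
g1 = lambda x: x * (1.0 - x)

# Analytic check: for g1 = x(1-x), int_x^1 dy (1-y) = (1-x)^2/2,
# so g2_WW(x) = -x(1-x) + (1-x)^2/2.
x0 = 0.3
exact = -g1(x0) + (1.0 - x0) ** 2 / 2.0
assert abs(g2_ww(g1, x0) - exact) < 1e-6
```

A simple consistency check is that the first moment of the Wandzura-Wilczek g_2 vanishes, as follows from its Mellin-space form at N = 1.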

The dataset: observables, kinematic cuts, uncertainties and correlations
We use deep-inelastic lepton-nucleon scattering (DIS) data coming from all relevant experiments [2,34,35,[41][42][43][44][45][46][47][48] performed at CERN, SLAC and DESY. The experiments use different nucleon targets (protons, neutrons or deuterons). The main features of these data sets are summarized in Tab. 1, where we show, for each experiment, the number of available data points, the kinematic range covered by the experiment, and the quantity which is published and which we use for the extraction of g 1 . This quantity is not the same for all experiments: the primary observable can be one of the many asymmetries or structure functions discussed in Sect. 2.1, as we now summarize (individual experiments are labeled as in Tab. 1).
• EMC, SMC, SMClowx, COMPASS, HERMES97 All these experiments have performed a measurement of A_∥. They have then determined A_1 from it using Eq. (13), under the assumption η ≈ 0. Therefore, what these experiments actually publish is a measurement of A_∥/D. We determine g_1 from A_∥/D using Eq. (17). This is possible because D is completely fixed by Eq. (6) in terms of the unpolarized structure function ratio Eq. (12) and of the kinematics. We determine the unpolarized structure function ratio using as primary inputs F_2, for which we use the parametrization of Refs. [18,49], and F_L, which we determine from its expression in terms of parton distributions, using the NNPDF2.1 NNLO parton set [26].

• HERMES
This experiment has performed a measurement of A_∥, and it publishes both A_∥ and A_1 (the latter determined using Eq. (13) and a parametrization of A_2). We use the published values of A_∥, which are closer to the experimentally measured quantity, to determine g_1 through Eq. (17).

• E143
This experiment has taken data with three different beam energies, E_1 = 29.1 GeV, E_2 = 16.2 GeV, E_3 = 9.7 GeV. For the highest energy both A_∥ and A_⊥ are independently measured and A_1 is extracted from them using Eq. (13); for the two lowest energies only A_∥ is measured and A_1 is extracted from it using Eqs. (15)-(16), while assuming the form Eq. (22) for g_2. The values of A_1 obtained with the three beam energies are combined into a single determination of A_1; radiative corrections are applied at this combination stage. Because of this, we must use this combined value of A_1, from which we then determine g_1 using Eq. (18). In order to determine y, Eq. (10), which depends on the beam energy, we use the mean of the three energies.

• E154
This experiment measures A_∥ and A_⊥ independently, and then extracts a determination of A_1. We use these values of A_1 to determine g_1 by means of Eq. (18).

• E155
This experiment only measures A_∥, from which g_1/F_1 is extracted using Eq. (15) with the Wandzura-Wilczek form of g_2, Eq. (22). In this case, we use these values of g_1/F_1, and we extract g_1 using Eq. (11) for F_1, together with the parametrization of Refs. [18,49] for F_2 and the expression in terms of parton distributions and the NNPDF2.1 NNLO parton set [26] for F_L, as in the other cases.
We have excluded from our analysis all data points with Q² ≤ Q²_cut = 1 GeV², since below such an energy scale perturbative QCD cannot be considered reliable. A similar choice of cut was made in Refs. [6-8, 11, 12, 15]. We further impose a cut on the squared invariant mass of the hadronic final state, W² = Q²(1 − x)/x, in order to remove points which may be affected by sizable higher-twist corrections. The cut is chosen based on a study presented in Ref. [50], where higher twist terms were added to the observables, with a coefficient fitted to the data, and it was shown that the higher twist contribution becomes compatible with zero if one imposes the cut W² ≥ W²_cut = 6.25 GeV². We will follow this choice, which excludes data points with large Bjorken-x at moderate values of the squared momentum transfer Q², roughly corresponding to the bottom-right corner of the (x, Q²)-plane, see Fig. 1: in particular, it excludes all available JLAB data [51][52][53]. The number of data points surviving the kinematic cuts for each data set is given in parentheses in Tab. 1.
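The two kinematic cuts above can be sketched as a simple selection function; the cut values are those quoted in the text, and the sample points below are illustrative.

```python
def passes_cuts(x, q2, q2_cut=1.0, w2_cut=6.25):
    """Kinematic cuts used in the fit: Q^2 > Q^2_cut = 1 GeV^2 and
    W^2 = Q^2 (1 - x)/x >= W^2_cut = 6.25 GeV^2."""
    w2 = q2 * (1.0 - x) / x
    return q2 > q2_cut and w2 >= w2_cut

# A point at large x and moderate Q^2 fails the W^2 cut...
assert not passes_cuts(x=0.6, q2=3.0)   # W^2 = 2 GeV^2
# ...while a typical DIS point survives both cuts.
assert passes_cuts(x=0.1, q2=2.5)       # W^2 = 22.5 GeV^2
```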
As can be seen from the scatter plot in Fig. 1, the region of the (x, Q²)-plane where data are available after kinematic cuts is roughly restricted to 4·10⁻³ ≲ x ≲ 0.6 and 1 GeV² ≤ Q² ≲ 60 GeV². In recent years, the coverage of the low-x region has been improved by a complementary set of SMC data [42] and by the more recent COMPASS data [45,46]. In the large-x region, information is provided at rather high Q² by the same COMPASS data and at lower energy by the latest HERMES measurements [48]. In comparison to the dataset used in Refs. [6][7][8], several new datasets are being used, in particular the SMC [42], HERMES [48] and COMPASS [45,46] data. The dataset used in this paper is the same as that of Ref. [11], and also the same as the DIS data of the fit of Ref. [15], which however includes a wider dataset extending beyond inclusive DIS.
Each experimental collaboration provides uncertainties on the measured quantities listed in the next-to-last column of Tab. 1. Correlated systematics are only provided by EMC and E143, which give the values of the systematics due to the uncertainty in the beam and target polarizations, while all other experiments do not provide any information on the covariance matrix. For each experiment, we determine the uncorrelated uncertainty on g 1 by combining the uncertainty on the experimental observable with that of the unpolarized structure function using standard error propagation. We include all available correlated systematics. These are provided by the experimental collaboration as a percentage correction to g 1 (or, alternatively, to the asymmetry A 1 ): we apply the percentage uncertainty on g 1 to the structure function determined by us as discussed in Sect. 2.2 (which, of course, is very close to the value determined by the experimental collaboration).
We then construct a covariance matrix

cov_pq = δ_pq (σ_p^(s))² + [Σ_{i=1}^{N_c} σ_{i,p}^(c) σ_{i,q}^(c)] g_{1,p}^(exp) g_{1,q}^(exp),   (23)

where p and q run over the experimental data points, the index i runs over the N_c sources of correlated uncertainty, σ_{i,p}^(c) are the relative correlated systematic uncertainties, and σ_p^(s) is the total uncorrelated uncertainty on the p-th data point. The correlation matrix is defined as

ρ_pq = cov_pq / (σ_p^(tot) σ_q^(tot)),   (24)

where the total uncertainty σ_p^(tot) on the p-th data point is

σ_p^(tot) = [(σ_p^(s))² + (Σ_{i=1}^{N_c} (σ_{i,p}^(c))²) (g_{1,p}^(exp))²]^{1/2}.   (25)

We show in Tab. 2 the average experimental uncertainties for each dataset, with uncertainties separated into statistical and correlated systematics. All values are given as absolute uncertainties and refer to the structure function g_1, which has been reconstructed for each experiment as discussed above. As in the case of Tab. 1, we provide the values before and after kinematic cuts (if different).
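The construction of the covariance matrix Eq. (23) and of the associated correlation matrix can be sketched as follows; the assumed structure (statistical uncertainties on the diagonal, correlated systematics given as relative uncertainties) follows the description in the text, and the numerical inputs are purely illustrative.

```python
import math

def build_cov(g1, stat, sys_rel):
    """Covariance matrix: uncorrelated (statistical) uncertainties on the
    diagonal plus fully correlated systematics, the latter given as
    relative uncertainties multiplying the central values."""
    n = len(g1)
    cov = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            c = sum(sp * sq for sp, sq in zip(sys_rel[p], sys_rel[q])) * g1[p] * g1[q]
            if p == q:
                c += stat[p] ** 2
            cov[p][q] = c
    return cov

def build_corr(cov):
    """Correlation matrix: rho_pq = cov_pq / (sigma_p^tot sigma_q^tot)."""
    n = len(cov)
    tot = [math.sqrt(cov[p][p]) for p in range(n)]
    return [[cov[p][q] / (tot[p] * tot[q]) for q in range(n)] for p in range(n)]

# Two data points sharing one source of correlated systematics
# (e.g. beam polarization); numbers are purely illustrative.
g1 = [0.50, -0.20]
stat = [0.05, 0.04]
sys_rel = [[0.03], [0.03]]  # 3% correlated systematic on each point

cov = build_cov(g1, stat, sys_rel)
corr = build_corr(cov)
assert abs(corr[0][0] - 1.0) < 1e-12 and abs(corr[1][1] - 1.0) < 1e-12
assert abs(cov[0][1] - 0.03 * 0.03 * 0.50 * (-0.20)) < 1e-12
```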
In Tab. 1, we distinguish between experiments, defined as groups of data which cannot be correlated to each other, and datasets within a given experiment, which could in principle be correlated with each other, as they correspond to measurements of different observables in the same experiment, or measurements of the same observable in different years. Even though, in practice, only two experiments provide such correlated systematics (see Tab. 2), this distinction will be useful in the minimization strategy, see Sect. 4 below.

Monte-Carlo generation of the pseudo-data sample
Error propagation from experimental data to the fit is handled by a Monte Carlo sampling of the probability distribution defined by the data. The statistical sample is obtained by generating N_rep pseudodata replicas, according to a multigaussian distribution centered at the data points and with a covariance equal to that of the original data. Explicitly, given an experimental data point g_{1,p}^(exp) ≡ g_1(x_p, Q²_p), we generate k = 1, …, N_rep artificial points g_{1,p}^(art),(k) according to

g_{1,p}^(art),(k) = (1 + Σ_{i=1}^{N_c} r_{i,p}^(c),(k) σ_{i,p}^(c)) (g_{1,p}^(exp) + r_p^(s),(k) σ_p^(s)),   (27)

where σ_{i,p}^(c) and σ_p^(s) are respectively the relative correlated systematic and the statistical uncertainty, and r_{i,p}^(c),(k), r_p^(s),(k) are univariate Gaussian random numbers, with the r^(c) taken equal for all data points which share the same source of correlated systematics. Unlike in the unpolarized case, Eq. (27) receives no contribution from normalization uncertainties, given that all polarized observables are obtained as cross section asymmetries.
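A minimal sketch of the replica generation of Eq. (27), assuming one correlated systematic source shared between points; names and numerical inputs are illustrative.

```python
import random

def generate_replicas(g1, stat, sys_rel, n_rep, seed=42):
    """Monte Carlo pseudo-data in the form of Eq. (27):
    g1_art = (1 + sum_i r_i sigma_i^rel) * (g1_exp + r_s sigma_stat),
    with the Gaussian numbers r_i shared across points for each
    correlated source, and r_s drawn independently point by point."""
    rng = random.Random(seed)
    n_pts, n_sys = len(g1), len(sys_rel[0])
    replicas = []
    for _ in range(n_rep):
        r_c = [rng.gauss(0.0, 1.0) for _ in range(n_sys)]  # shared per source
        rep = []
        for p in range(n_pts):
            shift = 1.0 + sum(r * s for r, s in zip(r_c, sys_rel[p]))
            rep.append(shift * (g1[p] + rng.gauss(0.0, 1.0) * stat[p]))
        replicas.append(rep)
    return replicas

g1, stat, sys_rel = [0.5, -0.2], [0.05, 0.04], [[0.03], [0.03]]
reps = generate_replicas(g1, stat, sys_rel, n_rep=1000)
# Replica averages reproduce the experimental central values.
mean = [sum(r[p] for r in reps) / len(reps) for p in range(2)]
assert abs(mean[0] - 0.5) < 0.01 and abs(mean[1] + 0.2) < 0.01
```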
The number of Monte Carlo replicas of the data is determined by requiring that the central values, uncertainties and correlations of the original experimental data can be reproduced to a given accuracy by taking averages, variances and covariances over the replica sample. A comparison between expectation values and variances of the Monte Carlo set and the corresponding input experimental values as a function of the number of replicas is shown in Fig. 2, where we display scatter-plots of the central values and errors for samples of N rep = 10, 100 and 1000 replicas. A more quantitative comparison can be performed by defining suitable statistical estimators (see, for example, Appendix B of Ref. [18]).
We show in Tabs. 3-4 the percentage error and the scatter correlation r (which is, roughly speaking, the correlation between the input experimental values and the corresponding quantities computed from the replica sample) for central values and errors respectively. We do not compute values for correlations, as these, as seen in Tab. 2, are only available for a very small number of data points from two experiments. Note that some large values of the percentage uncertainty are due to the fact that g_1 for some experiments can take values which are very close to zero. It is clear from both the tables and the plots that a Monte Carlo sample of pseudo-data with N_rep = 100 is sufficient to reproduce the mean values and the errors of the experimental data to an accuracy better than 5%, while the improvement in going up to N_rep = 1000 is moderate. Therefore, we will henceforth use an N_rep = 100 replica sample as a default in the remainder of this paper.
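The scatter correlation used above can be sketched as follows (cf. Appendix B of Ref. [18] for the full set of estimators); the input arrays are illustrative.

```python
import math

def scatter_correlation(exp_vals, mc_vals):
    """Scatter correlation r between input experimental quantities and the
    corresponding estimators computed from the replica sample;
    r close to 1 means the sample reproduces the data."""
    n = len(exp_vals)
    me = sum(exp_vals) / n
    mm = sum(mc_vals) / n
    cov = sum((e - me) * (m - mm) for e, m in zip(exp_vals, mc_vals)) / n
    se = math.sqrt(sum((e - me) ** 2 for e in exp_vals) / n)
    sm = math.sqrt(sum((m - mm) ** 2 for m in mc_vals) / n)
    return cov / (se * sm) if se * sm > 0 else 0.0

# If replica averages reproduce the data up to small noise, r ~ 1.
exp_vals = [0.1, 0.2, 0.3, 0.4]
mc_vals = [0.11, 0.19, 0.31, 0.40]  # e.g. means over N_rep = 100 replicas
assert scatter_correlation(exp_vals, mc_vals) > 0.99
```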
3 From polarized PDFs to observables

3.1 Leading-twist factorization of the structure functions

At leading twist, the polarized structure function g_1 for neutral-current virtual photon DIS is given in terms of the polarized quark and gluon distributions by

g_1(x,Q²) = (⟨e²⟩/2) [C_NS ⊗ ∆q_NS + C_S ⊗ ∆Σ + 2 n_f C_g ⊗ ∆g].   (28)

Here n_f is the number of active flavors, the average charge is given by ⟨e²⟩ = n_f⁻¹ Σ_{i=1}^{n_f} e_i² in terms of the electric charge e_i of the i-th quark flavor, ⊗ denotes the convolution with respect to x, and the nonsinglet and singlet quark distributions are defined as

∆q_NS(x,Q²) = Σ_{i=1}^{n_f} (e_i²/⟨e²⟩ − 1) [∆q_i(x,Q²) + ∆q̄_i(x,Q²)],
∆Σ(x,Q²) = Σ_{i=1}^{n_f} [∆q_i(x,Q²) + ∆q̄_i(x,Q²)],   (29)

where ∆q_i and ∆q̄_i are the polarized quark and antiquark distributions of flavor i and ∆g is the polarized gluon PDF. In the parton model, Eq. (28) reduces to

g_1(x,Q²) = (1/2) Σ_{i=1}^{n_f} e_i² [∆q_i(x,Q²) + ∆q̄_i(x,Q²)],   (30)

but in perturbative QCD the parton model expression is not recovered even when α_s → 0, because at large Q² the first moment of the gluon distribution grows as ∫_0^1 dx ∆g(x,Q²) ∼ (α_s(Q²))⁻¹, so the gluon does not decouple from g_1 asymptotically. Be that as it may, below charm threshold, with n_f = 3, Eq. (28) can be rewritten as

g_1(x,Q²) = (1/9) {C_NS ⊗ [(3/4) ∆T_3 + (1/4) ∆T_8] + C_S ⊗ ∆Σ + 6 C_g ⊗ ∆g}   (31)

in terms of the singlet quark-antiquark distribution ∆Σ(x,Q²), defined in Eq. (29), the isospin triplet combination

∆T_3 = (∆u + ∆ū) − (∆d + ∆d̄),   (32)

and the SU(3) octet combination

∆T_8 = (∆u + ∆ū) + (∆d + ∆d̄) − 2(∆s + ∆s̄).   (33)

It is clear from Eqs. (28)-(31) that neutral-current g_1 data only allow for a direct determination of the four polarized PDF combinations ∆g, ∆Σ, ∆T_3 and ∆T_8. In principle, an intrinsic polarized component could also be present for each heavy flavour. However, we will neglect it here and assume that heavy quark PDFs are dynamically generated above threshold by (massless) Altarelli-Parisi evolution, in a zero-mass variable-flavor-number scheme (ZM-VFNS). In such a scheme all heavy quark mass effects are neglected. While they can be introduced, for instance, through the FONLL method [54], these effects have been shown to be relatively small already on the scale of present-day unpolarized PDF uncertainties, and thus are most likely negligible in the polarized case, where uncertainties are rather larger.
The proton and neutron PDFs are related to each other by isospin, which we will assume to be exact, thus yielding ∆u_p = ∆d_n, ∆d_p = ∆u_n, ∆s_p = ∆s_n, and likewise for the polarized antiquarks. In the following we will always assume that PDFs refer to the proton.

The first moments of all nonsinglet combinations of quark and antiquark distributions are scale independent because of axial current conservation, while, because of the axial anomaly, the first moment of the singlet quark distribution is scale dependent in the MS-bar scheme. However, it may be convenient to choose a factorization scheme in which the first moment of the singlet quark distribution is also scale independent, so that all the individual quark and antiquark spin fractions are scale independent. Several such schemes, including the so-called Adler-Bardeen (AB) scheme, were discussed in Ref. [6], where the transformation connecting them to the MS-bar scheme was constructed explicitly.

By means of the SU(2) or SU(3) flavour symmetry it is possible to relate the first moments of the nonsinglet C-even combinations ∆T_3 and ∆T_8 to the baryon octet decay constants a_3 and a_8:

∫_0^1 dx ∆T_3(x,Q²) = a_3,   (35)

∫_0^1 dx ∆T_8(x,Q²) = a_8,   (36)

whose current experimental values are [55]

a_3 = 1.2701 ± 0.0025,   (37)

a_8 = 0.585 ± 0.025.   (38)

A much larger uncertainty on the octet axial charge, up to about 30%, is found if SU(3) symmetry is violated [56]. Even though a detailed phenomenological analysis does not seem to support this conclusion [57], we will take as default this more conservative uncertainty estimate,

a_8 = 0.585 ± 0.176.   (39)

The impact of replacing this with the more aggressive determination given in Eq. (38) will be studied in Sect. 5.3.2.

Structure functions will be computed in terms of polarized parton distributions using the so-called NNPDF FastKernel method, introduced in Ref. [23].
In short, in this method the PDFs at scale Q² are obtained by convoluting the parton distributions at the parametrization scale Q²_0 with a set of Green's functions, which are in turn obtained by solving the QCD evolution equations in Mellin space. These Green's functions are then combined with coefficient functions, so that the structure function can be directly expressed in terms of the PDFs at the parametrization scale through suitable kernels K. In terms of the polarized PDFs at the input scale we have

g_1(x,Q²) = K_{g1,∆Σ} ⊗ ∆Σ(Q²_0) + K_{g1,∆g} ⊗ ∆g(Q²_0) + K_{g1,+} ⊗ [(3/4) ∆T_3 + (1/4) ∆T_8](Q²_0),   (40)

where the kernels K_{g1,∆Σ}, K_{g1,∆g}, K_{g1,+} take into account both the coefficient functions and the Q² evolution. This way of expressing structure functions is amenable to numerical optimization, because all kernels can be precomputed and stored, and convolutions may be reduced to matrix multiplications by projecting onto a set of suitable basis functions. The neutron polarized structure function g_1^n is given in terms of the proton and deuteron ones as

g_1^n(x,Q²) = 2 g_1^d(x,Q²)/(1 − 1.5 ω_D) − g_1^p(x,Q²),   (41)

with ω_D = 0.05 the probability that the deuteron is found in a D state. Under the assumption of exact isospin symmetry, the expression of g_1^n in terms of parton densities is obtained from Eq. (40) by interchanging the up and down quark PDFs, which amounts to changing the sign of ∆T_3.
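The computational payoff of the FastKernel approach can be illustrated by a toy sketch: once evolution and coefficient functions have been precomputed as a kernel matrix on a grid, evaluating the observable for any candidate PDF reduces to a matrix-vector product. The kernel entries below are placeholders, not real QCD kernels.

```python
def predict_g1(kernel, pdf_at_q0):
    """g1 on the data grid = precomputed kernel acting on the
    input-scale PDF expanded on a set of basis points."""
    return [sum(k * f for k, f in zip(row, pdf_at_q0)) for row in kernel]

kernel = [
    [0.2, 0.1, 0.0],   # row: one (x, Q^2) data point; columns: PDF basis points
    [0.0, 0.3, 0.1],
]
pdf_at_q0 = [1.0, 0.5, -0.2]

g1_pred = predict_g1(kernel, pdf_at_q0)
assert all(abs(a - b) < 1e-12 for a, b in zip(g1_pred, [0.25, 0.13]))
```

Since the kernel is fixed during the fit, only the cheap matrix-vector product is repeated at each minimization step.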
The implementation of the polarized PDF evolution up to NLO has been benchmarked against the HOPPET evolution code [58], using the settings of the Les Houches PDF evolution benchmark tables [59]. This benchmarking is discussed in more detail in Appendix A. We will assume the values α_s(M_Z²) = 0.119 for the strong coupling constant, and m_c = 1.4 GeV and m_b = 4.75 GeV for the charm and bottom quark masses respectively.

Target mass corrections to g 1
The leading twist expressions of structure functions given in Sect. 3.1 are corrected both by dynamical and kinematic higher-twist terms. The former are related to the contribution of higher twist operators to the Wilson expansion, and are generally expected to be small. The latter are related to target-mass corrections (TMCs), and because of their kinematical origin they can be included exactly: we do this following Ref. [60]. As discussed in Sect. 2.1, we thus consistently include all nucleon mass effects, both in the relation between measured asymmetries and structure functions, and in the relation between the latter and parton distributions.
The target mass corrections are especially simple in Mellin space, where they take the form given in Ref. [60]. We denote by g̃_{1,2}(N,Q²) the Mellin-space structure functions with TMCs included, while g_{1,2}(N,Q²) are the structure functions determined in the m = 0 limit. As discussed in Sect. 2.1, in the absence of precise data on the structure function g_2, we will either determine it using the Wandzura-Wilczek approximation Eq. (22) (which is uncorrected by target-mass effects [60]), or, as a cross-check, simply set it to zero. In either case, we may then determine g̃_1, Eq. (42), in terms of g_1.
In the former (Wandzura-Wilczek) case, substituting Eq. (21) in Eq. (42) and taking the inverse Mellin transform, we obtain an expression in which the term proportional to m² is shifted by N → N − 2; inverting the Mellin transform then yields g̃_1 as the sum of g_1 and a target-mass correction term, Eqs. (45)-(47). The numerical implementation of Eq. (45) or Eq. (47) is difficult because of the presence of the first derivative of g_1 in the correction term. Therefore, we will include target mass effects in an iterative way: we start by performing a fit in which we set m = 0, and at each subsequent iteration the target-mass-corrected structure function g̃_1 is computed by means of Eqs. (45)-(47) using the g_1 obtained in the previous minimization step.
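The iterative strategy can be sketched as a fixed-point loop; `toy_fit` and `toy_tmc` below are hypothetical stand-ins (the real correction involves g_1 and its derivative, Eqs. (45)-(47)), used only to show that the iteration converges when the correction is small.

```python
def iterate_tmc(fit, tmc, data, n_iter=5):
    """Iterative inclusion of target mass corrections: the correction is
    held fixed within each minimization and recomputed from the previous
    fit, starting from an m = 0 fit."""
    g1_prev = fit(data, correction=[0.0] * len(data))   # m = 0 fit
    for _ in range(n_iter):
        corr = [tmc(g) for g in g1_prev]                # from previous iterate
        g1_prev = fit(data, correction=corr)
    return g1_prev

# Hypothetical toys: "fitting" just subtracts the frozen correction from
# the data, and the TMC is a small linear shift.
toy_fit = lambda data, correction: [d - c for d, c in zip(data, correction)]
toy_tmc = lambda g, eps=0.1: eps * g

g1 = iterate_tmc(toy_fit, toy_tmc, data=[1.0, 2.0], n_iter=50)
# Fixed point of g = d - eps*g is g = d/(1 + eps).
assert abs(g1[0] - 1.0 / 1.1) < 1e-6 and abs(g1[1] - 2.0 / 1.1) < 1e-6
```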

Neural networks and fitting strategy
We will now briefly review the NNPDF methodology for parton parametrization in terms of neural networks, and their optimization (fitting) through a genetic algorithm. The details of the procedure have been discussed in previous NNPDF papers, in particular Refs. [20,23,61]. Here we summarize the main steps of the whole strategy, and discuss in greater detail some points which are specific to the polarized case.

Neural network parametrization
Each of the independent polarized PDFs in the evolution basis introduced in Sect. 3.1, ∆Σ, ∆g, ∆T_3 and ∆T_8, is parametrized using a multi-layer feed-forward neural network [27]. All neural networks have the same architecture, namely 2-5-3-1, which corresponds to 37 free parameters for each PDF, and thus a total of 148 free parameters. This is to be compared to the roughly 10-15 free parameters of all other available determinations of polarized PDFs. This parametrization has been explicitly shown to be redundant in the unpolarized case, in that results are unchanged when a smaller neural network architecture is adopted: this ensures that results do not depend on the architecture [27]. Given that polarized data are much less abundant and affected by much larger uncertainties than unpolarized ones, this architecture is a fortiori adequate in the polarized case.
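The parameter count quoted above follows from the standard bookkeeping for fully connected feed-forward networks, as this small sketch verifies.

```python
def n_parameters(architecture):
    """Number of weights and biases of a fully connected feed-forward
    network: each layer l contributes n_{l-1}*n_l weights + n_l biases."""
    weights = sum(a * b for a, b in zip(architecture[:-1], architecture[1:]))
    biases = sum(architecture[1:])
    return weights + biases

# The 2-5-3-1 architecture used for each polarized PDF:
assert n_parameters([2, 5, 3, 1]) == 37          # per PDF
assert 4 * n_parameters([2, 5, 3, 1]) == 148     # four PDFs in the fit
```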
The neural network parametrization is supplemented with a preprocessing function. In principle, large enough neural networks can reproduce any functional form given sufficient training time. However, the training can be made more efficient by adding a preprocessing step, i.e. by multiplying the output of the neural networks by a fixed function. The neural network then only fits the deviation from this function, which improves the speed of the minimization procedure if the preprocessing function is suitably chosen. We thus write the input PDF basis in terms of preprocessing functions and neural networks NN_∆pdf as

∆f(x,Q²_0) = x^{n_∆f} (1 − x)^{m_∆f} NN_∆f(x),  ∆f = ∆Σ, ∆g, ∆T_3, ∆T_8,   (48)

where m_∆f and n_∆f are the large- and small-x preprocessing exponents respectively. Of course, one should check that no bias is introduced by the choice of preprocessing functions. To this purpose, we first select a reasonable range of values for the large- and small-x preprocessing exponents m and n, and produce a PDF determination by choosing for each replica a value of the exponents at random, with uniform distribution within this range. We then determine effective exponents for each replica, defined as

n_eff[∆f] = ln|∆f(x,Q²_0)| / ln x at small x,   m_eff[∆f] = ln|∆f(x,Q²_0)| / ln(1 − x) at large x,   (49)

where ∆f = ∆Σ, ∆T_3, ∆T_8, ∆g. Finally, we check that the range of variation of the preprocessing exponents is wider than the range of effective exponents for each PDF. If it is not, we enlarge the range of variation of the preprocessing, repeat the PDF determination, and iterate until the condition is satisfied. This ensures that the range of effective large- and small-x exponents found in the fit is not biased, and in particular not restricted, by the range of preprocessing exponents. It is enough to perform this check at the input evolution scale, Q²_0 = 1 GeV². Our final values for the preprocessing exponents are summarized in Tab. 5, while the effective exponents obtained in our fit will be discussed in Sect. 5.5. It is apparent from Tab. 5 that the allowed range of preprocessing exponents is rather wider than in the unpolarized case, as a consequence of the limited amount of experimental information.
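The effective-exponent check can be sketched as follows: for a toy PDF whose preprocessing exponents are known and whose "network" factor is slowly varying, the effective exponents recover the preprocessing exponents up to small corrections. The sampling points x = 10⁻⁴ and x = 0.99 are illustrative choices.

```python
import math

def effective_exponents(f, x_small=1e-4, x_large=0.99):
    """Effective small- and large-x exponents of a PDF:
    n_eff = ln|f(x)| / ln x at small x,
    m_eff = ln|f(x)| / ln(1-x) at large x."""
    n_eff = math.log(abs(f(x_small))) / math.log(x_small)
    m_eff = math.log(abs(f(x_large))) / math.log(1.0 - x_large)
    return n_eff, m_eff

# Toy PDF with preprocessing x^0.5 (1-x)^3 times a slowly varying "network":
f = lambda x: x ** 0.5 * (1.0 - x) ** 3 * (1.0 + 0.1 * x)

n_eff, m_eff = effective_exponents(f)
# The effective exponents recover the preprocessing exponents up to small
# corrections from the slowly varying factor.
assert abs(n_eff - 0.5) < 0.05 and abs(m_eff - 3.0) < 0.05
```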
Two of the PDFs in the parametrization basis Eq. (48), namely the nonsinglet triplet and octet ∆T_3 and ∆T_8, are supplemented by a prefactor. This is because these PDFs must satisfy the sum rules Eqs. (35,36), which are enforced by fixing the overall normalization so that the first moment equals the corresponding axial charge, i.e. by letting

\Delta T_3(x, Q_0^2) = a_3\, \frac{(1-x)^{m}\, x^{-n}\, \mathrm{NN}_{\Delta T_3}(x)}{\int_0^1 dx\, (1-x)^{m}\, x^{-n}\, \mathrm{NN}_{\Delta T_3}(x)},

and similarly for ∆T_8 with a_8. The uncertainties on a_3 and a_8 are included by randomizing their values among replicas.

Genetic algorithm minimization
As discussed at length in Ref. [20], minimization with a neural network parametrization of PDFs must be performed through an algorithm which explores the very wide functional space efficiently. This is done by means of a genetic algorithm, which is used to minimize a suitably defined figure of merit, namely the error function [20]

E^{(k)} = \frac{1}{N_{\rm dat}} \sum_{I,J=1}^{N_{\rm dat}} \left( g_I^{({\rm art})(k)} - g_I^{({\rm net})(k)} \right) \left( ({\rm cov})^{-1} \right)_{IJ} \left( g_J^{({\rm art})(k)} - g_J^{({\rm net})(k)} \right).

Here g_I^{(art)(k)} is the value of the observable g_I at the kinematical point I for the Monte Carlo replica k, and g_I^{(net)(k)} is the same observable computed from the neural network PDFs; the covariance matrix (cov)_{IJ} is defined in Eq. (23).
The minimization procedure we adopt follows closely that of Ref. [19], to which we refer for a more general discussion. Minimization is performed by means of a genetic algorithm, which minimizes the figure of merit, Eq. (52), by creating at each minimization step a pool of new neural networks, obtained by randomly mutating the parameters of the starting set, and retaining the configuration which corresponds to the lowest value of the figure of merit.
The parameters which characterize the behaviour of the genetic algorithm are tuned in order to optimize the efficiency of the minimization procedure: here, we rely on previous experience with the development of unpolarized NNPDF sets. In particular, the algorithm is characterized by a mutation rate, which we take to decrease as a function of the number of iterations N_ite of the algorithm according to [20]

\eta_{i,j} = \eta_{i,j}^{(0)} / N_{\rm ite}^{r_\eta},

so that in the early stages of the training large mutations are allowed, while they become less likely as one approaches the minimum. The starting mutation rates are chosen to be larger for PDFs which contain more information. We perform two mutations per PDF at each step, with the starting rates given in Tab. 6. The exponent r_η has been introduced in order to optimally span the whole range of possible beneficial mutations, and it is randomized between 0 and 1 at each iteration of the genetic algorithm, as in Ref. [23]. Furthermore, following Ref. [23], we let the number of new candidate solutions depend on the stage of the minimization. At earlier stages, when the number of generations is smaller than N_gen^mut, we use a large population of mutants, N_mut^a ≫ 1, so that a larger space of mutations is explored. At later stages, as the minimum is approached, a smaller number of mutations N_mut^b ≪ N_mut^a is used. The values of the parameters N_gen^mut, N_mut^a and N_mut^b are collected in Tab. 7.
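A minimal sketch of the mutation-rate schedule described above; the starting rate η0 and the number of iterations are illustrative, not the values of Tab. 6.

```python
import random

def mutation_rate(eta0, n_ite, rng=random):
    """Mutation rate after n_ite generations: eta0 / n_ite**r_eta, with the
    exponent r_eta drawn anew in [0, 1] at each generation so that both large
    and small mutations remain possible throughout the training."""
    r_eta = rng.uniform(0.0, 1.0)
    return eta0 / n_ite ** r_eta

random.seed(0)
eta0 = 10.0                  # illustrative starting rate, not a value from Tab. 6
weights = [0.1, -0.3, 0.7]   # toy neural-network parameters for one PDF
# Two mutations per PDF at each generation, as described in the text.
for _ in range(2):
    i = random.randrange(len(weights))
    weights[i] += mutation_rate(eta0, n_ite=500) * random.gauss(0.0, 1.0)
print(weights)
```

Since r_η is uniform in [0, 1], the mutation size always lies between η0/N_ite and η0, shrinking on average as the training proceeds.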
Because the minimization procedure stops the fit to all experiments at once, we must make sure that the quality of the fit to different experiments is approximately the same. This is nontrivial, because of the variety of experiments and datasets included in the fit. Therefore, the figure of merit per datapoint for a given set is not necessarily a reliable indicator of the quality of the fit to that set, because some experiments may have systematically underestimated or overestimated uncertainties. Furthermore, unlike for unpolarized PDF fits, information on the experimental covariance matrix is only available for a small subset of experiments, so for most experiments statistical and systematic errors must be added in quadrature, thereby leading to an overestimate of uncertainties: this leads to a wide spread of values of the figure of merit, whose value depends on the size of the correlated uncertainties which are being treated as uncorrelated.
A methodology to deal with this situation was developed in Ref. [23]. The idea is to first determine the optimal value of the figure of merit for each experiment, i.e. a set of target values E_i^targ for each of the i experiments, then during the fit give more weight to experiments for which the figure of merit is further away from its target value, and stop training on experiments which have already reached it. This is done by minimizing, instead of the figure of merit Eq. (52), the weighted figure of merit

E_{\rm wt}^{(k)} = \frac{1}{N_{\rm dat}} \sum_{j} p_j^{(k)}\, N_{{\rm dat},j}\, E_j^{(k)},

where E_j^{(k)} is the error function for the j-th dataset with N_{dat,j} points, and the weights p_j^{(k)} are given by

p_j^{(k)} = \left( E_j^{(k)} / E_j^{\rm targ} \right)^{n},

with n a free parameter which essentially determines the amount of weighting. In the unpolarized fits of Refs. [23,25,26,29] the value n = 2 was used. Here instead we will choose n = 3. This larger value, determined by trial and error, is justified by the wider spread of figures of merit in the polarized case, which in turn is related to the absence of correlated systematics for most experiments.
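Schematically, the weighting logic can be computed as follows; the function below is a sketch with hypothetical dataset values, not code from the NNPDF framework.

```python
def weighted_error(E, E_targ, n_dat, n=3):
    """Weighted figure of merit: each dataset error function E_j is weighted
    by p_j = (E_j / E_j^targ)**n, so that datasets far above their target
    dominate the minimization (n = 3 in the polarized fit)."""
    p = [(Ej / Et) ** n for Ej, Et in zip(E, E_targ)]
    return sum(pj * Nj * Ej for pj, Nj, Ej in zip(p, n_dat, E)) / sum(n_dat)

# Two hypothetical datasets of 10 points each: one at its target value,
# one 50% above it; the second is strongly up-weighted (p = 1.5**3 = 3.375).
print(weighted_error(E=[1.0, 1.5], E_targ=[1.0, 1.0], n_dat=[10, 10]))  # 3.03125
```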
The target values E_i^targ are determined through an iterative procedure: they are set to one at first; then a very long fixed-length fit is run, and the values of E_i at its end are taken as targets for a new fit, which is performed until stopping (according to the criterion to be discussed in Sect. 4.3 below). The values of E_i at the end of this fit are then taken as new targets, and the procedure is iterated until convergence, usually reached after a couple of iterations.
Weighted training stops after the first N_wt^gen generations, unless the total error function Eq. (52) is above a threshold, E^{(k)} ≥ E_sw; in that case, weighted training continues until E^{(k)} falls below the threshold. Afterwards, the figure of merit is just the unweighted error function Eq. (52) computed on all experiments. This ensures that the figure of merit behaves smoothly in the last stages of training. The values of the parameters N_wt^gen and E_sw are also given in Tab. 7.

Determination of the optimal fit
Because the neural network parametrization is very redundant, it may be able to fit not only the underlying behaviour of the PDFs, but also the statistical noise in the data. Therefore, the best fit does not necessarily coincide with the absolute minimum of the figure of merit Eq. (52). We thus determine the best fit, as in Refs. [19,20], using a cross-validation method [62]: for each replica, the data are randomly divided into two sets, training and validation, which include fractions f_tr and f_val = 1 − f_tr of the data points, respectively. The figure of merit Eq. (52) is then computed for both sets. The training figure of merit is minimized through the genetic algorithm, while the validation figure of merit is monitored: when the latter starts increasing while the former still decreases, the fit is stopped. This means that the fit is stopped as soon as the neural network starts to learn the statistical fluctuations of the points, which are different in the training and validation sets, rather than the underlying law which they share.
In the unpolarized fits of Refs. [19,20,23,25,26,29] equal training and validation fractions were uniformly chosen, f_tr^{(j)} = f_val^{(j)} = 1/2. However, in this case we have to face the problem that the number of datapoints is quite small: most experiments include about ten datapoints (see Tab. 1). Hence, it is difficult to achieve a stable minimization if only half of them are actually used for minimization, as we have explicitly verified. Therefore, we have chosen to include 80% of the data in the training set, i.e. f_tr^{(j)} = 0.8.

In practice, in order to implement cross-validation we must determine a stopping criterion, namely, give conditions which must be satisfied in order for the minimization to stop. First, we require that the weighted training stage has been completed, i.e., that the genetic algorithm has been run for at least N_wt^gen minimization steps. Furthermore, we check that all experiments have reached a value of the figure of merit below a minimal threshold E_thr. Note that because stopping can occur only after weighted training has been switched off, and this in turn only happens when the figure of merit falls below the value E_sw, the total figure of merit must be below this value in order for stopping to be possible. We then compute moving averages

\langle E_{\rm tr,val}(l) \rangle \equiv \frac{1}{N_{\rm smear}} \sum_{l'=l-N_{\rm smear}+1}^{l} E_{\rm tr,val}(l')

of the figure of merit Eq. (54) for either the training or the validation set at the l-th genetic minimization step. The fit is then stopped if

1 - \frac{\langle E_{\rm tr}(l) \rangle}{\langle E_{\rm tr}(l-\Delta_{\rm smear}) \rangle} < \delta_{\rm tr} \qquad \text{and} \qquad \frac{\langle E_{\rm val}(l) \rangle}{\langle E_{\rm val}(l-\Delta_{\rm smear}) \rangle} - 1 > \delta_{\rm val}.

The parameter N_smear determines the width of the moving average; the parameter ∆_smear determines the distance between the two points along the minimization path which are compared in order to determine whether the figure of merit is increasing or decreasing; and the parameters δ_tr, δ_val are the threshold values for the decrease of the training and increase of the validation figure of merit to be deemed significant.
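The look-back stopping logic can be sketched on synthetic training and validation profiles; the parameter values below are illustrative (not those of Tab. 8), and the inequalities are a schematic rendering of the criterion described above.

```python
import numpy as np

def moving_average(E, l, n_smear):
    """Smoothed figure of merit: average over the last n_smear generations."""
    return np.mean(E[l - n_smear + 1 : l + 1])

def should_stop(E_tr, E_val, l, n_smear, delta_smear, delta_tr, delta_val):
    """Stop when, over a look-back span of delta_smear generations, the
    smoothed training error no longer decreases significantly while the
    smoothed validation error increases significantly."""
    tr_dec = 1.0 - moving_average(E_tr, l, n_smear) / moving_average(E_tr, l - delta_smear, n_smear)
    val_inc = moving_average(E_val, l, n_smear) / moving_average(E_val, l - delta_smear, n_smear) - 1.0
    return tr_dec < delta_tr and val_inc > delta_val

# Synthetic profiles: training keeps creeping down, validation turns up
# after generation 200 (the onset of overlearning).
gens = np.arange(400)
E_tr = 2.0 + np.exp(-gens / 50.0)
E_val = E_tr + np.where(gens > 200, (gens - 200) * 1e-3, 0.0)
print(should_stop(E_tr, E_val, l=399, n_smear=20, delta_smear=100,
                  delta_tr=2e-3, delta_val=1e-3))  # True: overlearning detected
```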
The optimal values of these parameters should be chosen in such a way that the fit does not stop on a statistical fluctuation, yet does stop before it starts overlearning (i.e. learning statistical fluctuations). As explained in Ref. [23], this is done by studying the profiles of the error functions for individual datasets and individual replicas. In order to avoid unacceptably long fits, training is stopped anyway when a maximum number of iterations N_gen^max is reached, even if the stopping conditions Eq. (56) are not satisfied. This leads to a small loss of accuracy of the corresponding fits, which is acceptable provided it only happens for a small enough fraction of replicas. If a fit stops at N_gen^max without the stopping criterion having been satisfied, we also check that the total figure of merit is below the value E_sw at which weighted training is switched off. If it is not, we conclude that this specific fit has not converged, and we retrain the same replica, i.e., we perform a new fit to the same data starting from a different random seed. This only occurs in about one or two percent of cases.
The full set of parameters which determine the stopping criterion is given in Tab. 8. An example of how the stopping criterion works in practice is shown in Fig. 3, where we display the moving averages Eq. (55) of the training and validation error functions for a representative replica.

Theoretical constraints
Polarized PDFs are only loosely constrained by data, which are scarce and not very accurate. Theoretical constraints are thus especially important in reducing the uncertainty on the PDFs. We consider in particular positivity and integrability.
Positivity of the individual cross sections which enter the polarized asymmetries Eq. (1) implies that, up to power-suppressed corrections, longitudinally polarized structure functions are bounded by their unpolarized counterparts, i.e.

|g_1(x, Q^2)| \le F_1(x, Q^2).
At leading order, structure functions are proportional to parton distributions, so imposing Eq. (59) for any process (and a similar condition on an asymmetry which is sensitive to polarized gluons [63]) would imply

|\Delta f_i(x, Q^2)| \le f_i(x, Q^2)

for any pair of unpolarized and polarized PDFs f_i and ∆f_i, for all quark flavors and the gluon, for all x, and for all Q^2. Beyond leading order, the condition Eq. (59) must still hold, but it does not necessarily imply Eq. (60). Rather, one should then impose at least a number of conditions of the form of Eq. (59) on physically measurable cross-sections equal to the number of independent polarized PDFs. For example, in principle one may require that the condition Eq. (59) is separately satisfied for each flavor, i.e. when only contributions from the i-th flavor are included in the polarized and unpolarized structure function: this corresponds to requiring positivity of semi-inclusive structure functions which could in principle be measured (assuming that fragmentation effects cancel in the ratio). A condition on the gluon can be obtained by imposing positivity of the polarized and unpolarized cross-sections for inclusive Higgs production in gluon-proton scattering [63], again measurable in principle if not in practice.
Because g_1/F_1 ∼ x as x → 0 [64], the positivity bound Eq. (59) is only significant at large enough x ≳ 10^{−2}. On the other hand, at very large x the NLO corrections to the LO positivity bound become negligible [63,65]. Therefore, the NLO positivity bound in practice only differs from its LO counterpart Eq. (60) in a small region 10^{−2} ≲ x ≲ 0.3, and even there by an amount of rather less than 10% [63], which is negligible in comparison to the size of PDF uncertainties, as we shall see explicitly in Sect. 5.
Therefore, we will impose the leading-order positivity bound Eq. (60) on each flavor combination ∆q_i + ∆q̄_i and on the gluon ∆g (generically denoted as ∆f_i below). We do this by requiring

|\Delta f_i(x, Q^2)| \le f_i(x, Q^2) + \sigma_i(x, Q^2),

where σ_i(x, Q^2) is the uncertainty on the corresponding unpolarized PDF combination f_i(x, Q^2) at the kinematic point (x, Q^2). This choice is motivated by two considerations. First, it is clearly meaningless to impose positivity of the polarized PDF to an accuracy greater than that with which the unpolarized PDF has been determined. Second, because the unpolarized PDFs only satisfy NLO positivity, they can become negative and thus may have nodes. As a consequence, the LO bound Eq. (60) would imply that the polarized PDF must vanish at the same points, which would clearly be meaningless.
As in Ref. [23], positivity is imposed during the minimization procedure, thereby guaranteeing that the genetic algorithm only explores the subspace of acceptable physical solutions. This is done through a Lagrange multiplier λ_pos, i.e. by computing the polarized PDFs at N_{dat,pos} fixed kinematic points (x_p, Q_0^2) and adding to the error function Eq. (52) a contribution

E_{\rm pos}^{(k)} = \lambda_{\rm pos} \sum_{j} \sum_{p=1}^{N_{\rm dat,pos}} \max\left[0,\; \left| \Delta f_j^{(k)}(x_p, Q_0^2) \right| - f_j(x_p, Q_0^2) - \sigma_j(x_p, Q_0^2) \right].

This provides a penalty, proportional to the amount by which positivity is violated, which enforces Eq. (61) separately for all the nonzero quark-antiquark combinations and the gluon. The values of the unpolarized PDF combination f_j(x, Q^2) and its uncertainty σ_j(x, Q^2) are computed using the NNPDF2.1 NNLO PDF set [25], while ∆f_j^{(k)} is the corresponding polarized PDF computed from the neural network parametrization for the k-th replica. The polarized and unpolarized PDFs are evaluated at N_{dat,pos} = 20 points equally spaced in the interval x ∈ [10^{−2}, 0.9].
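The penalty term can be sketched as follows, with toy stand-ins for the unpolarized PDF, its uncertainty and the fit replicas (none of these are the actual NNPDF2.1 inputs):

```python
import numpy as np

def positivity_penalty(delta_f, f, sigma, x_grid, lam_pos):
    """Lagrange-multiplier penalty enforcing |Δf(x)| <= f(x) + σ(x) on a
    grid of points: each violating point contributes proportionally to the
    amount by which it exceeds the bound."""
    excess = np.abs(delta_f(x_grid)) - (f(x_grid) + sigma(x_grid))
    return lam_pos * np.sum(np.clip(excess, 0.0, None))

# Toy stand-ins for an unpolarized PDF combination, its 1-sigma uncertainty,
# and two polarized replicas (NOT the actual NNPDF2.1 inputs).
x_grid = np.linspace(1e-2, 0.9, 20)        # 20 equally spaced points
f = lambda x: x ** -0.2 * (1 - x) ** 3     # toy unpolarized PDF
sigma = lambda x: 0.1 * f(x)               # toy uncertainty
good = lambda x: 0.5 * f(x)                # replica satisfying the bound
bad = lambda x: -2.0 * f(x)                # replica violating it (sign is irrelevant)

print(positivity_penalty(good, f, sigma, x_grid, lam_pos=10.0))       # 0.0
print(positivity_penalty(bad, f, sigma, x_grid, lam_pos=10.0) > 0.0)  # True
```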
Positivity is imposed at the initial scale Q_0^2 = 1 GeV^2, since once positivity is enforced at low scales, it is automatically satisfied at higher scales [63,65]. After stopping, we finally test that the positivity condition Eq. (61) is satisfied on a grid of N_{dat,pos} = 40 points in the same interval. Replicas for which positivity is violated at one or more points are discarded and retrained.
In the unpolarized case, in which positivity only played a minor role in constraining PDFs, a fixed value of the Lagrange multiplier λ_pos was chosen. In the polarized case it turns out to be necessary to vary the Lagrange multiplier along the minimization. Specifically, we let

\lambda_{\rm pos}(N_{\rm gen}) = \lambda_{\rm max}^{N_{\rm gen}/N_{\lambda_{\rm max}}}, \qquad N_{\rm gen} \le N_{\lambda_{\rm max}}.

This means that the Lagrange multiplier increases as the minimization proceeds, starting from λ_pos ≃ 1 at the first minimization step, N_gen = 1, up to λ_pos = λ_max ≫ 1 when N_gen = N_λmax; after N_λmax generations, λ_pos is kept constant at λ_max. The rationale behind this choice is that the genetic algorithm can thus learn experimental data and positivity at different stages of the minimization. During the early stages, the contribution of the modified error function Eq. (62) is negligible, due to the moderate value of the Lagrange multiplier; hence, the genetic algorithm mostly learns the basic shape of the PDFs as driven by the experimental data. As the minimization proceeds, the contribution of the Lagrange multiplier increases, thus ensuring that positivity is properly learnt: at this stage, most of the replicas which do not fulfill the positivity bound are discarded.
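The multiplier schedule can be realized, for instance, by a geometric ramp; the sketch below only reproduces the stated properties (λ_pos ≈ 1 at the first generation, λ_pos = λ_max at N_λmax, constant afterwards), not necessarily the exact functional form used in the fit.

```python
def lagrange_multiplier(n_gen, lam_max=10.0, n_lam_max=2000):
    """Geometric ramp of the positivity multiplier: ~1 at the first
    generation, lam_max at generation n_lam_max, constant afterwards."""
    if n_gen >= n_lam_max:
        return lam_max
    return lam_max ** (n_gen / n_lam_max)

print(lagrange_multiplier(1))     # ~1.001: essentially unit weight at the start
print(lagrange_multiplier(2000))  # 10.0
print(lagrange_multiplier(5000))  # 10.0
```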
The final values of N_λmax = 2000 and λ_max = 10 have been determined as follows. First, we have performed a fit without any positivity constraint and have observed that the data were mostly learnt in about 2000 generations: hence we have taken this value for N_λmax. Then we have tried different values of λ_max until we managed to reproduce the same χ^2 obtained in the previous, positivity-unconstrained fit. This ensures that positivity is not learnt to the detriment of the global fit quality.
Notice that the value of λ_max is rather small compared to the analogous Lagrange multiplier used in the unpolarized case [25]. This is because, in the latter case, positivity is learnt in the early stages of the minimization, when the error function can be much larger than its asymptotic value: a large Lagrange multiplier is then needed to select the best replicas. Also, unpolarized PDFs are quite well constrained by data, and positivity is almost automatically fulfilled, except in some restricted kinematic regions; only a few replicas violate positivity and need to be penalized. This means that the behaviour of the error function Eq. (52), which governs the fitting procedure, is essentially dominated by the data rather than by positivity.
In the polarized case, instead, positivity becomes effective only after some minimization steps, when the error function has already decreased to a value of a few units. Furthermore, we have checked that, at this stage, most replicas slightly violate the positivity condition Eq. (61): too large a value of the Lagrange multiplier would thus, on the one hand, penalize replicas which reproduce the experimental data well and positivity only slightly worse; on the other hand, it would promote replicas which fulfill positivity but fit the data quite badly. As a consequence, convergence of the minimization algorithm would be harder to reach. We have also verified that using a value of the Lagrange multiplier up to λ_pos = 100 leads to no significant improvement either in the fulfillment of the positivity requirement or in the fit quality. We will show in detail the effects of the positivity bound Eq. (61) on the fitted replicas and on polarized PDFs in Sect. 5.
Finally, as already mentioned, we impose an integrability constraint. The requirement that polarized PDFs be integrable, i.e. that they have finite first moments, corresponds to the assumption that the nucleon matrix element of the axial current for the i-th flavor is finite. The integrability condition is imposed by computing at each minimization step the integral of each of the polarized PDFs over a given interval [x_1, x_2], with x_1 and x_2 chosen in the small-x region, well below the data points, and verifying that the growth of the integral as x_1 decreases at fixed x_2 is less than logarithmic. In practice, we test for the condition

\left| \frac{\int_{x_1}^{x_2} dx\, \Delta f(x, Q_0^2)}{\int_{x_1'}^{x_2} dx\, \Delta f(x, Q_0^2)} \right| < \frac{\ln(x_2/x_1)}{\ln(x_2/x_1')},

with x_1 < x_1'. Mutations which do not satisfy this condition are rejected during the minimization procedure. In our default fit, we chose x_1 = 10^{−5}, x_1' = 2 · 10^{−5} and x_2 = 10^{−4}.
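Numerically, the less-than-logarithmic-growth test can be sketched as follows. The comparison against the ratio of logarithms is one concrete rendering of the condition (a PDF behaving exactly as 1/x would saturate it), using the x_1, x_1', x_2 values quoted above.

```python
import numpy as np

def moment(delta_f, x_lo, x_hi, n_pts=2000):
    """Trapezoidal integral of delta_f on a logarithmic grid in [x_lo, x_hi]."""
    x = np.logspace(np.log10(x_lo), np.log10(x_hi), n_pts)
    y = delta_f(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def growth_less_than_log(delta_f, x1=1e-5, x1p=2e-5, x2=1e-4):
    """Check that the integral grows less than logarithmically as the lower
    limit decreases from x1' to x1 (x1 < x1'): the ratio of the two integrals
    must stay below the ratio of the corresponding logarithms, which a PDF
    behaving exactly as 1/x would saturate."""
    ratio_int = abs(moment(delta_f, x1, x2) / moment(delta_f, x1p, x2))
    ratio_log = np.log(x2 / x1) / np.log(x2 / x1p)
    return ratio_int < ratio_log

print(growth_less_than_log(lambda x: x ** -0.5))  # True: integrable at x -> 0
print(growth_less_than_log(lambda x: x ** -1.2))  # False: first moment diverges
```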

Results
We now present the main result of this paper, namely the first determination of a polarized PDF set based on the NNPDF methodology, NNPDFpol1.0. We will first illustrate the statistical features of our PDF fit, then compare the NNPDFpol1.0 PDFs to other recent polarized parton sets [11,12,14,15]. We will finally discuss the stability of our results upon the variation of several theoretical and methodological assumptions: the treatment of target-mass corrections, the use of sum rules to fix the triplet and octet axial charges, the implementation of positivity of PDFs, and preprocessing of neural networks and its impact on small and large x behaviour. We will not discuss here the way predictions for PDFs and uncertainties are obtained from NNPDF replica sets, for which we refer to general reviews, such as Ref. [66].

Statistical features
The statistical features of the NNPDFpol1.0 analysis are summarized in Tabs. 9-10, for the full dataset and for individual experiments and sets, respectively. The error function E, Eq. (52), shown in the tables for the total, training and validation datasets, is the figure of merit for the quality of the fit of each PDF replica to the corresponding data replica. The quantity which is actually minimized during the neural network training is this figure of merit for the training set, supplemented by weighting in the early stages of training according to Eq. (54). We also quote the χ^2, computed by comparing the experimental data to the average of the observable over replicas, which provides our best prediction. The average number of iterations of the genetic algorithm at stopping, ⟨TL⟩, is also given.

Table 9: Statistical estimators for NNPDFpol1.0 with N_rep = 100 replicas.
The distributions of χ^{2(k)}, E_tr, and training lengths among the N_rep = 100 replicas are shown in Figs. 4 and 5, respectively. Note that the distribution of training lengths has a long tail, which causes an accumulation of points at the maximum training length N_gen^max: a fraction of replicas does not fulfill the stopping criterion. This may cause a loss in accuracy in these outlier fits, which however make up fewer than 10% of the total sample.
The features of the fit can be summarized as follows: • The quality of the central fit, as measured by its χ^2_tot = 0.77, is good. However, this value should be interpreted with care, given that the uncertainties for all experiments but two are overestimated: the covariance matrix is not available, so correlations between systematics cannot be properly accounted for. This explains why this quantity is lower than one, which would be very unlikely had correlations been included.
• The values of χ^2_tot and ⟨E⟩ differ by approximately one unit. This is because replicas fluctuate within their uncertainty about the experimental data, which in turn are Gaussianly distributed about the true value [49]: it shows that the neural networks correctly reproduce the underlying law, and are thus closer to the true value. This is confirmed by the fact that χ^{2(k)} is of order one.
• The distribution of χ 2 for different experiments (also shown as a histogram in Fig. 6) shows sizable differences, and indeed the standard deviation (shown as a dashed line in the plot) about the mean (shown as a solid line) is very large. This can be understood as a consequence of the lack of information on the covariance matrix: experiments where large correlated uncertainties are treated as uncorrelated will necessarily have a smaller value of the χ 2 .

Parton distributions
The NNPDFpol1.0 parton distributions, computed from a set of N rep = 100 replicas, are displayed in Fig. 7 at the input scale Q 2 0 = 1 GeV 2 , in the PDF parametrization basis Eq. (48) as a function of x both on a logarithmic and linear scale. In Figs. 8-9 the same PDFs are plotted in the flavour basis, and compared to other available NLO PDF sets: BB10 [11] and AAC08 [12] in Fig. 8, and DSSV08 [15]  in Fig. 9. We do not show a direct comparison to the LSS polarized PDFs [14] because there are no publicly available routines for the computation of PDF uncertainties for this set. Note that the dataset used for the BB10 determination contains purely DIS data, and that for AAC contains DIS supplemented by some high-p T RHIC pion production data: hence they are directly comparable to our PDF determination. The DSSV08 determination instead includes, on top of DIS data, polarized jet production data, and, more importantly, a large amount of semi-inclusive DIS data which in particular allow for flavour-antiflavour separation and a more direct handle on strangeness. All uncertainties in these plots correspond to the nominal 1-σ error bands.
The main conclusions of this comparison are the following:
• The central values of ∆u + ∆ū and ∆d + ∆d̄ are in reasonable agreement with those of other parton sets. The NNPDFpol1.0 results are in best agreement with DSSV08, in slightly worse agreement with AAC08, and in worst agreement with BB10. Uncertainties on these PDFs are generally slightly larger for NNPDF than for other sets, especially DSSV08, which however is based on a much wider dataset.
• The NNPDFpol1.0 determination of ∆s + ∆s̄ is affected by a much larger uncertainty than BB10 and AAC08, for almost all values of x. The AAC08 and BB10 strange PDFs fall well within the NNPDFpol1.0 uncertainty band.
• The NNPDFpol1.0 determination of ∆s + ∆s̄ is inconsistent at the two-sigma level with DSSV08 in the medium-small x ∼ 0.1 region; the DSSV08 result is also rather more accurate, as one would expect since it includes semi-inclusive data (in particular for the production of hadrons with strangeness). This suggests a tension between the inclusive and the semi-inclusive data.
• The gluon PDF is affected by a large uncertainty, rather larger than in any other set, especially at small x. In particular, the NNPDFpol1.0 polarized gluon distribution is compatible with zero for all values of x.
• Uncertainties on the PDFs in the regions where no data are available tend to be larger than those of other sets. At very large values of x the PDF uncertainty band is largely determined by the positivity constraint.
Finally, in Fig. 10 we compare the structure function g_1(x, Q^2) for proton, deuteron and neutron, computed using NNPDFpol1.0 (with its one-σ uncertainty band), to the experimental data included in the fit. Experimental data are grouped in bins of x with logarithmic spacing, while the NNPDF prediction and its uncertainty are computed at the central value of each bin.
The uncertainty band in the NNPDFpol1.0 result is typically smaller than the experimental errors, except at small-x where a much more restricted dataset is available; in that region, the uncertainties are comparable. Scaling violations of the polarized structure functions are clearly visible, especially for g p 1 , despite the limited range in Q 2 .

Stability of the results
Our results have been obtained under a number of theoretical and methodological assumptions, discussed in Sects. 3-4. We will now test their stability upon variation of these assumptions.

Target-mass corrections and g 2 .
We have consistently included in our determination of g 1 corrections suppressed by powers of the nucleon mass which are of kinematic origin. Thus in particular, as explained in Sec. 3.2, we have included target-mass corrections (TMCs) up to first order in m 2 /Q 2 . Furthermore, both TMCs and the relation between the measured asymmetries and the structure function g 1 involve contributions to the structure function g 2 proportional to powers of m 2 /Q 2 which we include according to Eq. (17) or Eq. (18) (see the discussion in Sect. 2.2). Our default PDF set is obtained assuming that g 2 is given by the Wandzura-Wilczek relation, Eq. (22). In order to assess the impact of these assumptions on our results, we have performed two more PDF determinations. In the first, we set m = 0 consistently everywhere, both in the extraction of the structure functions from the asymmetry data and in our computation of structure functions. This thus removes TMCs, and also contributions proportional to g 2 . In the second, we retain mass effects, but we assume g 2 = 0.
The statistical estimators for each of these three fits over the full dataset are shown in Tab. 11. Clearly, all fits are of comparable quality.
Furthermore, in Fig. 11 we compare the PDFs at the initial scale Q_0^2 determined in these fits to our default set: differences are hardly visible. This comparison can be made more quantitative by using the distance d(x, Q^2) between different fits, as defined in Appendix A of Ref. [23]. The distance is defined in such a way that if we compare two different samples of N_rep replicas each, extracted from the same distribution, then on average d = 1, while if the two samples are extracted from two distributions which differ by one standard deviation, then on average d = √N_rep (the difference being due to the fact that the standard deviation of the mean scales as 1/√N_rep).
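The behaviour of the distance estimator can be illustrated with a schematic one-point version; the full definition of Ref. [23] applies point by point in x and Q^2, and its normalization conventions may differ in detail from this sketch.

```python
import numpy as np

def distance(sample_a, sample_b):
    """Schematic one-point distance: difference of the two sample means in
    units of the estimated uncertainty of that difference. Samples from the
    same distribution give d of order 1; distributions one standard
    deviation apart give d of order sqrt(N_rep)."""
    n_a, n_b = len(sample_a), len(sample_b)
    num = (np.mean(sample_a) - np.mean(sample_b)) ** 2
    den = np.var(sample_a, ddof=1) / n_a + np.var(sample_b, ddof=1) / n_b
    return np.sqrt(num / den)

rng = np.random.default_rng(0)
same = distance(rng.normal(0.0, 1.0, 100), rng.normal(0.0, 1.0, 100))
shifted = distance(rng.normal(0.0, 1.0, 100), rng.normal(1.0, 1.0, 100))
print(same, shifted)  # same is O(1); shifted is many standard errors apart
```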
The distances d(x, Q^2) between central values and uncertainties of the three fits of Tab. 11 are shown in Fig. 12. They never exceed d = 4, which means less than half a standard deviation for N_rep = 100. It is interesting to observe that distances tend to be larger in the large-x region, where the expansion in powers of m^2/Q^2 is less accurate and the effects of dynamical higher twists can become relevant. It is reassuring that even in this region the distances are reasonably small.

Figure 10: The proton, neutron and deuteron polarized structure functions g_1(x, Q^2) as functions of Q^2 in different bins of x, compared to experimental data. Experimental data are grouped in bins of x, while NNPDFpol1.0 results are given at the center of each bin, whose value is given next to each curve. In order to improve legibility, the values of g_1(x, Q^2) have been shifted by the amount given next to each curve.
We conclude that inclusive DIS data, with our kinematic cuts, do not show sensitivity to finite nucleon mass effects, neither in terms of fit quality, nor in terms of the effect on PDFs.

Sum rules
Our default PDF fit is obtained by assuming that the triplet axial charge a_3 is fixed to its value extracted from β decay, Eq. (37), and that the octet axial charge a_8 is fixed to the value determined from baryon octet decays, but with an inflated uncertainty in order to allow for SU(3) violation, Eq. (39). As discussed after Eq. (51), uncertainties on them are included by randomizing their values among replicas.
In order to test the impact of these assumptions, we have produced two more PDF determinations. In the first, we have not imposed the triplet sum rule Eq. (35), so in particular a 3 is free and determined by the data, instead of being fixed to the value Eq. (37). In the second, we have assumed that the uncertainty on a 8 is given by the much smaller value of Eq. (38).
The statistical estimators for the total dataset for each of these fits are shown in Tab. 12. Here too, there is no significant difference in fit quality between these fits and the default.
The distances between PDFs in the default and the free-a_3 fits are displayed in Fig. 13. As one may expect, only the triplet is affected significantly: the central value is shifted by about d ∼ 5, i.e. about half a σ, in the region x ∼ 0.3, where x∆T_3 has a maximum, and also around x ∼ 0.01. The uncertainties are very similar in both cases for all PDFs, except ∆T_3 at small x: in this case, removing the a_3 sum rule results in a moderate increase of the uncertainties; the effect of removing the a_3 sum rule is otherwise negligible. The singlet and triplet PDFs for these two fits are compared in Fig. 14.

Table 12: The statistical estimators of Tab. 9, but for fits in which the triplet sum rule is not imposed (free a_3) or in which the octet sum rule is imposed with the smaller uncertainty Eq. (38) (for the two fits, respectively):
E_tr ± σ_{E_tr}: 1.73 ± 0.41 and 1.66 ± 0.53; E_val ± σ_{E_val}: 1.93 ± 0.58 and 1.87 ± 0.71; χ^{2(k)} ± σ_{χ^2}: 0.93 ± 0.12 and 0.92 ± 0.15.
The distances between the default fit and the fit with the smaller uncertainty on a_8, Eq. (38), are shown in Fig. 15. In this case, again as expected, the only effect is on the ∆T_8 uncertainty, which changes in the region 10^{−2} ≲ x ≲ 10^{−1} by up to d ∼ 6 (about half a standard deviation): if a more accurate value of a_8 is assumed, the determined ∆T_8 is correspondingly more accurate. Central values are unaffected. The singlet and octet PDFs for this fit are compared to the default in Fig. 16. We conclude that the assumed uncertainty on a_8 has a moderate effect on ∆T_8 in our fit; on the other hand, it is clear that if the octet sum rule were not imposed at all, the uncertainty on the octet and thus on strangeness would increase very significantly, as we have checked explicitly.
We conclude that our fit results are quite stable upon variations of our treatment of both the triplet and the octet sum rules.

Positivity
As discussed in Sect. 4, positivity of the individual cross sections entering the polarized asymmetries Eq. (1) has been imposed at leading order according to Eq. (61), using the NNPDF2.1 NNLO PDF set [25], separately for the lightest polarized quark PDF combinations ∆u + ∆ū, ∆d + ∆d̄, ∆s + ∆s̄ and for the polarized gluon PDF, by means of a Lagrange multiplier Eq. (62). After stopping, positivity is checked a posteriori, and replicas which do not satisfy it are discarded and retrained.
In Fig. 17 we compare a set of N_rep = 100 replicas, obtained by enforcing positivity through a Lagrange multiplier but before the final a posteriori check, to the positivity bound for the up, down and strange PDF combinations and for the gluon PDF. Almost all replicas satisfy the constraint, but at least one replica, which clearly violates it for the ∆s + ∆s̄ combination (and thus will be discarded), is seen.
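The a posteriori check described above amounts to discarding any replica that violates the LO bound |∆f(x)| ≤ f(x) at some sampled point. A minimal sketch of such a filter (illustrative only; the function name and array layout are our assumptions, not the actual NNPDF code):

```python
import numpy as np

def filter_positive_replicas(delta_pdf, unpol_bound, tol=0.0):
    """Discard replicas violating the LO positivity bound
    |Delta f(x_i)| <= f(x_i) at every sampled point x_i.

    delta_pdf   : (n_rep, n_x) array of polarized PDF replicas
    unpol_bound : (n_x,) array, the unpolarized PDF on the same x grid
    Returns the surviving replicas and a boolean mask of which ones passed.
    """
    violation = np.abs(delta_pdf) - unpol_bound[None, :]
    passed = np.all(violation <= tol, axis=1)
    return delta_pdf[passed], passed
```

In the actual procedure the rejected replicas would then be retrained rather than simply dropped.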
In order to assess the effect of the positivity constraints we have performed a fit without imposing positivity. Because positivity significantly affects PDFs in the region where no data are available, and thus in particular their large x behaviour, preprocessing exponents for this PDF determination had to be determined again using the procedure described in Sect. 4.1. The values of the large x preprocessing exponents used in the fit without positivity are shown in Tab. 13. The small x exponents are the same as in the baseline fit, Tab. 5.
The corresponding estimators are shown in Tab. 14. Also in this case we see no significant change in fit quality, with only a slight improvement in χ²_tot when the constraint is removed. This shows that our PDF parametrization is flexible enough to easily accommodate positivity. On the other hand, clearly
the positivity bound has a significant impact on the PDFs, especially in the large-x region, as shown in Fig. 18, where PDFs obtained from this fit are compared to the baseline. At small x, instead, the impact of positivity is moderate because, as discussed in Sect. 4.4, g_1/F_1 ∼ x as x → 0 [64], so there is no constraint in this limit. This in particular implies that there is no significant loss of accuracy in imposing the LO positivity bound: in the small-x region x ≲ 10^-2, where the LO and NLO positivity bounds differ significantly [65], the bound is not significant.

Small-and large-x behaviour and preprocessing
The asymptotic behavior of both polarized and unpolarized PDFs for x close to 0 or 1 is not controlled by perturbation theory: powers of ln(1/x) and ln(1−x), respectively, appear in the perturbative coefficients, spoiling the reliability of the perturbative expansion close to the endpoints. Nonperturbative effects are also expected to set in eventually (see e.g. [64,67]). For this reason, our fitting procedure makes no assumptions on the large- and small-x behavior of PDFs, apart from the positivity and integrability constraints discussed in the previous Section.
It is however necessary to check that no bias is introduced by the preprocessing. We do this following the iterative method described in Sect. 4.1. The outcome of the procedure is the set of exponents Eq. (48), listed in Tab. 5. The lack of bias with these choices is explicitly demonstrated in Fig. 19, where we compute the effective small- and large-x exponents for each PDF ∆q = ∆Σ, ∆g, ∆T_3, ∆T_8, for the default NNPDFpol1.0 N_rep = 100 replica set at Q² = Q_0² = 1 GeV², and compare them to the ranges of Tab. 5. It is apparent that as the endpoints x = 0 and x = 1 are approached, the uncertainties on both the small-x and the large-x exponents lie well within the ranges of the preprocessing exponents for all PDFs, thus confirming that the latter do not introduce any bias.
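The check above can be illustrated by computing effective asymptotic exponents replica by replica. Assuming a preprocessing of the form x^α(1−x)^β, the quantities below tend to α as x → 0 and to β as x → 1; this is an illustrative sketch under that assumption (the function name is ours, not the NNPDF code):

```python
import numpy as np

def effective_exponents(x, f):
    """Effective asymptotic exponents of a PDF replica f(x).

    Assuming f(x) ~ x^alpha near x = 0 and f(x) ~ (1-x)^beta near x = 1:
    alpha_eff(x) = ln|f(x)| / ln(x)      -> alpha as x -> 0,
    beta_eff(x)  = ln|f(x)| / ln(1 - x)  -> beta  as x -> 1.
    """
    logf = np.log(np.abs(f))
    # log1p(-x) computes ln(1 - x) accurately for small x
    return logf / np.log(x), logf / np.log1p(-x)
```

Evaluating these over the replica sample near the endpoints, and checking that the resulting bands sit inside the preprocessing ranges of Tab. 5, is the essence of the bias test.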

Polarized nucleon structure
The NNPDFpol1.0 PDF set may be used for a determination of the first moments of polarized parton distributions. As briefly summarized in the introduction, these are the quantities of greatest physical interest in that they are directly related to the spin structure of the nucleon, and indeed their determination, in particular the determination of the first moments of the quark and gluon distributions, has been the main motivation for the experimental campaign of g_1 measurements. The determination of the isotriplet first moment, because of the Bjorken sum rule, provides a potentially accurate and unbiased handle on the strong coupling α_s.

First moments
We have computed the first moment of each light polarized quark-antiquark combination and of the gluon distribution using a sample of N_rep = 100 NNPDFpol1.0 PDF replicas. The histograms of the distributions of first moments over the replica sample at Q_0² = 1 GeV² are displayed in Fig. 20: they appear to be reasonably approximated by Gaussians.
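In the Monte Carlo approach, central value and uncertainty of a first moment follow directly from the replica sample: integrate each replica, then take the mean and standard deviation over replicas. A schematic illustration (the function name is ours; a simple trapezoidal rule on a sampled x grid stands in for the actual integration):

```python
import numpy as np

def first_moment_stats(x, replicas):
    """Central value and 1-sigma uncertainty of the first moment
    int_0^1 dx Delta q(x), estimated over a Monte Carlo replica sample.

    x        : (n_x,) grid of x values spanning [0, 1]
    replicas : (n_rep, n_x) PDF values on that grid, one row per replica
    """
    # trapezoidal rule applied to each replica independently
    moments = 0.5 * ((replicas[:, 1:] + replicas[:, :-1]) * np.diff(x)).sum(axis=1)
    return moments.mean(), moments.std(ddof=1)
```

The standard deviation over replicas is precisely the "exp" PDF uncertainty quoted in the tables below.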
The central values and one-σ uncertainties of the quark first moments are listed in Tab. 15, while those of the singlet quark combination Eq. (29) and of the gluon are given in Tab. 16. Results are compared to those from other parton sets, namely ABFR98 [8], DSSV10 [15], AAC08 [12], BB10 [11] and LSS10 [14]. Results from other PDF sets are not available for all combinations and scales, because public codes only allow for the computation of first moments in a limited x range, in particular down to a minimum value of x; hence we must rely on published values for the first moments. In particular, the DSSV and AAC results are shown at Q_0² = 1 GeV², while the BB and LSS results are shown at Q² = 4 GeV². For ease of reference, the NNPDF values at both scales are shown in Tab. 16. [Table 15: First moments of the polarized quark distributions at Q_0² = 1 GeV²; cv denotes the central value, while exp and th denote uncertainties (see text) whose sum in quadrature is given by tot.]
In comparing the results for first moments shown in Tabs. 15-16, it should be borne in mind that the uncertainties shown, and sometimes also the central values, have somewhat different meanings. In particular: • For NNPDFpol1.0, the exp uncertainty, determined as the standard deviation of the replica sample, is a pure PDF uncertainty: it includes the propagation of the experimental data uncertainties and the uncertainty due to the interpolation and extrapolation.
• In the ABFR98 study, the central values were obtained in the so-called AB factorization scheme [6]. While the gluon in this scheme coincides with the gluon in the MS scheme used here (and thus the value from Ref. [8] for the gluon is shown in Tab. 16), the quark singlet differs from it. However, in Ref. [8] a value of the singlet axial charge a_0 in the limit of infinite Q² was also given. In the MS scheme, the singlet axial charge and the first moment of ∆Σ coincide [6]; hence we have determined ∆Σ for ABFR98 by evolving the value of a_0(∞) given in Ref. [8] down to Q² = 1 GeV², at NLO and with α_s(M_Z) = 0.118 [69] (the impact of the α_s uncertainty is negligible). We have checked that the same result is obtained if a_0 is computed as the appropriate linear combination of ∆Σ in the AB scheme and the first moment of ∆g. In the ABFR98 study, the exp uncertainty is the Hessian uncertainty on the best fit, and it thus includes the propagated data uncertainty. The th uncertainty includes the uncertainties originating from neglected higher orders (estimated by renormalization and factorization scale variations), higher twists, the position of heavy quark thresholds, the value of the strong coupling, the violation of SU(3) (the uncertainty on a_8, Eq. (36)), and finally uncertainties related to the choice of functional form, estimated by varying the functional form. This latter source of theoretical uncertainty corresponds to the interpolation and extrapolation uncertainties which are included in the exp uncertainty for NNPDF.
• For the DSSV08 and BB10 PDFs, the central value is obtained by computing the first moment integral of the best fit with a fixed functional form restricted to the data region, and then supplementing it with a contribution due to the extrapolation into the unmeasured (small-x) region. The exp uncertainty is the uncertainty on the first moment in the measured region, and it thus includes the propagated data uncertainty. In both cases, we have determined the th uncertainty shown in the table as the difference between the full first moment quoted by DSSV08 or BB10 and the first moment in the measured region. It is thus the contribution from the extrapolation region, which we assume to be 100% uncertain. In both cases, we have computed the truncated first moment in the measured region using publicly available codes, and checked that it coincides with the values quoted by DSSV08 and BB10.
• For AAC08, the central value is obtained by computing the first moment integral of the best fit with a fixed functional form, and the exp uncertainty is the Hessian uncertainty on it. However, AAC08 uses a so-called tolerance criterion [70] for the determination of Hessian uncertainties, which rescales the ∆χ² = 1 region by a suitable factor in order to effectively take into account interpolation errors as well. Hence the exp uncertainties include propagated data uncertainties, as well as uncertainties on the PDF shape.
• For LSS10, the central value is obtained by computing the first moment integral of the best-fit with a fixed functional form, and the exp uncertainty is the Hessian uncertainty on it. Hence it includes the propagated data uncertainty.
In all cases, the total uncertainty is computed as the sum in quadrature of the exp and th uncertainties. Roughly speaking, for LSS10 this includes only the data uncertainties; for DSSV08 and BB10 it also includes extrapolation uncertainties; for AAC08, interpolation uncertainties; for NNPDFpol1.0, both extrapolation and interpolation uncertainties; and for ABFR98 all of the above, but also theoretical (QCD) uncertainties. For LSS10 and AAC08, we quote the results obtained from two different fits, assuming either a positive or a node-carrying gluon PDF: their spread gives a sense of the missing uncertainty due to the choice of functional form. Note that the AAC08 results correspond to their Set B, which includes, besides DIS data, also RHIC π⁰ production data; the DSSV08 fit also includes, on top of these, RHIC jet data and semi-inclusive DIS data; LSS10 includes, besides DIS, also semi-inclusive DIS data. All other sets are based on DIS data only.
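The combination rule used for the tot column of Tabs. 15-16 is simply a sum in quadrature; as a one-line illustration (the function name is ours):

```python
import math

def total_uncertainty(exp_unc, th_unc):
    """Total uncertainty as the sum in quadrature of the experimental
    and theoretical components, tot = sqrt(exp^2 + th^2)."""
    return math.hypot(exp_unc, th_unc)
```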
Coming now to a comparison of results, we see that for the singlet first moment ∆Σ the NNPDFpol1.0 result is consistent within uncertainties with that of the other groups. The uncertainty on the NNPDFpol1.0 result is comparable to (if somewhat larger than) that found whenever the extrapolation uncertainty has been included. For individual quark flavors (Tab. 15) we find excellent agreement between the central values obtained by NNPDFpol1.0 and DSSV08; the NNPDF uncertainties are rather larger, but this could also be due to the fact that the DSSV08 dataset is sensitive to flavour separation.
For the gluon first moment ⟨∆g⟩, the NNPDFpol1.0 result is characterized by an uncertainty which is much larger than that of any other determination: a factor of three or four larger than ABFR98 and AAC08, ten times larger than BB10, and twenty times larger than DSSV08 and LSS10. It is compatible with zero within this large uncertainty. We have seen that for the quark singlet the NNPDFpol1.0 uncertainty is similar to that of groups which include an estimate of extrapolation uncertainties. In order to assess the impact of the extrapolation uncertainty for the gluon, we have computed the gluon first truncated moment in the region x ∈ [10⁻³, 1], to be compared with the result of Tab. 16, which is larger by almost a factor of four. We must conclude that the experimental status of the gluon first moment is still completely uncertain, unless one is willing to make strong theoretical assumptions on the behaviour of the polarized gluon at small x, and that previous, different conclusions were affected by a significant underestimate of the impact of the bias in the choice of functional form, in the data region and especially in the extrapolation region. Because of the large uncertainty related to the extrapolation region, only low-x data can improve this situation, such as those which could be collected at a high-energy electron-ion collider [32,71].

The Bjorken sum rule
Perturbative factorization, expressed in this context by Eq. (28) for the structure function g_1(x, Q²), together with the assumption of exact isospin symmetry, immediately leads to the so-called Bjorken sum rule (originally derived [72,73] using current algebra):

Γ_1^p(Q²) − Γ_1^n(Q²) = (a_3/6) ∆C_NS(α_s(Q²)),   (71)

where Γ_1^{p,n}(Q²) ≡ ∫_0^1 dx g_1^{p,n}(x, Q²), ∆C_NS(α_s(Q²)) is the first moment of the non-singlet coefficient function, and a_3 is defined in Eq. (35). Because the first moment of the non-singlet coefficient function ∆C_NS is known up to three loops [74] and isospin symmetry is expected to hold to high accuracy, the Bjorken sum rule Eq. (71) potentially provides a theoretically very accurate handle on the strong coupling constant: in principle, the truncated isotriplet first moment can be extracted from the data without any theoretical assumption. Given a measurement of Γ_1^NS(Q², 0) at one scale, the strong coupling can then be extracted from Eq. (71) using the value of a_3 from β decays, while given measurements of Γ_1^NS(Q², 0) at two scales, both a_3 and the value of α_s can be extracted simultaneously. In Ref. [75], a_3 and α_s were simultaneously determined from a set of non-singlet truncated moments (both the first and higher moments) by exploiting the scale dependence of the latter [76], with the result g_A = 1.04 ± 0.13 and α_s(M_Z) = 0.126 +0.006/−0.014, where the uncertainty is dominated by the data, interpolation and extrapolation, but also includes theoretical (QCD) uncertainties. In that reference, truncated moments were determined from a neural network interpolation of existing data, sufficient for a computation of moments at any scale. However, because the small-x behaviour of the structure function is only weakly constrained by data, the x → 0 extrapolation was done by assuming a power-like (Regge) behaviour [77].
The situation within NNPDFpol1.0 can be understood by exploiting the PDF determination in which a_3 is not fixed by the triplet sum rule, discussed in Sect. 5.3.2. Using the results of this determination, we find the value of a_3 given in Eq. (74). Its uncertainty is about twice that of the determination of Ref. [75]. As mentioned, the latter was obtained from a neural network parametrization of the data with no theoretical assumptions, based on a methodology which is quite close to that of the NNPDFpol1.0 PDF determination discussed here, the only difference being the assumption of Regge behaviour in order to perform the small-x extrapolation. This strongly suggests that, as in the case of the gluon distribution discussed above, the uncertainty on the value Eq. (74) is dominated by the small-x extrapolation.
To study this, in Fig. 21 we plot the value of the truncated Bjorken sum rule Γ_1^NS(Q², x_min), Eq. (73), as a function of the lower limit of integration x_min at Q_0² = 1 GeV², along with the asymptotic value

Γ_1^NS(1 GeV², 0) = 0.16 ± 0.03,   (75)

which at NLO corresponds to the value of a_3 given by Eq. (74). As a consistency check, we also show the same plot for our baseline fit, in which a_3 is fixed by the sum rule to the value Eq. (37). It is clear that the uncertainty is indeed completely dominated by the small-x extrapolation. This suggests that a determination of α_s from the Bjorken sum rule is not competitive unless one is willing to make assumptions on the small-x behaviour of the non-singlet structure function in the unmeasured region. Indeed, a determination based on NNPDFpol1.0 would be affected by an uncertainty which is necessarily larger than that found in Ref. [75], which is already not competitive. The fact that a determination of α_s from the Bjorken sum rule is not competitive, due to small-x extrapolation ambiguities, was already pointed out in Ref. [8], where values of a_3 and α_s similar to those of Ref. [75] were obtained.
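The quantity plotted in Fig. 21 is a truncated integral over the region above x_min. The following illustrative function (ours, not the NNPDFpol1.0 code) computes Γ_1^NS(Q², x_min) from a tabulated non-singlet structure function on an x grid:

```python
import numpy as np

def truncated_nonsinglet_moment(x, g1_ns, x_min):
    """Truncated moment Gamma_1^NS(Q^2, x_min) = int_{x_min}^1 dx g1_NS(x, Q^2),
    computed by the trapezoidal rule on the grid points with x >= x_min."""
    keep = x >= x_min
    xs, ys = x[keep], g1_ns[keep]
    return float((0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)).sum())
```

Scanning x_min toward zero, replica by replica, reproduces the widening uncertainty band of Fig. 21 as the unmeasured small-x region starts to dominate.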

Conclusions and outlook
We have presented a first determination of polarized parton distributions based on the NNPDF methodology: NNPDFpol1.0. We have determined polarized PDFs from the most recent inclusive data on proton, deuteron and neutron deep-inelastic polarized asymmetries and structure functions. Our main result is that the uncertainty on the gluon distribution, and to a lesser extent on the strange distribution, as well as on the small-x extrapolation of all parton distributions, is rather larger than in previous polarized PDF determinations. Also, there seems to be some tension between strangeness as determined from deep-inelastic scattering and from semi-inclusive data.
In particular, we find that the role of the gluon distribution in the spin structure of the nucleon is essentially unknown, as the first moment of the gluon distribution is compatible with zero, but with an uncertainty which is compatible with a very large positive or negative gluon spin fraction. Likewise, the contribution from the small x region to the Bjorken sum rule makes its use as a means to determine α s essentially impossible. Different conclusions can be reached only if one is willing to make strong theoretical assumptions on the small x behaviour of polarized PDFs.
Future experiments, in particular open charm and hadron production in fixed-target experiments [78,87], and inclusive jet production [80,81] and W boson production [82-84] at the RHIC collider, may improve the knowledge of individual polarized flavors and antiflavors and of the gluon distribution in the valence region. However, only a high-energy electron-ion collider [32,71] might provide information on polarized PDFs at small x and thus significantly reduce the uncertainty on first moments.
A Mathematica driver code is also available from the same source.

A Benchmarking of polarized PDF evolution
We have benchmarked our implementation of the evolution of polarized parton densities by cross-checking against the Les Houches polarized PDF evolution benchmark tables [85]. (Note that in Ref. [85] the polarized sea PDFs are given incorrectly and need to be corrected.) These tables were obtained from a comparison of the HOPPET [58] and PEGASUS [86] evolution codes, which are x-space and N-space codes respectively. In order to perform a meaningful comparison, we use the so-called iterated solution of the N-space evolution equations, and the same initial PDFs and running coupling as in Ref. [85]. The relative differences ε_rel between our PDF evolution and the benchmark tables of Ref. [85] at NLO in the ZM-VFNS are tabulated in Tab. 17 for various combinations of polarized PDFs: the accuracy of our code is O(10⁻⁵) for all relevant values of x, which is the nominal accuracy of the agreement between HOPPET and PEGASUS. We therefore conclude that the accuracy of the polarized PDF evolution in the FastKernel framework is satisfactory for precision phenomenology.