Evaluation of the accuracy, consistency, and stability of measurements of the Planck constant used in the redefinition of the International System of Units

The Consultative Committee for Mass and related quantities (ccm) of the International Committee for Weights and Measures (cipm) has recently declared the readiness of the community to support the redefinition of the International System of Units (SI) at the next meeting of the General Conference on Weights and Measures (cgpm), scheduled for November 2018. This redefinition will replace the International Prototype of the Kilogram (ipk), as the definition and sole primary realization of the unit of mass, with a definition involving the Planck constant, h. Defining the unit in terms of a fundamental constant of nature will enable widespread primary realizations not only of the kilogram but also of its multiples and submultiples, so as to address the full range of practical needs in the measurement of mass.
We review and discuss the statistical models and statistical data reductions, uncertainty evaluations, and substantive arguments that support the verification of several technical preconditions for the redefinition that the ccm has established, and whose verification the ccm has affirmed. These conditions relate to the accuracy and mutual consistency of qualifying measurement results.
We also review an issue that has surfaced only recently: whether the historical values that the Task Group on Fundamental Constants of the Committee on Data for Science and Technology (codata-tgfc) has recommended for h over the years have converged toward a stable value, even though the ccm has not deemed this issue to be relevant. We conclude that no statistically significant trend can be substantiated for these recommended values, but note that cumulative consensus values that may be derived from the historical measurement results for h seem to have converged while continuing to exhibit fluctuations that are typical of a process in statistical control.
Finally, we argue that the most recent consensus value derived from the best measurements available for h, obtained using either a Kibble balance or the xrcd method, is reliable and has uncertainty no larger than the uncertainties surrounding the current primary and secondary realizations of the unit of mass, hence that no credible technical impediments stand in the way of the redefinition of the unit of mass in terms of a fixed value of h.

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
In May of 2017, the Consultative Committee for Mass and related quantities (ccm) of the International Committee for Weights and Measures (cipm) met at the International Bureau of Weights and Measures (bipm) in Sèvres, France, and recommended 'that the cipm undertakes the necessary steps to proceed with the planned redefinition of the SI at the next meeting of the cgpm, acknowledging the measures to be taken by the ccm to ensure integrity and continuity in the dissemination of the kilogram'.
The ccm reached this conclusion based on an evaluation of whether several prerequisites, agreed upon previously, had been met by the metrological community represented in the ccm. These requirements included considerations of individual measurement accuracy, and homogeneity (or mutual consistency) of methodologically diverse, sufficiently accurate measurement results for the Planck constant, h. 'Homogeneity' means that the dispersion of the measured values is not excessive in light of the reported measurement uncertainties.
In addition, the ccm considered an issue that has surfaced only recently and that is not part of the preconditions for recommending a redefinition of the unit of mass: whether the consensus values that the Task Group on Fundamental Constants of the Committee on Data for Science and Technology (codata-tgfc) has recommended over the years have converged toward a stable value.
Several venues have also discussed whether the uncertainty surrounding the estimate of h, at the time its value is fixed, will translate into an uncertainty in the realization of the unit of mass that is fit for purpose in the vast and diverse applications where mass is measured.
This contribution reviews and discusses statistical models and methods for data reduction that support the aforementioned recommendation issued by the ccm, and that help elucidate the related issues just mentioned.
In section 2, we present a historical retrospective of the measurements made of the Planck constant since 1979, and also consider the issue of whether there is any statistically significant temporal trend in the values of h that the codata-tgfc has recommended over the past 30 years.
In section 3, we evaluate the accuracy requirements defined by the ccm and identify the subset of published measurement results that satisfy these requirements. Section 4 examines the issue of mutual consistency of select subsets of the set of measurement results that meet the accuracy requirements, from both statistical and substantive viewpoints.
Finally, section 5 recapitulates the more salient findings, highlights the accomplishments that justify the redefinition of the kilogram in terms of the Planck constant, sketches anticipated research and development following the redefinition, and expresses unequivocal support for the recommendation from the ccm in favor of the much and long anticipated redefinition.

Retrospective
Table 1 lists measurement results for h that either are listed in table 21 of [38], or that otherwise qualify for consideration by the codata-tgfc in relation to the special adjustment that will produce the values of the fundamental constants recommended for use in the redefinition of the SI. Figure 1 represents the historical, cumulative evolution of the consensus value for h derived from the results in table 1. For example, the (blue) dot corresponding to 1979 represents only the value measured by NPL-79, whereas the dot above 1980 represents the consensus value that blends the results from both NPL-79 and NIST-80, and similarly for all the other points. The last point on the right side of the plot, above the last instance of 2017, is the consensus value derived from all the data in that table.
The 2014 codata recommended value of h and its associated uncertainty, also depicted in figure 1, serve only as a reference and basis for comparison. It is an interesting coincidence that the sequence of cumulative consensus values gravitates toward it, and indeed has been statistically indistinguishable from it since at least 2011. The variability of the cumulative consensus values in the more recent past, say since 2015, is very much what should be expected of a process in statistical control [9].
The retrospective suggests readiness for fixing the value of h somewhere within the range of the uncertainty of the final, most current and comprehensive consensus value that the codata-tgfc will derive from measurement results that include or are closely related to those listed in table 1. The preliminary value, $6.626\,070\,150(69) \times 10^{-34}$ J s [39], that the codata-tgfc will recommend for h as a result of the special adjustment of 2017 is depicted in figure 2.
The consensus values depicted in figure 1 could have been computed using any one of many generally accepted methods available for consensus building. We have used the procedure most common in the field of application where most inter-comparisons are made (meta-analysis in medicine), the so-called DerSimonian-Laird procedure [18]. The procedure is implemented in package metafor for the R environment for statistical computing and graphics [41,47], and is readily available in a web-based application with a user-friendly interface, the NIST Consensus Builder, which is available worldwide via any common Web browser at https://consensus.nist.gov [28,29].
The statistical model underlying the DerSimonian-Laird procedure is an additive random effects model, which represents each measured value as a sum of three elements, $h_j = h + \lambda_j + \varepsilon_j$, for $j = 1, \ldots, n$ different experiments, where h denotes the measurand, the $\{\lambda_j\}$ denote experiment (or laboratory) effects, and the $\{\varepsilon_j\}$ represent experiment-specific measurement errors.
The $\{\varepsilon_j\}$ are regarded as realized values of independent random variables with mean 0 and possibly different standard deviations, assumed equal to the standard uncertainties reported in the original publications. The $\{\lambda_j\}$ are modeled as values of independent random variables with mean 0 and standard deviation $\tau \geq 0$. These assumptions imply that, taken collectively, the measured values are centered at the true value of h.
The standard deviation τ of the experiment (or laboratory) effects $\{\lambda_j\}$ quantifies the 'excess' dispersion of the measured values, above and beyond what should be expected given the reported uncertainties. Because it remains 'invisible' until the measurement results are inter-compared, it is often called dark uncertainty [46].
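The DerSimonian-Laird consensus under this model can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard method-of-moments estimator of τ² truncated at zero; the function name `dersimonian_laird` is hypothetical, and the sketch is not the metafor or NIST Consensus Builder code. Values and uncertainties may be expressed in any common unit, for example in units of $10^{-34}$ J s.

```python
import math

def dersimonian_laird(values, uncertainties):
    """Consensus value for the additive random effects model
    h_j = h + lambda_j + eps_j, using the DerSimonian-Laird
    method-of-moments estimate of tau (the dark uncertainty).

    Returns (consensus, standard uncertainty of consensus, tau).
    """
    n = len(values)
    w = [1.0 / u**2 for u in uncertainties]
    sw = sum(w)
    # Common-mean (fixed effects) weighted average and Cochran's Q
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sw
    q = sum(wj * (hj - hbar) ** 2 for wj, hj in zip(w, values))
    # Moment estimate of tau^2, truncated at zero
    tau2 = max(0.0, (q - (n - 1)) / (sw - sum(wj**2 for wj in w) / sw))
    # Re-weight with tau^2 added to each reported variance
    w_star = [1.0 / (u**2 + tau2) for u in uncertainties]
    sw_star = sum(w_star)
    consensus = sum(wj * hj for wj, hj in zip(w_star, values)) / sw_star
    return consensus, math.sqrt(1.0 / sw_star), math.sqrt(tau2)
```

For instance, `dersimonian_laird([0.0, 0.0, 10.0], [1.0, 1.0, 1.0])` returns a consensus value of 10/3 with τ² = 97/3, because the dispersion of the three values greatly exceeds their unit uncertainties. Whenever Q ≤ n − 1, the estimate of τ is truncated to 0 and the result reduces to the weighted mean.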
An alternative model that is often used by the codata-tgfc, originally suggested by [7] for the combination of heterogeneous (or mutually inconsistent) measurement results, represents the measured values as $h_j = h + \kappa\varepsilon_j$, where now the true value of h is a common mean obtained after inflating the reported uncertainties by a multiplicative adjustment κ, typically greater than 1. [2,29] compare and discuss the additive and multiplicative models at considerable length. We will discuss the Birge adjustment further in section 4.2.
Table 1. … [38], who list all the underlying references. The result for NRC-15, which that same table also lists, has meanwhile been superseded by NRC-17. The labels in the third column indicate whether the measurement was made using a Kibble (or watt) balance (k), a Joule balance (j), some other electrical method (e), or the x-ray silicon crystal density (xrcd) method (x). (a) Indicates the six experiments whose relative standard uncertainty is no larger than $5 \times 10^{-8}$.

Thirty years of CODATA recommendations
The question has been raised of how the values that the codata-tgfc has been recommending for h over the years have evolved over time, and whether any trend such evolution might display could possibly challenge the readiness of the state-of-the-art to support the redefinition of the kilogram.
Neither was the absence of a trend one of the requirements originally specified by the ccm [22], nor was it a primary consideration issuing from the ccm meeting of May 2017. But since it was discussed during this meeting as a potential constraint on the date for the redefinition, we review it here taking into account the sources of uncertainty that affect any apparent trend.
It should be noted that the values that the codata-tgfc has recommended for h in its periodic adjustments are not based on a consensus building exercise such as the one undertaken above, but result instead from a least-squares adjustment that simultaneously involves all of the qualifying measurement results that are informative about all the fundamental constants (except the Newtonian constant of gravitation, G, which is treated separately), possibly after applying multiplicative adjustments to reported uncertainties that the codata-tgfc deems to be problematic (typically for appearing to be too small). This is why the caption of table 21 of [38] explains that the values listed for h are inferred values obtained from the experimental data considered in the overall data reduction, and similarly for corresponding tabulations for other years.
Figure 2 depicts the values that the codata-tgfc has recommended for h in the course of the past 30 years (selected from the values listed in table 2), the preliminary codata 2017 value for h [39] (open blue circle), a suggested 'trend' (red curve), and 95% (between the dotted lines) and 99% (pink region) coverage bands for what the true trend could possibly be.
The fact that the uncertainty bands can fully accommodate a horizontal line segment drawn across the whole range of dates represented on the horizontal axis shows that the evidence is insufficient to reject the hypothesis that there is no trend. The relative value at 2017 of the slope (first derivative) of the red curve equals 2 parts in $10^9$ per year, with standard uncertainty 6 parts in $10^9$ per year, hence does not differ significantly from 0.
The coverage bands take into account (a) the uncertainties associated with the codata recommended values, (b) the correlations between the values recommended in different releases, and (c) the uncertainties associated with these correlations.
The contribution from the correlations mentioned in (b) is very important, and any evaluation of the significance of any presumptive trend that ignores it will be unrealistic. These correlations range from 0 to 0.73, and arise because different releases tend to share substantial proportions of the same experimental results. We have used (1) one instance of the Monte Carlo method to estimate the correlations mentioned in (b), and (2) another instance to propagate the contributions from (a)-(c) and to evaluate the uncertainty associated with the aforementioned trend.
(1) To estimate, for example, the correlation between the values recommended for 2010 and 2014, we repeatedly applied perturbations to the measured values, commensurate with the associated uncertainties, with the same perturbations being applied to the same measured values that the codata-tgfc used in the adjustments for both years, and then estimated the correlation using all simulated pairs of adjusted values for these two years.
(2) The second instance of the Monte Carlo method was used to propagate the contributions that (a)-(c) make to the uncertainty associated with the 'trend' depicted in figure 2. This 'trend' was defined by fitting a spline to the codata recommended values using a generalized additive model, as implemented in R function gam defined in package mgcv [49]. The Monte Carlo method used was a version of the so-called parametric statistical bootstrap [19], and involved two stages of sampling, as follows:
(2.1) First, by sampling each correlation from the approximately Gaussian distribution of its Fisher's z-transform [20,21]. This recognizes and takes into account the fact that each of the correlations, although estimated from many pairs of consensus values of h, is in fact based on a fairly small number of values shared by the adjusted values of h that correspond to two different years.
(2.2) Second, by drawing from a multivariate Gaussian distribution whose covariance matrix reflects both the uncertainties associated with the values of h recommended in different years, and the correlations between them.
Since the correlation matrix produced in (2.1) need not be positive-definite, it was replaced in each iteration of the parametric bootstrap procedure by the 'closest' positive-definite correlation matrix computed using R function nearPD defined in the package Matrix [4,25].
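The two sampling stages, (2.1) and (2.2), can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the effective sample size `n_eff` used in Fisher's z-transform is an assumed input, and, in place of `nearPD`, it uses a simpler repair that shrinks the sampled correlation matrix toward the identity until a Cholesky factorization succeeds.

```python
import math
import random

def sample_correlation(r_hat, n_eff, rng):
    """Stage (2.1): draw a correlation via Fisher's z-transform.
    z = atanh(r) is approximately Gaussian with standard deviation
    1/sqrt(n_eff - 3); n_eff is an assumed effective sample size."""
    z = rng.gauss(math.atanh(r_hat), 1.0 / math.sqrt(n_eff - 3))
    return math.tanh(z)

def cholesky(a):
    """Lower-triangular L with L L^T = a; ValueError if a is not
    positive-definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def shrink_to_positive_definite(r, step=0.01):
    """Blend r with the identity until it is positive-definite.
    A simple stand-in for R's Matrix::nearPD, which instead looks for
    the nearest positive-definite matrix."""
    n = len(r)
    alpha = 0.0
    while True:
        m = [[(1.0 - alpha) * r[i][j] + (alpha if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        try:
            return m, cholesky(m)
        except ValueError:
            alpha = min(1.0, alpha + step)

def draw_values(means, sds, r_hat, n_eff, rng):
    """Stages (2.1) and (2.2): resample the correlations, then draw one
    vector from the multivariate Gaussian with covariance D R D,
    where D = diag(sds)."""
    n = len(means)
    r = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = sample_correlation(r_hat[i][j], n_eff, rng)
    _, L = shrink_to_positive_definite(r)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [means[i] + sds[i] * sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(n)]
```

Shrinking toward the identity always terminates (the identity itself is positive-definite) and keeps unit diagonals, but it attenuates every correlation uniformly, which is why a nearest-matrix method such as `nearPD` is generally preferable in practice.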
The coverage band is based on 50 000 samples drawn from the multivariate distribution of the consensus values of h corresponding to the six different years under consideration, and was computed using methods described by [16] and implemented in R function envelope defined in package boot [10].
The band has been constructed to reflect the uncertainty associated with the 'true' overall trend over the whole range of times represented along the horizontal axis, therefore at all times simultaneously, rather than at each instant in time without controlling coverage at other times. Such a band perforce is wider than the band intended to achieve only pointwise coverage, but it is the appropriate one to gauge the reality of the 'trend' as a whole. Even cursory examination of either version of this simultaneous coverage band (one for 95% confidence, the other for 99% confidence) reveals that no trend can properly be substantiated because either band can accommodate a horizontal line, signifying no trend.
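The difference between pointwise and simultaneous coverage can be illustrated with simulated curves. The sketch below, with hypothetical function names, widens a pointwise quantile band by bisection on its pointwise level until at least 95% of the simulated curves lie entirely inside it; it conveys the idea of an overall envelope but is not a reimplementation of R function `envelope` in package boot.

```python
import random

def quantile(sorted_xs, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def pointwise_band(curves, level):
    """Pointwise (1 - level)/2 and (1 + level)/2 quantiles at each time."""
    cols = [sorted(c[t] for c in curves) for t in range(len(curves[0]))]
    a = (1.0 - level) / 2.0
    return ([quantile(col, a) for col in cols],
            [quantile(col, 1.0 - a) for col in cols])

def overall_coverage(curves, lower, upper):
    """Fraction of curves lying inside the band at every time."""
    inside = sum(all(l <= y <= u for y, l, u in zip(c, lower, upper))
                 for c in curves)
    return inside / len(curves)

def simultaneous_band(curves, target=0.95, tol=1e-4):
    """Smallest pointwise level (found by bisection) whose band contains
    at least `target` of the curves in their entirety."""
    lo, hi = target, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if overall_coverage(curves, *pointwise_band(curves, mid)) >= target:
            hi = mid
        else:
            lo = mid
    return pointwise_band(curves, hi)
```

With 2000 simulated curves of six independent standard Gaussian points, a pointwise 95% band contains only about $0.95^6 \approx 0.74$ of the curves in their entirety, whereas the simultaneous band is wider and contains at least 95% of them.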
The substantively relevant questions relating to the values the codata-tgfc has recommended over the years are about (i) the relative uncertainty that the codata-tgfc will use to qualify the value recommended for h in the special adjustment of 2017, and about (ii) whether this relative uncertainty warrants the redefinition in the sense that it guarantees seamless continuity in the primary realizations of the unit of mass before and after the redefinition, to within an uncertainty that is comparable to the uncertainties prevailing in the current, routine dissemination of the ipk [1].
The preliminary answer to (i) is 10 parts in $10^9$ [39], and the discussion in the following sections will only reinforce the view that (ii) should be answered in the affirmative.

Accuracy
ccm recommendation g1, of 2017, for a new definition of the kilogram in 2018, notes that the accuracy requirement has been met. The requirement specifies that 'at least three independent experiments, including work from Kibble balance and xrcd experiments, yield values of the Planck constant with relative standard uncertainties not larger than $5 \times 10^{-8}$', and that 'at least one of these results should have a relative standard uncertainty not larger than $2 \times 10^{-8}$' [14].
In fact there are now six experiments with relative standard uncertainties smaller than $5 \times 10^{-8}$, marked with the superscript (a) in table 1.
Furthermore, the Pilot Study CCM.R-kg-P1 [43], which involved the bipm, lne (France), nist (USA), nmij (Japan), nrc (Canada), and ptb (Germany), provided a compelling demonstration of the mutual consistency that can be achieved by multiple primary realizations of the unit of mass. The weighted mean of the measurement results obtained by the participants agreed with the calibration based on the ipk to within 0.001 mg for one set of 1 kg standards used in the comparison, and to within 0.0045 mg for the other.

Cochran's test
The ccm recommendation includes the observation that the 'most recent measurement results with relative standard uncertainty below $5 \times 10^{-8}$ do not pass the standard chi-squared test of consistency'. This test, commonly referred to as Cochran's Q test [11], is often used to detect whether measurement results are heterogeneous (or mutually inconsistent), in the sense that the measured values are more dispersed than their reported uncertainties would have led one to expect.
The statistic employed in this test is $Q = \sum_{j=1}^{n} (h_j - \bar{h})^2/u_j^2$, where $\bar{h}$ denotes the consensus value, $h_1, \ldots, h_n$ are the estimates of h produced in the n experiments, and $u_1, \ldots, u_n$ are the reported measurement uncertainties (which are interpreted as the standard deviations of the measurement errors $\{\varepsilon_j\}$ in the models discussed in section 2). In all cases where we use Cochran's test, $\bar{h}$ denotes the weighted average $\sum_{j=1}^{n} w_j h_j / \sum_{j=1}^{n} w_j$, with $w_j = 1/u_j^2$ for $j = 1, \ldots, n$. For the n = 6 measurement results marked with the superscript (a) in table 1, Q = 18.3. The reference probability distribution on the assumption of homogeneity is chi-squared with n − 1 = 5 degrees of freedom. Hence the test has p-value 0.003, which is the probability of observing a value of Q at least as large as the one that was observed, owing to the vagaries of sampling alone, when the results truly are homogeneous (or mutually consistent). Since this probability is quite small, the conventional conclusion is that the results are heterogeneous (or mutually inconsistent).
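Cochran's Q and its p-value can be computed directly. In this sketch (function names hypothetical), the chi-squared upper-tail probability uses the closed-form expressions available for an integer number of degrees of freedom, so no statistics library is needed.

```python
import math

def chi2_sf(x, k):
    """P(X >= x) for a chi-squared variable with integer k degrees of freedom."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 0:
        # Even k: exp(-x/2) * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd k: erfc(sqrt(x/2)) plus a finite series in x
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def cochran_q(values, uncertainties):
    """Return (Q, weighted average) with weights w_j = 1/u_j^2."""
    w = [1.0 / u**2 for u in uncertainties]
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    q = sum(wj * (hj - hbar) ** 2 for wj, hj in zip(w, values))
    return q, hbar

def cochran_test(values, uncertainties):
    """p-value of Cochran's Q test with n - 1 degrees of freedom."""
    q, _ = cochran_q(values, uncertainties)
    return chi2_sf(q, len(values) - 1)
```

For the values quoted above, `chi2_sf(18.3, 5)` is approximately 0.0026, that is, 0.003 to one significant figure.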
The test suffers from well-known and potentially consequential limitations [26]. The limitations most relevant to the present discussion are: first, when many measurement results are involved, the test may detect statistically significant heterogeneity that is substantively irrelevant [24]; second, the validity of the test hinges on the assumption that the $\{u_j\}$ are all based on infinitely many degrees of freedom, which cannot possibly be the case here. Therefore, the conclusion of the test must be regarded with circumspection, and ought not to be taken as definitive.
Alternative tests of mutual consistency, recently suggested by [33], lead to different conclusions. For example, the test based on $Q_r = \sum_{j=1}^{n} |h_j - \bar{h}|/u_j$, where $\bar{h}$ is the same weighted average specified above, has p-value 0.022 when applied to the six measurement results mentioned above. And if $\bar{h}$ is chosen as the weighted median instead, then the p-value increases to 0.18. Both p-values just quoted were obtained by application of a bootstrap procedure described by [33, section 3.3] and implemented in R function metahet defined in package altmeta [32].
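The statistic $Q_r$ and a bootstrap p-value can be sketched as follows. This is a generic parametric bootstrap under the hypothesis of homogeneity, simulating $h_j \sim N(\bar h, u_j^2)$; it is not necessarily the procedure of [33, section 3.3] or of `metahet`, the function names are hypothetical, and using weights $1/u_j^2$ for the weighted median is an assumption.

```python
import random

def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]

def q_r(values, uncertainties, center="mean"):
    """Q_r = sum_j |h_j - c| / u_j about the weighted mean or weighted median."""
    w = [1.0 / u**2 for u in uncertainties]
    if center == "mean":
        c = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    else:
        c = weighted_median(values, w)
    return sum(abs(hj - c) / uj for hj, uj in zip(values, uncertainties))

def bootstrap_p_value(values, uncertainties, center="mean", n_boot=10000, seed=1):
    """Fraction of data sets simulated under homogeneity whose Q_r is at
    least as large as the observed Q_r."""
    observed = q_r(values, uncertainties, center)
    w = [1.0 / u**2 for u in uncertainties]
    c = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(c, u) for u in uncertainties]
        if q_r(simulated, uncertainties, center) >= observed:
            exceed += 1
    return exceed / n_boot
```

Because $Q_r$ uses absolute rather than squared deviations, a single discrepant result inflates it less than it inflates Q, and centering on the weighted median reduces that influence further.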
For the purposes of the redefinition, however, much more important than the appearance of heterogeneity as judged by a statistical test of questionable validity, is the actual impact that the apparent heterogeneity, if indeed it were to be substantiated, might have upon the measurement of mass based on a consensus value of h derived from these measurement results.
To evaluate this impact, let us reason as follows: if the measurement results indeed are heterogeneous (or mutually inconsistent), and one computes the consensus value six times over, each time leaving out one of the results in turn, then the dispersion of the six leave-one-out estimates will express the substantive impact of the alleged heterogeneity. For the six measurement results marked with the superscript (a) in table 1, this dispersion amounts to 6 parts in $10^9$ of the consensus value.
Considering that, according to the OIML R111-1 standard [27], the maximum permissible error for a class E1 standard of mass 1 kg is 0.5 mg, which translates into a relative standard uncertainty of $((0.5/3)/2)$ mg kg$^{-1}$ ≈ 83 parts in $10^9$ [1], those 6 parts in $10^9$ pale by comparison, thus suggesting that the statistically significant heterogeneity is substantively negligible.
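The conversion from maximum permissible error (MPE) to relative standard uncertainty can be spelled out as follows, assuming that OIML R111-1 limits the expanded uncertainty U (coverage factor k = 2) of a weight's conventional mass to one third of its MPE, which is what the factors 3 and 2 in the expression above reflect:

```latex
u_{\max} = \frac{U_{\max}}{k} = \frac{\mathrm{MPE}/3}{2}
         = \frac{(0.5\ \mathrm{mg})/3}{2} \approx 0.083\ \mathrm{mg},
\qquad
\frac{u_{\max}}{1\ \mathrm{kg}}
  \approx \frac{0.083 \times 10^{-6}\ \mathrm{kg}}{1\ \mathrm{kg}}
  \approx 83 \times 10^{-9}.
```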
Even the relative standard deviation of the six qualifying measured values is less than 30 parts in $10^9$, which implies that they are exchangeable in practice, in the sense that any one of them could be selected to serve as the sole primary realization of the kilogram instead of a consensus value, up to this level of relative uncertainty.
This relative dispersion of 30 parts in $10^9$ is also smaller than the 35 µg shift (35 parts in $10^9$ of 1 kg) that was applied to the bipm 'as-maintained mass unit' following the Extraordinary Calibrations of 2013-2014 [42]. This calibration campaign, which compared the ten working standards that the bipm uses routinely for the dissemination of the unit of mass against the ipk, produced the surprising finding that, during the period 1992-2014, all bipm working standards had lost mass with respect to the ipk, by 35 µg on average [17].
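The leave-one-out reasoning and the relative standard deviation discussed in this subsection can be sketched as follows. The function names are hypothetical, the consensus value is recomputed with the DerSimonian-Laird procedure of section 2, and taking the sample standard deviation of the leave-one-out values, relative to the full consensus value, as the measure of dispersion is an assumption of this sketch.

```python
import statistics

def dl_consensus(values, uncertainties):
    """DerSimonian-Laird consensus value (additive random effects model)."""
    n = len(values)
    w = [1.0 / u**2 for u in uncertainties]
    sw = sum(w)
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sw
    q = sum(wj * (hj - hbar) ** 2 for wj, hj in zip(w, values))
    tau2 = max(0.0, (q - (n - 1)) / (sw - sum(wj**2 for wj in w) / sw))
    w_star = [1.0 / (u**2 + tau2) for u in uncertainties]
    return sum(wj * hj for wj, hj in zip(w_star, values)) / sum(w_star)

def leave_one_out_dispersion(values, uncertainties):
    """Standard deviation of the n leave-one-out consensus values,
    relative to the consensus value computed from all n results."""
    full = dl_consensus(values, uncertainties)
    loo = [dl_consensus(values[:j] + values[j + 1:],
                        uncertainties[:j] + uncertainties[j + 1:])
           for j in range(len(values))]
    return statistics.stdev(loo) / abs(full)

def relative_sd(values):
    """Relative standard deviation of the measured values themselves."""
    return statistics.stdev(values) / abs(statistics.mean(values))
```

Applied to the six results marked (a) in table 1, these two functions would correspond to the two quantities compared in the text: the impact of heterogeneity on the consensus value, and the spread of the individual measured values.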

Birge's adjustment
It is common practice for the codata-tgfc to achieve consistency among heterogeneous results by applying a sufficiently large multiplicative correction to their measurement uncertainties, based on the belief that the reported uncertainties generally underestimate the values that they should have [7]. The same goal can be achieved in the context of the additive random effects model mentioned in section 2, by introducing experiment or laboratory effects $\{\lambda_j\}$ with expected value 0 and standard deviation τ (dark uncertainty).
The conventional implementation of the multiplicative adjustment sets the adjustment factor κ equal to the so-called Birge ratio, which is $\sqrt{Q/(n-1)} = 1.9$ for the six selected measurement results. The same metric has been 'rediscovered' in the field of meta-analysis in medicine, where it is called the H statistic [24].
First, it should be noted that the Birge ratio, being a function of the measured values, has an associated uncertainty, which qualifies the extent of the heterogeneity (or mutual inconsistency) of the measurement results. Higgins and Thompson describe eight different methods that may be used to evaluate this uncertainty, and [8] describe yet another. For the six measurement results marked with the superscript (a) in table 1, a Monte Carlo method implemented in the aforementioned R function metahet produces a 95% confidence interval for the Birge ratio ranging from 0.51 to 2.55. Since the interval includes 1, which is its expected value in the absence of heterogeneity, no correction appears to be warranted.
Second, a value of κ appreciably smaller than 1.9 suffices to produce a value of the test statistic Q that is not statistically significant in a test with probability 0.05 of erroneously rejecting the hypothesis of homogeneity when the hypothesis is true. In this case, the value is κ = 1.285. Now, an increase of a mere 28.5% in the uncertainties reported in those six experiments, which makes their results mutually consistent as judged by Cochran's test, is much smaller than similar increases that the codata-tgfc has applied to produce recommended values for several of the fundamental constants in past data reductions. Furthermore, even after multiplying the reported uncertainties by 1.285, the accuracy criteria set by the ccm continue to be satisfied by the same set of six measurement results marked with the superscript (a) in table 1.
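Both multipliers discussed above follow from Q. Because multiplying every $u_j$ by κ leaves the weighted average unchanged and divides Q by $\kappa^2$, the smallest κ for which Cochran's test at level 0.05 no longer rejects homogeneity is $\sqrt{Q/\chi^2_{0.95,\,n-1}}$. The sketch below (hypothetical function names) computes this alongside the Birge ratio; with Q = 18.3 and n = 6 it gives about 1.913 and 1.286, consistent with the values 1.9 and 1.285 quoted above given that Q itself is rounded.

```python
import math

def chi2_sf(x, k):
    """P(X >= x) for a chi-squared variable with integer k degrees of freedom."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def chi2_quantile(p, k):
    """Value c with P(X <= c) = p, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, k) > 1.0 - p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, k) > 1.0 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def birge_ratio(q, n):
    """Birge ratio (H statistic): sqrt(Q / (n - 1))."""
    return math.sqrt(q / (n - 1))

def minimal_kappa(q, n, alpha=0.05):
    """Smallest multiplier of the reported uncertainties for which
    Cochran's test at level alpha no longer rejects homogeneity."""
    return math.sqrt(q / chi2_quantile(1.0 - alpha, n - 1))
```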
Interestingly, the maximum likelihood estimate of κ for the multiplicative model $h_j = h + \kappa\varepsilon_j$, under the additional assumptions that the $\{\varepsilon_j\}$ are independent and Gaussian, is κ = 1.745, which lies between what the Birge ratio suggests and the minimal value for which Cochran's test no longer rejects the hypothesis of homogeneity.
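Under these assumptions the maximum likelihood estimate has a closed form. Writing $w_j = 1/u_j^2$ and maximizing the Gaussian log-likelihood first in h and then in κ gives the following short derivation (a sketch, with constants dropped):

```latex
\ell(h,\kappa) = -n\ln\kappa - \frac{1}{2\kappa^{2}}\sum_{j=1}^{n} w_j (h_j - h)^{2} + \text{const},
\\[4pt]
\frac{\partial \ell}{\partial h} = 0
  \;\Rightarrow\; \hat h = \frac{\sum_{j=1}^{n} w_j h_j}{\sum_{j=1}^{n} w_j},
\qquad
\frac{\partial \ell}{\partial \kappa}
  = -\frac{n}{\kappa} + \frac{1}{\kappa^{3}}\sum_{j=1}^{n} w_j (h_j - \hat h)^{2} = 0
  \;\Rightarrow\; \hat\kappa = \sqrt{Q/n}.
```

Here $\hat h$ coincides with the weighted average used in Cochran's Q, so with Q = 18.3 and n = 6 the estimate is $\hat\kappa = \sqrt{3.05} \approx 1.75$, in agreement with the value quoted above up to the rounding of Q.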
Furthermore, application of the parametric statistical bootstrap to evaluate the uncertainty associated with κ produces a 95% confidence interval for it ranging from 0.65 to 2.50 [19]. Therefore, and similarly to what was noted above, since this interval includes 1, from the viewpoint of the multiplicative model there is no compelling statistical reason for expanding the reported uncertainties, and the six results effectively appear to be mutually consistent.
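A parametric bootstrap for κ can be sketched by simulating new measured values from the fitted multiplicative model and re-estimating κ each time. This sketch (hypothetical function names) uses a simple percentile interval; it illustrates the technique and is not claimed to be the exact variant used for the interval quoted above.

```python
import math
import random

def kappa_mle(values, uncertainties):
    """Maximum likelihood estimate sqrt(Q/n) of kappa in h_j = h + kappa*eps_j."""
    w = [1.0 / u**2 for u in uncertainties]
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    q = sum(wj * (hj - hbar) ** 2 for wj, hj in zip(w, values))
    return math.sqrt(q / len(values))

def bootstrap_kappa_interval(values, uncertainties, level=0.95, n_boot=5000, seed=1):
    """Percentile interval for kappa: simulate h_j* ~ N(hbar, (kappa_hat*u_j)^2)
    from the fitted model and re-estimate kappa for each simulated data set."""
    rng = random.Random(seed)
    k_hat = kappa_mle(values, uncertainties)
    w = [1.0 / u**2 for u in uncertainties]
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    reps = sorted(
        kappa_mle([rng.gauss(hbar, k_hat * u) for u in uncertainties], uncertainties)
        for _ in range(n_boot))
    a = (1.0 - level) / 2.0
    return reps[int(a * (n_boot - 1))], reps[int((1.0 - a) * (n_boot - 1))]
```

Under this model $(\kappa^*/\hat\kappa)^2$ is distributed as $\chi^2_{n-1}/n$, so for n = 6 the percentile interval spans roughly 0.37 to 1.46 times $\hat\kappa$.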

Homogeneous triplets
Considering together the two accuracy requirements that the ccm agreed should be met to warrant the redefinition of the kilogram in 2018 (at least three measurements with relative standard uncertainties not larger than $5 \times 10^{-8}$, and at least one of these not larger than $2 \times 10^{-8}$) leads naturally to the question of whether any such triplets are homogeneous (or mutually consistent).
The set of six experiments with relative standard uncertainties not larger than $5 \times 10^{-8}$ has 20 subsets of size 3. Of these, nine subsets are homogeneous and comprise at least one experiment whose relative standard uncertainty is no larger than $2 \times 10^{-8}$, at least one Kibble balance experiment, and at least one experiment using the x-ray silicon crystal density (xrcd) method. Homogeneity is taken to mean that the hypothesis of mutual consistency is not rejected in Cochran's Q test with probability 0.05 of erroneous rejection when the hypothesis is true. Table 3 lists the compositions of these nine subsets of size 3. Similarly, there are five homogeneous subsets of size 4 (out of 15) that meet the same requirements.
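The enumeration of qualifying subsets can be sketched as follows. The function names and the tuple layout `(label, method, value, u)` are hypothetical, and the method codes follow the labels of table 1; the sketch applies the three ccm-derived conditions and Cochran's test at level 0.05 to every subset of a given size.

```python
import itertools
import math

def chi2_sf(x, k):
    """P(X >= x) for a chi-squared variable with integer k degrees of freedom."""
    if x <= 0.0:
        return 1.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    total = math.erfc(math.sqrt(x / 2.0))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    for i in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def homogeneous(values, uncertainties, alpha=0.05):
    """True if Cochran's Q test does not reject homogeneity at level alpha."""
    w = [1.0 / u**2 for u in uncertainties]
    hbar = sum(wj * hj for wj, hj in zip(w, values)) / sum(w)
    q = sum(wj * (hj - hbar) ** 2 for wj, hj in zip(w, values))
    return chi2_sf(q, len(values) - 1) > alpha

def qualifying_subsets(results, size):
    """results: (label, method, value, u) tuples; method codes as in
    table 1 ('k' Kibble balance, 'x' xrcd, 'j' Joule balance, 'e' other)."""
    found = []
    for subset in itertools.combinations(results, size):
        labels, methods, values, uncs = zip(*subset)
        rel = [u / abs(v) for v, u in zip(values, uncs)]
        if (max(rel) <= 5e-8 and min(rel) <= 2e-8
                and "k" in methods and "x" in methods
                and homogeneous(values, uncs)):
            found.append(labels)
    return found
```

A call such as `qualifying_subsets(results, 3)` over the six results of table 1 would list the homogeneous triplets, and `qualifying_subsets(results, 4)` does the same for subsets of size 4.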

Conclusions
The level of agreement between the most accurate determinations of h available today is simply remarkable, especially considering that they involve multiple, different, and independent versions of two radically different approaches to its measurement.
Furthermore, both the Kibble balance and the xrcd method successfully provide an absolute realization of the unit of mass, a feat that has not been attempted since the National Convention of the First Republic of France passed a decree on April 7, 1795, defining the gram as the 'absolute weight of a volume of pure water equal to the cube of the one hundredth part of a meter, at the temperature of melting ice'.
This initial definition of the unit of mass would have made primary realizations of it widely accessible in principle, notwithstanding the technical difficulties of implementing it in practice. However, as soon as a choice was made literally to cast the unit of mass into a fixed, physical artifact, first as the Kilogram of the Archives, later as the international prototype of the kilogram (ipk), the measurement of mass has involved only comparisons with standards linked to a single primary realization through a traceability chain, and no absolute determinations [40].
The investments that the national metrology institutes have committed to the realization of the unit of mass have finally paid off. The accuracy and methodological diversity requirements that the ccm had established as preconditions for the redefinition of the unit of mass, and for its realization in terms of a fundamental constant of nature, have been met.
The residual heterogeneity of the most accurate measurements of h is too small to be a cause for concern in practice. In fact, the relative uncertainty in the consensus value of h that may be derived from the six measurement results that currently satisfy the stringent accuracy requirements established by the ccm, and that is attributable to residual heterogeneity, amounts to 6 parts in $10^9$. This is practically negligible considering that the relative standard uncertainty allowed for the most reliable 1 kg mass standards recognized by the International Organization of Legal Metrology (oiml) is 83 parts in $10^9$.
As an additional term of comparison, we point out that, in 2012, the bipm assigned standard uncertainty 0.007 mg (that is, 7 parts in $10^9$) to the 1 kg mass prototype number 85, belonging to the United States of America (bipm Certificate No. 30, 2012, signed by M Kühne, Director). Other standards, calibrated at approximately the same time, have received either the same or closely similar uncertainty qualifications associated with their assigned masses.
Even if a single one of those six best determinations were to be randomly selected and assigned to h as its final and fixed value, the corresponding 'cost' that would be incurred in relative definitional uncertainty would be no more than 30 parts in $10^9$, which is still less than the steadily increasing historical dispersion of the masses of the ipk, of its official copies, and of the national prototypes, relative to one another [15,40]. That uncertainty would also be smaller than the shift of 35 µg that was applied to the bipm 'as-maintained mass unit' following the Extraordinary Calibrations of 2013-2014.
The collective effort that enabled the current accomplishments is far-reaching because it brings the ability to realize the unit of mass within the reach of many laboratories: not just national metrology institutes, but also laboratories in national and local government agencies, universities, and industry. It will also facilitate the dissemination of this unit, and reduce its time and cost, throughout what now effectively becomes a network of traceability chains anchored on multiple primary realizations, increasing the reliability of mass measurements all the way down to the ultimate consumer of such measurements.
Being a collective effort means that no participant has been left out or behind, and that all have added tangible value to the enterprise, regardless of how any particular measurement result may be situated relative to the consensus value that blends them all and indeed enables a reliable redefinition of the kilogram.
Neither does reaching this milestone imply that the research and development that has brought us to this point need not continue. Quite the contrary. Once the value of the Planck constant is fixed, the uncertainty surrounding the consensus value that will be assigned to h does not evaporate: instead, it should be converted into a common definitional, or realization uncertainty, that percolates to all instances of the realization. Continued research and development, and improvement of the measurement methods, most likely will reduce this uncertainty component going forward, hence will steadily improve performance in the measurement of mass.