Automated Determination of Bone Age in a Modern Chinese Population

Rationale and Objective. Large studies have previously been performed to set up a Chinese bone age reference, but it has been difficult to compare the maturation of Chinese children with populations elsewhere due to the potential variability between raters in different parts of the world. We re-analysed the radiographs from a large study of normal Chinese children using an automated bone age rating method to establish a Chinese bone age reference, and to compare the tempo of maturation in the Chinese with other populations. Materials and Methods. X-rays from 2883 boys and 3143 girls aged 2–20 years from five Chinese cities, taken in 2005, were evaluated using the BoneXpert automated method. Results. Chinese children reached full maturity at the same age as previously studied Asian children from Los Angeles, but 0.6 years earlier than Caucasian children in Los Angeles. The Greulich-Pyle bone age method was adapted to the Chinese population creating a new bone age scale BX-China05. The standard deviation between BX-China05 and chronologic age was 1.01 years in boys aged 8–14, and 1.08 years in girls aged 7–12. Conclusion. By eliminating rater variability, the automated method provides a reliable and efficient standard for bone age determination in China.


Introduction
Globalization is not a new trend in auxology. For decades, researchers have compared growth and maturation in populations across the world, exploring racial and regional differences and studying the effects of changes in lifestyle.
In particular, there has been a longstanding interest in the differences in tempo of maturation among populations and their secular trend. Such studies have preferentially been performed using bone age (BA) determinations. Unlike secondary sexual characteristics, BA can reveal the entire course of maturation from birth to adulthood. However, bone age rating is associated with considerable operator variability, and it is particularly difficult to ensure consistency between raters from widely separated parts of the world. e emergence of a fully automated and therefore raterindependent BA rating method [1] has the potential to overcome this problem and, thereby, revitalize comparative studies of maturation across the world.
We present a reanalysis of a large study of hand radiographs involving modern, normal Chinese children using this automated bone age system. e study could also have considerable impact on clinical practice in China. Due to the one-child policy, parents have a strong focus on growth and development of their child, and currently, there are more than 500.000 BA determinations in China per year. is need is growing, but there is a shortage of persons �uali�ed to do the rating. Adult height prediction in Asian children based on automated BA will be reported in a separate paper. Our study had the following objectives: (1) to validate the automated method by assessing its ability to analyse these images and its agreement with manual Tanner-Whitehouse (TW) ratings [2]; (2) to derive reference curves for Greulich-Pyle (GP) BA [3] of Chinese children, to study their regional variation in China, and to compare the curves to American children studied with the same bone age method; (3) to construct a Chinese BA scale (called BX-China05) as an adaptation of the GP BA for the Chinese population in 2005; (4) to construct a Chinese TW BA scale called TW-China05. children. e children were healthy and representative of the general population in these urban regions.

Subjects and Methods
e �lms were scanned using a Vidar Diagnostic Pro Advantage scanner (Hemdon, USA) in 150 dpi and 12 bits per pixel. e radiographs formed a representative sample of the previous study which comprised 17,000 radiographs recorded on �lms. ese radiographs had previously been BA rated by the �rst author using the TW method.

Analysis
Method. e images were analysed using the BoneXpert version 2.1 automated method for BA determination (Visiana, Denmark, http://www.BoneXpert.com), which determines GP and TW3 BA [5][6][7]. e method was developed on Caucasian children and has been validated on Asian children from Japan and California [8,9]. e TW BA rating, more speci�cally, the assignment of stages to each bone, was recently recalibrated in order to coincide with ratings from the First Zurich Longitudinal Study [10]. is rating, called standard TW rating, has been shown to be consistent with Tanner's ratings of the so-called Gold Series of reference images.
In order to compare bone maturation in different regions, reference curves of BA − CA versus chronological age (CA) were derived for each city separately and for all cites combined, using GP BA.
e GP method is the most widely known and used BA rating method. It is fast and reliable, because the image is rated by direct comparison to reference images. However, the GP BA scale refers to mid-to upper-class American Caucasian children from the 1930s, a standard group remote from present-day Chinese children. We therefore developed a variant of GP BA called BX-China05 as follows. e GP BA is considered to be not an estimate of CA, but rather an abstract maturity measure, and for each age, we determined the median GP BA of Chinese children. e relationship between CA and median GP BA was tabulated and used to transform an observed GP BA value into an age value, de�ned as BX-China05. For example, if the median GP BA is 13.5 at a CA of 13 years, a child with a GP BA of 13.5 has a BX-China05 of 13 years.
is method of mapping a maturity measure to a population-speci�c bone age is not new� it was originally conceived in the context of the TW method, and we also carried it through for the TW method in this study. erefore, for each age, we derived the median Sum Maturity Score (SMS) of the automated TW rating in order to derive a table for transforming an observed SMS value into a bone age, which we call TW-China05 (such a table had been estimated before using the manual TW ratings of these image data [4]. By reestimating the table using automated TW ratings, we ensured that the table was consistent with BoneXpert's standard TW rating of bone stages).

3.1.
Efficiency. irty images were rejected by BoneXpert before arriving at the 6026 images of this study. Most of these rejections were due to poor image quality or a bone age below the lower limit of the intended range of BoneXpert (2.5 years for boys and 2.0 years for girls), but 12 of the rejected images were of good quality, corresponding to 0.2%, so the observed efficiency of the method was 99.8%. Figure 1 shows the agreement between the manual TW3 rating and BoneXpert's TW3 rating, expressed as a Bland-Altman plot, that is, the difference is plotted versus TW3mid, the average of the two TW3 bone ages.

Agreement of BoneXpert with Human Raters.
Focusing on the TW3mid range of 2.5-16 years for boys and 2.5-14.5 years for girls, the average difference between automated and manual TW3 ratings was 0.00 years for boys and −0.37 years for girls. e root mean square (rms) deviations were 0.64 for boys and 0.68 years for girls in the same TW3mid range.

Reference Curves for Maturation.
To study the difference in tempo of maturation across various populations and ethnicities, we used the GP BA minus the age (CA). Figure  2 displays the observed BA − CA data, and the mean and ±2 SD curves are indicated. e SD for boys within the age range of 8-17 years was 1.26 years, and the SD for girls aged 7 to 15 years was 1.12 years (the SD curve was computed in 1-year intervals, and the quoted SD numbers are the averages over the entire age interval). Figure 3 shows the average curves for each of the �ve cities, and Figure 4 compares the BA − CA curves with data on four ethnicities from Los Angeles, USA.  Figure 5. It is seen that the average difference was close to zero for all ages as expected. e SD for boys over the age range of 8-14 years was 1.08 years, and for girls aged 7-12 years, the SD was 1.01 years.
3.5. e New TW-China05. Table 2 shows the relationship between BoneXpert's SMS and TW-China05, the new TW BA adapted to this population, and Figure 6 compares BoneXpert's TW-China05 BA and the manual TW-China05 BA. e SD of TW-China05 − CA for boys aged 8-14 years was 1.09, and for girls aged 7-12 years, it was 1.01 years.

Discussion
4.1. Efficiency. We found that BoneXpert was able to determine bone age for 99.8% of the good-quality images of Chinese children. is result compares well with previous results in Caucasian children, despite that this is a new ethnic group, and the similar performance is probably due to fact that the variation between the average Caucasian and Asian child is much smaller than the variation within the Caucasian population itself.

Agreement with Human
Raters. ere was good agreement between automated standard TW3 rating and manual TW3 as shown in Figure 1. Firstly, there was only a small average difference (bias) of 0.00 years in the boys and 0.37 years in the girls, which indicates that the Chinese TW rater was largely consistent with the European raters underlying the calibration of BoneXpert's standard TW rating. is agreement was in contrast to that reported in [10] where a Japanese rater had a bias of up to 1.19 years.
Secondly, the rms deviation (0.64 years for boys and 0.68 years for girls) was smaller than that observed for the Zurich raters [10]. One reason for this slightly better result may lie in the fact that the Chinese children were rated by a single rater while the Zurich images were rated by one of two raters.
e Chinese rater was never trained by authoritative raters on the TW Gold Series radiographs in London or on other series. He learned the rating from the book [2] but still achieved good agreement with the automated rating. Such an achievement has been reported once before for a Belgian reader [11]. Figure 2 shows BA − CA versus age. e most striking aspect is that, at the end of puberty, the average BA − CA has reached approximately 1 year. In other words, Chinese children reach maturity around one year earlier than the children that made up the GP BA reference, in agreement with previous studies [12]. is dramatic deviation was the main motivation for setting up the BX-China05 BA scale. Figure 2 also shows that the spread in maturity is rather large in Chinese children. is is partly due to the fact that the children were from several cities, separated by large distances. To illustrate this effect, Figure 2 shows the average BA − CA for the two cities at either end of the BA − CA spectrum, Dalian, a middle-size city in the North, and Shanghai, a large city in the south. Our pooling of all �ve cities contributed to a larger SD of BA − CA. Even within each city, there was a relatively large spread, presumably due to differences between social classes.  Figure 3 reveals the signi�cant differences between the �ve cities, with Shanghai being the most advanced in BA and Dalian the most delayed, and the other cities lie in between these two extremes. In principle, these data could be used to set up city-speci�c reference curves for BA − CA. However, we believe that this would be impractical, and one would be faced with ambiguities when a child moves from one city to another, or if the child is from a city not among the �ve investigated here. We prefer to pool all children and treat city differences as a statistical uncertainty. A limitation of this sample is that all data come from urban regions and middle and upper social classes, so the sample might not be representative for children from a rural community or children living under poor social conditions. Figure 4 compares the Chinese BA − CA curves with data on four ethnicities from Los Angeles, USA [9]. is comparison exploits the strength of the automated method. For the �rst time, one is able to compare maturity of Chinese children with children in the West using a method guaranteed to have no rater bias. e upper part of the �gure compares Los Angeles children of different ethnicities, showing that Asians and Hispanics in Los Angeles have similar maturation tempo, while the lower plot compares Asian children from Los Angeles and China. It is interesting that they follow almost the same pattern. Chinese boys lag slightly behind American Asian boys. We see that Asians, averaged over the two sexes, reach maturity 0.6 years before American Caucasians. is comparison of the populations with each other is more interesting than a comparison to the outdated GP standard. In [9], it was reported that American Caucasians reach maturity 0.4 years before Europeans, so we infer that Asians reach maturity a full year ahead of European Caucasians.  [4] of 16 and 15 years, respectively, as also illustrated in Figure  6.

Discussion of BX-China05 and TW-China05
. e BX-China05 and TW-China05 BA scales appear similar. Both are adapted to agree, on average, with the age in the Chinese population. However, there are important differences, because BX-China05 is based on the GP method whereas TW-China05 is based on the TW method.
GP and TW BA rating represent two rather different methods. When used for manual rating, the TW method has the advantage that it employs an explicit procedure that ensures that the same bones are used, with the same weights, by all raters. e GP method, on the other hand, has the advantage that it easier to learn and can be performed in a fast and intuitive manner. Furthermore, its reliance on a direct comparison of the image with the plates in the atlas ensures that raters are constantly "calibrated" to the correct level.
e TW method based on manual rating has been used extensively in China [4], rather than the GP method, because the latter refers to Pre-WW II Ohio children which are far removed from modern Chinese children. is paper has introduced BX-China05 for Chinese children, based on the GP method. e GP atlas is both a method and a BA standard. With BX-China05, one uses the GP method but escapes its BA standard by employing the corrections in Table 1.
With the introduction of automated methods, and the "recalibration" of GP BA in the shape of BX-China05, the balance between the TW and the GP methods changes in favour of the GP method, which has several advantages relative to the TW method.
(1) In the automated method used in this paper, the TW and GP methods are both based on the 13 RUS bones (Radius, Ulna, and 11 Short bones) chosen by Tanner for the TW method. e GP method assigns a uniform weight of 7.7% to all 13 bones, while the TW method places a larger weight of 20% on both radius and ulna. Any uncertainty in the interpretation of radius and ulna will, therefore, have a disproportionately large effect on the TW BA, and, indeed, the precision of the automated GP BA is considerably better (0.18 years SD) than that of the automated TW BA (0.30 years) [8].
(2) Adult height prediction has been found to be more accurate using GP compared to TW BA, primarily due to the large weight of the radius and ulna in the TW method. e BoneXpert method for adult height prediction is, therefore, based on GP BA (this is also true for the method for Chinese children, presented in a separate paper).
(3) e TW3 BA terminates at 16.5 and 15 years for boys and girls, respectively, while the GP BA terminates at 19 and 18 years. In other words, the TW3 method stops 2.5-3 years earlier. is also holds for the automated versions where TW3 and GP agree well up to the BA where TW3 saturates [10]. e same �ndings are noted in the Chinese version where TW-China05 stops 1.7 years before BX-China05. erefore, when rating the maturity of adolescents, the GP method is the more powerful method. Notice, however, that the automated method has the general limitation that it becomes more uncertain above a certain maturity level, which is 17 and 15 years of GP BA for boys and girls, respectively. is translates to the limits of 15.8 and 13.8 years, respectively, on the BX-China05 scale. Above these limits, there are only small maturityrelated changes in the appearance of the short bones.
In rare cases where the automated method fails, it is recommended to record a new X-ray, unless the child has abnormal bone structure. One can also perform a manual rating using the GP atlas and then use Table 1 to correct the GP BA to the BX-China05. It would even be acceptable to apply the manual TW-China05 method, in case the rater is more familiar with this method.

Conclusion
We have presented an important step toward the application of automated BA assessment to Asian populations around the world. We have found that Asian children in Los Angeles, and in mid-to large-sized cities in China, have approximately the same maturation rates.
Asians reach full maturity 1.2 years before the age indicated by the GP BA reference, and, thus, the use of this standard in China is considered misleading by some. On the other hand, as we have pointed out, there are many advantages of the GP atlas methodology, and we have presented an adaptation of the GP BA to Chinese children, called BX-China05. is is recommended for routine clinical use as well as in sports medicine.