Pediatric reference intervals of free thyroxine and thyroid stimulating hormone in three different hospitals

Objective: Using data retrieved from three different hospitals, we established indirect reference intervals of free thyroxine (FT4) and thyroid stimulating hormone (TSH) for the Centaur XP or the Immulite 2000 instruments, with separate reference limits for each subset. Methods: We categorized children into seven age groups: 4–7 days, 8–15 days, 16–23 days, 24–61 days, 3–6 months, 7–36 months and 4–6 years. After a Box-Cox transformation, we applied the Horn algorithm to eliminate extreme values. Results: The remaining FT4 (11,230) and TSH (11,274) test results were statistically analyzed. We determined separate reference limits for each subset from its own 2.5th and 97.5th percentiles. The interaction effect of hospital and age group on FT4 was meaningful, but there was no interaction effect on TSH. Conclusions: Pediatric FT4 and TSH test results should be interpreted using narrowed age groups, especially in the first 3 weeks of the neonatal period. Our reference limits may be recommended in pediatric follow-ups, taking into account prematurity, birth weight or multiple births. Preanalytical and analytical variations related to the complex molecular structure of FT4 should be taken into consideration to ensure the validity of the result.


Introduction
Reference intervals of thyroid function tests are known to be method- and population-dependent (i.e. due to differences in ethnicity, population or geographically derived covariates such as lifestyle, salt iodination, selenium status, nutrition or other unknown geographic covariates). Unfortunately, manufacturers' recommendations and package inserts do not provide satisfactory population-based and age-adjusted cut-off levels for the methods used on venous samples, especially for the diagnosis of congenital hypothyroidism (CH) [1].
The neonatal and pubertal periods are examples of periods for which there are no established reference ranges for thyroid hormones [2]. Ideally, direct reference intervals should be determined using blood samples obtained from a large cohort of healthy subjects [3]. However, because of ethical and practical considerations in children (e.g. difficulties in obtaining parental consent for blood sampling and technical problems in drawing blood), reference intervals are often determined indirectly from large hospital databases [4]. In addition, symptoms of hypothyroidism are relatively nonspecific, diverse and often mild; many children with symptoms of attention-deficit disorder or hyperactivity, fatigue, obesity, constipation and hair loss have thyroid function tests ordered to rule out hypothyroidism or thyroid hormone resistance as the cause of these symptoms. Defining the real 'reference individual' is therefore another challenge.
In this study, by selecting three nearby hospitals, we aimed to minimize the effects of income differences, lifestyle, environment and ethnicity as preanalytical factors. The aim of the study was to establish indirect reference intervals of serum FT4 and TSH levels in children younger than 7 years for the Centaur XP or the Immulite 2000 instruments. We decided to determine separate reference limits for each subset of the heterogeneous hospital and age categories. Moreover, we examined the impact of age and of the hospital from which the values were retrieved on FT4 and TSH (both clinically and methodologically), with the other variable controlled.

Materials and methods
Patients aged 4 days to 72 months made up our study population. Age was calculated as the difference between the date of birth and the time of test requisition, in units of (postnatal) days or months, and patients were categorized into the following age groups: 4-7 days, 8-15 days, 16-23 days, 24-61 days, 3-6 months, 7-36 months and 4-6 years.
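The age grouping can be illustrated with a short sketch. It is not the code used in the study: the function name `age_group` is hypothetical, and the day limits for the month- and year-based groups (183, 1096 and 2192 days) are assumptions based on a mean month of 30.4375 days and on reading "3-6 months" as "62 days to the end of the sixth month".

```python
from datetime import date

# Age-group bounds in postnatal days (inclusive).
# The day-based bounds are taken from the text; the bounds for the
# month/year groups are ASSUMPTIONS (mean month = 30.4375 days).
AGE_GROUPS = [
    ("4-7 days", 4, 7),
    ("8-15 days", 8, 15),
    ("16-23 days", 16, 23),
    ("24-61 days", 24, 61),
    ("3-6 months", 62, 183),      # assumed: up to ~6 months
    ("7-36 months", 184, 1096),   # assumed: up to ~36 months
    ("4-6 years", 1097, 2192),    # assumed: up to ~72 months
]


def age_group(birth, requested):
    """Return the age-group label for a test requested on `requested`
    for a child born on `birth`, or None if outside 4 days to 72 months."""
    age_days = (requested - birth).days
    for label, low, high in AGE_GROUPS:
        if low <= age_days <= high:
            return label
    return None


print(age_group(date(2013, 1, 1), date(2013, 1, 6)))  # 5 days old
```

Children younger than 4 days or older than 72 months fall outside every group and are returned as `None`, which mirrors the age range of the study population.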
Accumulated serum FT4 and TSH data were retrieved from three different hospitals in the same region: Fatih, an administrative district of Istanbul, Turkey. Children for whom simultaneous FT4 and TSH testing was requested from outpatient clinics were included in our study, but tests requested from the departments of pediatric endocrinology and medical genetics were excluded. Patients with repeated test requisitions were also excluded because of the likelihood of thyroid dysfunction. In all three hospitals, the total (between-day) coefficients of variation (CV) of each instrument were satisfactory for FT4 and TSH (third generation). The percentages of excluded patients and the imprecision values of the instruments are available in the Supplementary data (exclusion and imprecision). This study was approved by the Bezmialem Vakif University Ethics Review Board for human studies.
We used 'pmol/L' as the unit in FT4 values, so we converted the data collected from Hospitals A and C from 'ng/dL' by multiplying by a factor of 12.9. The unit 'mU/L' was used in all TSH measurements.
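As a minimal sketch of this conversion (the function name is hypothetical; the factor 12.9 is the one stated above and is consistent with the molar mass of thyroxine, about 777 g/mol):

```python
# Conversion factor stated in the text: FT4 in ng/dL x 12.9 = FT4 in pmol/L.
NG_DL_TO_PMOL_L = 12.9


def ft4_ng_dl_to_pmol_l(value_ng_dl):
    """Convert an FT4 result from ng/dL to pmol/L."""
    return value_ng_dl * NG_DL_TO_PMOL_L


# Illustrative value only, not a study result.
print(f"{ft4_ng_dl_to_pmol_l(1.2):.2f} pmol/L")  # prints "15.48 pmol/L"
```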
The same kind of instrument (Advia Centaur XP, Siemens Healthcare Diagnostics Inc., Tarrytown, NY, USA) was used at both the Istanbul Research and Training Hospital (Hospital A, three instruments) and the Bezmialem Vakif University (Hospital B, two instruments). At the Suleymaniye Women's Disease and Children's Research and Training Hospital (Hospital C), the instrumentation differed (a single instrument, Immulite 2000, Siemens Healthcare Diagnostics Inc., Tarrytown, NY, USA).
Exploratory graphical presentation of the raw data clearly showed the presence of extreme values, and normal Q-Q plots indicated that the FT4 and TSH measurements did not follow a Gaussian distribution. The Kolmogorov-Smirnov goodness-of-fit test confirmed this finding. Therefore, a Box-Cox transformation was employed before further analysis [5]. The power 'λ' was estimated separately for each age group because the distributions were expected to differ among these groups. We then followed the Horn algorithm to eliminate the extreme values [6]. The Horn algorithm is based on the computation of the lower and upper quartiles of the transformed data. The details of the analysis are available in the Supplementary data (exclusion and imprecision).
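The transformation and outlier step can be sketched as follows. This is an illustration under stated assumptions, not the procedure as run in the study: the function names are hypothetical, λ is chosen by a simple grid search on the Box-Cox profile log-likelihood, and the fences are set at 1.5 times the interquartile range beyond the quartiles (Tukey's criterion), a multiplier the text above does not specify.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox transform of a positive value for a given power lambda."""
    if abs(lam) < 1e-12:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def estimate_lambda(values):
    """Choose lambda on a grid (-2.0 to 2.0, step 0.05) by maximizing the
    Box-Cox profile log-likelihood. Values must be positive and not all equal."""
    n = len(values)
    sum_log = sum(math.log(v) for v in values)

    def log_likelihood(lam):
        transformed = [box_cox(v, lam) for v in values]
        variance = statistics.pvariance(transformed)
        return -0.5 * n * math.log(variance) + (lam - 1.0) * sum_log

    grid = [i / 20 for i in range(-40, 41)]
    return max(grid, key=log_likelihood)


def horn_filter(values, lam=None, k=1.5):
    """Drop values whose transform lies outside [Q1 - k*IQR, Q3 + k*IQR].
    k = 1.5 is an ASSUMED (conventional) multiplier."""
    if lam is None:
        lam = estimate_lambda(values)
    transformed = [box_cox(v, lam) for v in values]
    q1, _, q3 = statistics.quantiles(transformed, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v, t in zip(values, transformed) if low <= t <= high]
    return kept, lam


# Made-up TSH-like values (mU/L) for one age group, with one extreme value.
tsh = [1.2, 1.8, 2.1, 2.5, 2.9, 3.3, 3.8, 4.4, 5.1, 6.0, 48.0]
kept, lam = horn_filter(tsh, lam=0.0)  # lam=0.0 is the log transform
print(len(tsh) - len(kept), "value(s) removed")
```

Calling `horn_filter(values)` without `lam` estimates λ from the data, so applying it once per age group corresponds to the group-wise estimation of λ described above.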
A univariate general linear model (GLM) was applied to detect the impact of age and of hospital (where the measurement was made) on FT4 and TSH, with the other variable controlled. Thus, the effect of each factor represents its own discrimination ability, taking other effects into consideration. Although the GLM has three assumptions (normality, homogeneity of variances and independence of errors) and requires the absence of outliers, analysis of variance (ANOVA) is considered a fairly robust analysis. Homogeneity was tested with Levene's test. Conducted according to the type III sum of squares with a constant intercept effect, these analyses test hypotheses about differences in sub-population (or marginal) means for ANOVA designs with unequal sub-population sizes. These least squares means are the best linear unbiased estimates of the marginal means for the design [7]. Tests of differences in least squares means have the important property of being invariant to the choice of coding of effects for categorical independent variables. Statistical analysis was performed using SPSS 11 and MS Excel 2013.
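For orientation, a two-factor model of this kind is commonly written as below. The notation is ours and is an assumption, not taken from the study: y is the (transformed) FT4 or TSH value of child k in age group i and hospital j, and the least squares mean of age group i is the unweighted average of its estimated cell means over the J hospitals.

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)

\widehat{\mathrm{LSM}}_{i} = \frac{1}{J} \sum_{j=1}^{J} \hat{\mu}_{ij},
\qquad \hat{\mu}_{ij} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j + \widehat{(\alpha\beta)}_{ij}
```

Here \alpha_i is the main effect of age group, \beta_j the main effect of hospital, and (\alpha\beta)_{ij} their interaction, which is the term whose significance is examined for FT4 and TSH.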

Results
The percentage of extreme values in the three hospitals (A, B and C) is shown in Table 1. After removal of the extreme values, the remaining FT4 (11,230) and TSH (11,274) test results were statistically analyzed.
Age-related changes in the 2.5th, 5th, 10th, 25th, 50th, 75th, 90th, 95th and 97.5th percentiles of FT4 and TSH are shown graphically in Figures S2 and S3 of the Supplementary data (figures). Within the age and hospital categories, the measurements were homogeneous and had statistically equal central tendency with respect to sex. Although some age groups and hospitals had statistically indistinguishable means, their distributions were heterogeneous; we therefore determined separate reference limits for each subset (except sex) from its own 2.5th and 97.5th percentiles.
Across the heterogeneous hospital and age categories, ANOVA showed significant main effects on mean FT4 (p < 0.001) and mean TSH (p < 0.001), as shown in Figure 1A and B, respectively.
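Computing subset-specific limits can be sketched as follows. The function names and the sample records are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption, since the text does not state which percentile definition was used.

```python
import math
from collections import defaultdict


def percentile(sorted_values, p):
    """Percentile p (0-100) by linear interpolation between order statistics.
    The interpolation rule is an ASSUMPTION; other definitions exist."""
    if not sorted_values:
        raise ValueError("no values")
    position = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def reference_limits(records):
    """records: iterable of (hospital, age_group, value).
    Returns {(hospital, age_group): (2.5th percentile, 97.5th percentile)}."""
    subsets = defaultdict(list)
    for hospital, age_group, value in records:
        subsets[(hospital, age_group)].append(value)
    limits = {}
    for key, values in subsets.items():
        ordered = sorted(values)
        limits[key] = (percentile(ordered, 2.5), percentile(ordered, 97.5))
    return limits


# Made-up FT4-like values (pmol/L) for a single subset, for illustration only.
records = [("A", "4-7 days", float(v)) for v in range(1, 42)]
print(reference_limits(records))
```

Each (hospital, age group) subset is handled independently, so heterogeneous subsets never share a limit.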

Indirect reference intervals of FT4 and TSH
In this study, we established pediatric FT4 and TSH reference intervals in serum samples for the sake of accurate diagnosis of CH in the newborn. Mild and non-specific clinical findings in newborns, later development of endocrine and metabolic disorders in unrecognized CH, false negative results in screening, variation in reference ranges according to the method used in the laboratory, and the lack of standardization in this field led us to analyze reference limits of FT4 and TSH. Specimens from the umbilical cord have been used to measure FT4 and TSH for CH screening [8,9]; the 2.5th and 97.5th percentiles of cord blood FT4 were reported as 11.48 pmol/L and 19.74 pmol/L [8] and as 13.80 pmol/L and 26.06 pmol/L [9], and the 97.5th percentile of cord blood TSH was reported as 27.56 mU/L [8], as shown in Table 2. Nevertheless, an abrupt increase in pituitary TSH secretion occurs soon after birth and may reach a peak of about 60-80 mU/L in the first 25-30 min of life [23]. The administration of heparin or triglyceride emulsions (or even drugs with potential protein displacement activity) to infants, and possible structural changes in thyroid-binding globulin that decrease its affinity for thyroxine (T4), could cause an increase in serum free thyroxine levels in the first 24 h [24]. Our study group therefore starts from the fourth day of life (first age group, 4-7 days) to avoid this high within-subject biological variation in the first days of life. On the initiative of the pediatric endocrinologist (a co-author), we worked on our exclusion criteria and classified the age groups according to approximate developmental periods of the hypothalamus-hypophysis axis in childhood. In our preliminary study (presented as a poster at IFCC WorldLab 2014, Istanbul, Turkey), a Dunnett T3 test revealed no significant difference in FT4 between the age groups of 24-30 days and 2-6 months (p = 1.000). On the basis of this previous experience, in this study we decided to determine the values for the age group of 24-61 days in our population.
On the other hand, some studies have demonstrated that FT4 and TSH levels continue to decline from birth to the end of adolescence, and that this decline is more rapid in the neonatal period [4,19]. In this study, serum FT4 and TSH levels differed statistically between age groups. Therefore, pediatric FT4 and TSH test results should be interpreted using narrowed age groups as shown in our study; especially in the first 3 weeks of the neonatal period, accurate cut-off values should be determined for each week. The 97.5th percentile reference limits for TSH in the 4-7 days age group are given in Table 2.
To date, we had the greatest number of patients, especially in the neonatal period. Hospitals using the same model of instrument (Hospitals A and B) showed very close TSH values, with the exception of the 4-7 days age group (Table 2, Figure 1B). Indirect reference intervals of pediatric thyroid function tests were determined in six other studies [4,11,14,18,19,21], as shown in Table 2. Zurakowski et al. [21] found TSH upper limits similar to ours. For the Immulite 2000, the median concentrations of total thyroxine (TT4), FT4 and TSH were reported to be up to 3.2-fold higher during the first 2 weeks [12]. By examining the scatter plot of the Canadian Laboratory Initiative in Pediatric Reference Intervals (CALIPER) data and hospital-based data with FT4 on the x-axis and TSH on the y-axis, one can interpret (statistically) the manufacturer's overly broad FT4 reference intervals [14,16]. Hübner et al. [10] calculated the reference range by piecewise linear regression analysis, using specified square root functions. Soldin et al. [25] reported sex-specific reference ranges without testing for statistical significance, and Djemli et al. [19] found significant sex differences in FT4 only within the age group of 15-17 months. No sex differences were reported for FT4 and TSH between age-matched samples in other studies [4,8-10,12]. In this study, we did not categorize by sex.

The interaction effect of both the hospital and age groupings on FT4
The interaction effect of the hospital and age groupings on FT4 was meaningful. Because the hospitals used different models of instrument (or different numbers of instruments of the same model), analytical variations in immunoassay methodology among the hospitals were possible. Some technical specifications of the instruments are available in the Supplementary data (exclusion and imprecision).
For the FT4 measurements, both types of instrumentation perform a one-step competitive immunoassay using direct chemiluminescent technology, in which a labeled structural analog (a labeled antigen acting as a tracer) competes with serum T4 for antibody binding sites [26]. It has been reported that immunoassay-based FT4 measurements did not give data clinically consistent with the TSH measurements made on the same platform; such problems were avoided when FT4 was measured by ultrafiltration tandem mass spectrometry [15]. However, immunoassay procedures meet the requirements for clinical use by giving results in under an hour [4].
Although Hospitals A and B had the same type of instrumentation and similar TSH results, their FT4 values were discrepant. Defining the analytical variations between the systems (including instrument, water system, reagents, calibration process, after-sales service, quality control procedure and quality control program scores) and choosing a better method (e.g. tandem mass spectrometric measurements) should give us advantages in studying traceability. The values of immunoassay calibrators are assigned specifically for the instrumentation. The professional experience of the operator may be another factor influencing the test results. Most of the techniques show acceptable performance when the assay is performed on euthyroid patient samples [27]. Attempts toward standardization and harmonization of thyroid function tests are ongoing [28,29]. However, evaluating the analytical bias of the methods or demonstrating traceability was not within the scope of this study.
In our study, the period over which data were retrieved also differed among the three hospitals: 4 years in Hospital A, 2 years in Hospital B and 3 years in Hospital C. Immunoassays are particularly prone to lot-to-lot differences since the antibodies and detection reagents used on these systems tend to be more sensitive to matrix differences [30,31]. Moreover, the amount of data retrieved from Hospital B (using only two instruments of the same model) was greater than that retrieved from Hospital A (using three instruments of the same model), as presented in Table 2.
When we scrutinize the analyte FT4, the formation of thyroxine (T4) in the thyroid gland involves a notably diverse set of steps. It requires the rare element iodine for bioactivity (iodine oxidation); it is synthesized as part of a very large precursor molecule (thyroglobulin); it is stored in an extracellular reservoir (colloid) for several weeks; and thyroglobulin is degraded in lysosomes to supply T4 and triiodothyronine (T3). T4 circulates in the plasma mainly (99.98%) in the form bound to proteins: thyroxine-binding globulin (TBG, 75-80% of the bound fraction), transthyretin (thyroxine-binding prealbumin, TBPA, 15-20%) and albumin (5-10%) [32,33]. It has been shown that lipoproteins contribute to the transport of thyroid hormones, but in lower proportions: high density lipoprotein (3%), very low density lipoprotein (0.03%) and low density lipoprotein (0.2%) [26]. After transport, T4 is converted peripherally to T3 in target tissues. In this sense, T4 can be thought of as a prohormone [32]. Serum T4 and T3 levels are markedly decreased without a compensatory rise in the serum TSH level during severe illness and nutritional deficiency, leading to a significant decline in basal metabolic rate along with a decrease in protein and fat catabolism [34]. Thus, the biology of FT4 is more complex than that of TSH (a glycoprotein).
With regard to the effect of age on FT4 (Figure 1A), measurements remained unchanged (horizontal line) in the age range of 24 days to 6 months in all the hospitals. Thereafter, the measurements converged and reached a plateau following a decrease. When we examined the hospital effect on FT4, higher measurements were generally noticed in Hospital B. Hospital A and Hospital C gave similar measurements, but measurements in the age range of 24 days to 6 months seemed higher in Hospital A.
With regard to the effect of age on TSH (Figure 1B), similar measurements were seen in Hospitals A and B (the same kind of instrumentation) after 16 days of age. After 3 years of age, a similar slight increase was seen in both Hospitals A and B. Furthermore, at least one age group differed significantly from the others after the neonatal period, while some consecutive groups did not differ statistically.
With regard to preanalytical variation, the serum concentration of FT4 seems stable across the diurnal rhythm, menstrual cycles and seasons, but varies during pregnancy. It has a plasma half-life of about 7 days. The direct measurement of serum FT4 (without prior separation) requires that the pre-existing fragile balance between the free and bound forms of thyroxine not be disturbed. This equilibrium is generally broken when there is a significant variation in the concentration of the carrier proteins (genetic, illness or drug effects) or an alteration in their binding affinity [26,35]. On the other hand, TSH is secreted in a pattern that is both pulsatile and circadian, and it has a plasma half-life of about 65 min [36,37].
In a study conducted in three ethnic groups (whites, East Asians and South Asians), both sexes and three sampling times (morning, afternoon and evening), significant differences were observed in the concentrations of FT4, total triiodothyronine and total thyroxine (TT4) across ethnic groups [23]. In another study, TSH increases were commonly observed in obese children [38]; anthropometric measurements (body mass index and/or waist circumference) might be taken into consideration in decision making. Sampling strategy, sample size, analytical factors such as instrumentation, gender, age and other demographic and lifestyle factors and the statistical methods (outlier detection, partitioning, reporting standard error and confidence intervals) used to construct reference intervals are all factors affecting the resulting intervals. However, these are often overlooked as factors that might have a considerable effect on the validity of the reference intervals [39].
In studies that included adults, the within-subject and between-subject biological variations of serum TSH (19.3% and 24.6%, respectively) were shown to be higher than those of FT4 (5.7% and 12.1%, respectively) [40]. The roughly three times lower within-subject biological variation of FT4 (5.7%) in comparison with TSH (19.3%) may lead to lower estimated reference change values (RCV) for the FT4 measurements of each instrument. With lower RCVs, changes in FT4 may carry greater clinical significance than those in TSH, so analytical variations in FT4 should be monitored separately for each instrument.
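The link between within-subject variation and the RCV can be illustrated with the commonly used formula RCV = √2 × Z × √(CVa² + CVi²), where CVa is the analytical CV and CVi the within-subject biological CV. The sketch below is an illustration only: the function name is hypothetical, Z = 1.96 (two-sided, 95%) is assumed, and the analytical CV of 5% is a made-up value, not an imprecision figure from this study.

```python
import math


def reference_change_value(cv_analytical, cv_within, z=1.96):
    """RCV (%) = sqrt(2) * z * sqrt(CVa^2 + CVi^2).
    z = 1.96 ASSUMES a two-sided 95% probability level."""
    return math.sqrt(2.0) * z * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


ASSUMED_CV_ANALYTICAL = 5.0  # hypothetical analytical CV (%), not from the study

# Within-subject biological CVs quoted in the text [40].
rcv_ft4 = reference_change_value(ASSUMED_CV_ANALYTICAL, 5.7)
rcv_tsh = reference_change_value(ASSUMED_CV_ANALYTICAL, 19.3)
print(f"RCV FT4: {rcv_ft4:.1f}%  RCV TSH: {rcv_tsh:.1f}%")
```

Because CVi is small for FT4, the analytical CV contributes a larger share of its RCV than it does for TSH, which is why instrument-specific analytical variation matters more for FT4.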
In our study, individuals in the early days of life in Hospital B showed lower TSH and higher FT4 levels (Figure 1A), possibly as a result of the frequency of prematurity (or multiple births). It is known that prematurity complicates screening for primary hypothyroidism because of the resulting developmental delay [8,24,41]. TSH levels are shifted toward lower values in premature babies; lowering the TSH cut-off point from 20 to 10 mU/L resulted in a significant increase in the number of recalled newborns with a birth weight of less than 2500 g (37.5%). With regard to transient CH (usually iodine-, drug- or maternal antibody-induced), the overall percentage was 11.2%, and it was significantly higher in premature infants than in full-term infants (25.8% and 7.5%, respectively) [42]. In another study of premature infants born at <30 weeks of gestation, 26 patients (9.1%) were diagnosed with thyroid dysfunction on day of life 30, whereas routine newborn screening identified only three patients. CH with delayed TSH elevation was diagnosed in 20 patients (6.9%) and was significantly associated with multiple gestation, lower birth weight, higher gestational age and lower 5 min APGAR score [43]. Because of the increased risk of primary and secondary hypothyroidism in preterm and low birth weight babies, the determination of TSH and FT4 (between days 3 and 5) is recommended, irrespective of the screening test [44].

Limitations
The primary limitation of the study is that ethnicity, anthropometric measurements, prematurity (gestational age of less than 37 weeks, or low and very low birth weight), multiple births (particularly same-sex twins), evidence of infection, thyroid autoantibodies and a family history of autoimmune thyroid disease were not taken into consideration. Children diagnosed with known metabolic abnormalities (such as diabetes mellitus, CH, protein-energy malnutrition and constipation) were not excluded on the basis of International Classification of Diseases codes.
This study was designed in three different hospitals in the same region in order to exclude the external effects of ethnic variation, environment, and family income, occupation and living conditions. This was a strength of our study. If databases from hospitals far from this region were included, different results might be obtained because of possible environmental and ethnic variations acting as preanalytical factors. Moreover, since the study was performed, the hospitals mentioned have changed to different instrumentation as a result of tender processes, so evaluation together with other hospitals using the same type of instrumentation was not possible. This may be considered another limitation of our study.

Conclusion
Laboratory staff should be well informed about the specific properties of the analytes and about the principles of the instrumentation and methodology.
Preanalytical and analytical variations related to the complex molecular structure of FT4, the equilibrium between T4 and carrier proteins, and other interferences in competitive immunoassay methodology should be taken into consideration to ensure the validity of the result. The interaction effects of the hospital and age groupings on TSH were not meaningful. We believe that this finding supports the consensus on measuring TSH in guidelines for CH [35] or in annual health assessments, provided that accurate cut-off values have been determined for children in narrowed age groups. This would allow careful neurodevelopmental and neurosensory evaluations to start early in life, taking into account disease severity at diagnosis, and appropriate interventions to be provided as required.
In this study, FT4 and TSH reference limits differed statistically between narrowed age groups; accurate cut-off values were determined for each week, especially in the first 3 weeks of the neonatal period. These reference limits presented similar profiles among all three hospitals. Our reference limits may be recommended in pediatric follow-ups using the Centaur XP or the Immulite 2000, taking into account prematurity, birth weight, multiple births or infection.