Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age

Fetal age and weight estimation plays an important role in care during pregnancy. Many estimate formulas have been created by combining statistics and obstetrics. However, such formulas give optimal estimates only when they are applied to the specific community from which they were derived. We proposed the so-called Phoebe framework, which helps scientists find the most accurate formulas for the community in which they do their research. This paper focuses on using the Phoebe framework to derive optimal formulas from experimental results. In other words, this paper is an evaluation of the Phoebe framework.
Citation: Nguyen L, Ho THT (2017) Experimental Results of Phoebe Framework: Optimal Formulas for Estimating Fetus Weight and Age. J Comm Pub Health Nursing 3: 163. doi:10.4172/2471-9846.1000163


Introduction
Fetal age and weight estimation aims to predict the birth weight or birth age before delivery. It is very important for doctors in diagnosing abnormal or diseased cases so that they can decide on treatment for such cases [1]. This research applies regression models to birth estimation. Fetal ultrasound measures such as bi-parietal diameter (bpd), head circumference (hc), abdominal circumference (ac), fetal length (fl), arm volume (arm_vol), and thigh volume (thigh_vol) are recorded and used as the input sample for regression analysis, which results in a regression function. This function is the formula for estimating fetal age and weight from these ultrasound measures. Note that the terms regression function, function, regression model, estimate function, estimate model, and estimate formula are used interchangeably.
Many estimate formulas have resulted from gestational research [2][3][4][5][6][7][8][9][10]. Some of them achieve high accuracy, but they are only appropriate for the population, community, or ethnic group in which the research was done. If we apply these formulas to another community, such as Vietnam, they are no longer accurate. Moreover, discovering a new and effective estimate formula is difficult and expensive in time and resources. Therefore, Nguyen and Ho [1] proposed the so-called Phoebe framework to help physicians and researchers discover optimal estimate formulas. This research focuses on using the Phoebe framework to derive such optimal formulas from experimental results. Note that the Phoebe framework uses the statistical software package "Java Scientific Library" by Michael Thomas Flanagan [11] and the parsing package "A Java expression parser" by Jos de Jong [12]. The "Java Scientific Library" package is the most important one in the framework. The framework is implemented in the Java language [13].

Materials and Methods
As mentioned in the introduction, we run experiments with the Phoebe framework in order to find optimal formulas for estimating fetal weight and age, noting that such formulas are the most appropriate for our gestational sample. We use two samples: the first includes 2-dimension (2D) ultrasound measures of 1027 cases and the second includes 3-dimension (3D) ultrasound measures of 506 cases. Ho and Phan [14] collected these samples from pregnant women at Vinh Long General Hospital, Vietnam, strictly observing all medical ethical criteria. These women and their husbands are Vietnamese. Their periods are regular and the dates of their last periods are known. Each of them has only one live fetus. Fetal age is from 28 weeks to 42 weeks. Delivery occurred no more than 48 hours after the ultrasound scan. The measures in the 2D sample are bpd, hc, ac, and fl. The measures in the 3D sample are bpd, hc, ac, fl, thigh_vol, and arm_vol. The unit of bpd, hc, ac, and fl is the millimeter. The unit of thigh_vol and arm_vol is cm³. The units of fetal age and weight are the week and the gram, respectively.
As mentioned above, the Phoebe framework uses a regression model for birth estimation. Suppose a linear regression function $Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_n X_n$, where $Y$ is the response (dependent) variable and the $X_i$ are the regression (independent) variables, also called regressors. Each $\alpha_i$ is called a regression coefficient. The response variable $Y$ represents fetal weight or age. The built-in seed germination (SG) algorithm is the core of the Phoebe framework. It is a heuristic algorithm responsible for discovering the optimal regression model as fast as possible. It is based on two assumptions about an optimal regression function, which satisfies the pair of optimal conditions [1, p. 22]:
- First assumption: the regression variables $X_i$ tend to be mutually independent. It means that any pair $X_i$ and $X_j$ with $i \ne j$ in an optimal function are mutually independent. The independence is relaxed into the looser condition "the correlation coefficient of any pair $X_i$ and $X_j$ is less than a threshold δ". This is the minimum assumption.
- Second assumption: each variable $X_i$ contributes to the quality of the optimal function. The contribution rate of a variable $X_i$ is defined as the correlation coefficient between that variable and Y-real. The higher the contribution rate, the more important the respective variable. Variables with a high contribution rate are called contributive variables, so the optimal function includes only contributive regression variables. The second assumption is stated as "the correlation coefficient of any regression variable $X_i$ and the real response value Y-real is greater than a threshold ε". This is the maximum assumption.
The SG algorithm tries to find a combination of regression variables $X_i$ that satisfies the two assumptions above. In other words, this combination constitutes an optimal regression function that satisfies the following two conditions [1, p. 22]:
- The correlation coefficient of any pair $X_i$ and $X_j$ is less than the minimum threshold δ > 0. This condition corresponds to the minimum assumption and is called the minimum condition or independence condition.
- The correlation coefficient of any $X_i$ and Y-real is greater than the maximum threshold ε > 0. This condition corresponds to the maximum assumption and is called the maximum condition or contribution condition.
These conditions are called the pair of heuristic conditions. Given a set of possible regression variables $VAR = \{X_1, X_2, \dots, X_n\}$, which are ultrasound measures, let $f = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \dots + \alpha_k X_k$ ($k \le n$) be the estimate function and let $Re(f) = \{X_1, X_2, \dots, X_k\}$ be its regression variables. Note that the value of $f$ is fetal age or fetal weight. $Re(f)$ is considered the representation of $f$. Let OPTIMAL be the output of the SG algorithm, which is the set of optimal functions returned. OPTIMAL is initialized as the empty set. Let Re(OPTIMAL) be the set of regression variables contained in all optimal functions $f \in$ OPTIMAL. The SG algorithm has the following four steps [1, p. 22]:
1. Let C be the complement of Re(OPTIMAL) with respect to VAR, that is, C = VAR \ Re(OPTIMAL), where the backslash "\" denotes the complement operator in set theory. It means that C contains the variables that are in VAR but not in Re(OPTIMAL).
2. Let G ⊂ C be a list of regression variables satisfying the pair of heuristic conditions. These variables are taken from the complement set C. If G is empty, the algorithm terminates; otherwise it goes to step 3.
3. Iterate over G in order to build the candidate list of good functions, CANDIDATE, which is initialized as the empty set. For each regression variable X ∈ G, let L be the union of the optimal regression variables and X, and let g be the new function created from L; in other words, the regression variables of g are exactly those in L, Re(g) = L. If g meets the pair of optimal conditions, it is added to CANDIDATE.
4. Let BEST be the set of best functions taken from CANDIDATE. In other words, these functions belong to CANDIDATE and satisfy the pair of optimal conditions best, where the correlation is largest and the sum of residuals is smallest. If BEST equals OPTIMAL, the algorithm stops; otherwise BEST is assigned to OPTIMAL and the algorithm goes back to step 1. Note that two sets are equal if their elements are the same.
The SG algorithm was described in the article "A framework of fetal age and weight estimation" [1, pp. 21-23]. In essence, the SG algorithm reduces the search space by choosing regression variables that satisfy the heuristic assumptions as "seeds". Optimal functions are composed of these seeds. The algorithm always delivers the best functions it finds but can lose other good functions. The length of a function is defined as the number of its regression variables. The optimal bias is defined as the difference between two functions in correlation and sum of residuals under the optimal conditions. The algorithm terminates when no more optimal functions can be found or when all possible variables have been examined. Therefore, the resulting function is the longest one, although some shorter functions may also be optimal with an insignificant optimal bias.
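The four steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Phoebe implementation (which is written in Java): the names `pearson`, `fit_linear`, and `seed_germination` are hypothetical, OPTIMAL is simplified to a single best variable set, the optimal conditions are reduced to "the estimate correlation R improves", absolute correlations are compared with the thresholds, and the data are a synthetic toy sample rather than ultrasound measures.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def fit_linear(columns, y):
    """Least-squares fit with intercept; returns the fitted values.

    Solves the normal equations by Gauss-Jordan elimination, so it
    assumes the chosen columns are not collinear.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))  # partial pivoting
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    coef = [a[i][k] / a[i][i] for i in range(k)]
    return [sum(ci * v for ci, v in zip(coef, r)) for r in rows]

def seed_germination(data, y, delta, eps):
    """Greedy sketch of the SG steps; returns (variable names, R)."""
    optimal, best_r = [], 0.0
    while True:
        # Step 1: variables not yet used by the current optimal function.
        remaining = [v for v in data if v not in optimal]
        # Step 2: seeds satisfying both heuristic conditions (assumed |r|).
        seeds = [v for v in remaining
                 if abs(pearson(data[v], y)) > eps
                 and all(abs(pearson(data[v], data[u])) < delta
                         for u in optimal)]
        if not seeds:
            return optimal, best_r
        # Step 3: grow the optimal set by one seed, keep improving functions.
        candidates = []
        for v in seeds:
            names = optimal + [v]
            r = pearson(fit_linear([data[n] for n in names], y), y)
            if r > best_r:
                candidates.append((r, names))
        # Step 4: stop when nothing improves, otherwise keep the best.
        if not candidates:
            return optimal, best_r
        best_r, optimal = max(candidates)

# Synthetic toy data: y depends on x1 and x2 only; x3 is irrelevant.
data = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
y = [10 + 3 * a + 2 * b for a, b in zip(data["x1"], data["x2"])]
selected, r = seed_germination(data, y, delta=0.5, eps=0.3)
print(selected, round(r, 4))  # ['x1', 'x2'] 1.0
```

In this toy run the irrelevant variable x3 is never selected because it fails the contribution condition, which illustrates how the seeds shrink the search space.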
The current implementation of the SG algorithm leaves the minimum threshold δ arbitrary. It also supports the following non-linear regression models: the polynomial model and the product model. The notations "exp" and "log" denote the exponent function and the natural logarithm function, respectively. Most non-linear regression models can be transformed into linear regression models. For example, a logarithmic transformation turns the product model into a linear model with regard to variables U and $Z_i$ and coefficients $\beta_i$. With the built-in SG algorithm, the Phoebe framework can be used for any regression application beyond birth estimation.
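As a worked illustration of such a linear transformation, assume the product model has the common multiplicative form shown below, with positive $Y$ and $X_i$. This form is an assumption made here for illustration; the symbols U, $Z_i$, and $\beta_i$ follow the text above.

$$
\begin{aligned}
Y &= \alpha_0 X_1^{\alpha_1} X_2^{\alpha_2} \cdots X_n^{\alpha_n} \\
\log Y &= \log \alpha_0 + \alpha_1 \log X_1 + \alpha_2 \log X_2 + \dots + \alpha_n \log X_n \\
U &= \beta_0 + \beta_1 Z_1 + \beta_2 Z_2 + \dots + \beta_n Z_n
\end{aligned}
$$

Here $U = \log Y$, $Z_i = \log X_i$, $\beta_0 = \log \alpha_0$, and $\beta_i = \alpha_i$ for $i \ge 1$. After the linear model is fitted, the original coefficients are recovered as $\alpha_0 = \exp(\beta_0)$ and $\alpha_i = \beta_i$.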

Experimental Results
The Phoebe framework produces formulas that compare favorably with existing ones. We compare our optimal formulas with the others according to metrics such as estimate correlation and estimate error range, given the two aforementioned samples [14,15] collected at Vinh Long General Hospital, Vietnam. Let $Y = \{y_1, y_2, \dots, y_n\}$ and $Z = \{z_1, z_2, \dots, z_n\}$ be the fetal sample age/weight and the fetal estimated age/weight, respectively.
The estimate correlation, denoted R, is the correlation coefficient of the sample response values and the estimated response values. The correlation R reflects the adequacy of a given formula. The larger R is, the better the formula is.
The estimate error mean, denoted µ, is the mean of the errors. The error mean µ reflects the accuracy of a given formula. The smaller the absolute value of µ is, the more accurate the formula is. If µ is positive, the respective formula leans toward overestimation. If µ is negative, the respective formula leans toward underestimation.
The combination of the error mean µ and the standard deviation σ yields the so-called error range. For example, if µ = -0.0292 and σ = 1.45, then the error range is -0.0292 ± 1.45, which means that the total average error ranges from -1.4792 = -0.0292 - 1.45 to 1.4208 = -0.0292 + 1.45. The error range reflects both the adequacy and the accuracy of a given formula (Figure 6).
Table 1 shows the comparison between our best age formula and the others with the 2D sample. As a convention, the name of each formula is the name of the respective author listed in the references section. For example, formula "Ho 1" is the first formula of the author Ho [5]. As seen in Table 1, our formula is the best with R = 0.9303 and error range -0.0292 ± 1.4500 week(s). As a convention, our formulas have names with the prefix "NH".
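The metrics R, µ, and σ described above can be computed as in the following sketch. It rests on assumptions that the text does not state explicitly: each error is taken as estimated minus sample value (so that a positive µ means overestimation), and σ is the sample standard deviation of the errors. The function names and the toy values are hypothetical and are not taken from the tables.

```python
from math import sqrt
from statistics import mean, stdev

def estimate_metrics(sample, estimated):
    """Return (R, mu, sigma) for sample values Y and estimated values Z."""
    # Assumed sign convention: error = estimate - sample value.
    errors = [z - y for y, z in zip(sample, estimated)]
    mu = mean(errors)
    sigma = stdev(errors)  # assumed: sample standard deviation (n - 1)
    my, mz = mean(sample), mean(estimated)
    cov = sum((y - my) * (z - mz) for y, z in zip(sample, estimated))
    var_y = sum((y - my) ** 2 for y in sample)
    var_z = sum((z - mz) ** 2 for z in estimated)
    return cov / sqrt(var_y * var_z), mu, sigma

def error_range(mu, sigma):
    """Lower and upper bounds of the error range mu ± sigma."""
    return mu - sigma, mu + sigma

# Toy values (weeks), made up for illustration only.
r, mu, sigma = estimate_metrics([30, 32, 34, 36], [31, 32, 33, 38])
print(round(r, 4), mu, round(sigma, 4))

# The worked example from the text: mu = -0.0292, sigma = 1.45.
low, high = error_range(-0.0292, 1.45)
print(round(low, 4), round(high, 4))  # -1.4792 1.4208
```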
The sign "^" denotes the exponent operator. The formula template is designed for flexibility, so that a formula can be used as input to any computational tool. Table 2 shows the comparison between our best weight formula and the others with the 2-dimension sample. As seen in Table 2, our formula is the best with R = 0.9636 and error range -7.4656 ± 212.5573 grams. Table 3 shows the comparison between our best age formula and the others with the 3-dimension sample. As seen in Table 3, our formula is the best with R = 0.9970 and error range ± 0.2696 week. Table 4 shows the comparison between our best weight formula and the others with the 3-dimension sample. As seen in Table 4, our formula is the best with R = 0.9708 and error range ± 180.9803 grams.
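To illustrate how a formula written in this template could be fed to a computational tool, the sketch below evaluates a formula string in Python by mapping "^" to Python's power operator. It is not the expression parser used by Phoebe [12]; the function `evaluate_formula` is hypothetical, and the example formula and its coefficients are made up for illustration and do not come from the tables.

```python
import math

def evaluate_formula(template, measures):
    """Evaluate a formula string that uses '^' for exponentiation.

    Uses eval with builtins disabled; this is only a sketch and should be
    applied to trusted formula strings, not to arbitrary user input.
    """
    expression = template.replace("^", "**")
    names = {"exp": math.exp, "log": math.log, "log10": math.log10}
    names.update(measures)
    return eval(expression, {"__builtins__": {}}, names)

# Hypothetical formula and measures (mm), for illustration only.
value = evaluate_formula("2 + 0.5*bpd + 0.01*fl^2", {"bpd": 80, "fl": 60})
print(value)
```

Because "^" is replaced textually, the exponent binds tighter than multiplication, as in ordinary mathematical notation.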
Within the context of this research, from the section on 3D ultrasound in the PhD dissertation of Thu-Hang T. Ho [5], we recognize that fetal weight and fetal age are mutually dependent. For instance, when fetal age increases, fetal weight increases too. As a result, weight estimation improves significantly if the fetal age is known beforehand. If fetal age is added to the regression model of fetal weight as a regression variable (regressor), the resulting weight estimation formula, called a dual formula, is even better than the most optimal ones shown in tables 2 and 4. Such a dual formula is not only precise but also practical, because many pregnant women know their gestational age before taking an ultrasound examination. Given the 2D sample and the 3D sample, table 5 shows the dual formulas in comparison with the most optimal ones shown in tables 2 and 4 with regard to R and error range. As a convention, our dual formulas have names with the prefix "NHD". The notation "log10" denotes the logarithm function with base 10.
In table 5, all dual formulas NHD * are better than the normal formulas NH * with regard to R and error range. Moreover, NHD * do not need many regressors. Given the 2D sample, NHD 1 and NHD 2 use 4 and 3 regressors including the age regressor, respectively, whereas both NH 3 and NH 4 use 4 regressors. Given the 3D sample, NHD 3 and NHD 4 use 6 and 5 regressors including the age regressor, respectively, whereas NH 7 and NH 8 use 5 and 3 regressors, respectively. Although our formulas are better than all the remaining ones, with high adequacy (large R) and high accuracy (small error range), other research remains significant because its formulas are very simple and practical. Moreover, our formulas are not global. If they are applied to other samples collected in other communities, their accuracy may decrease and they may no longer be better than traditional formulas such as Shepard and Hadlock. However, our experimental results suggest that if the Phoebe framework is applied to the same samples as other research, it will produce preeminent formulas. In order to achieve global optimality with the Phoebe framework, we offer two essential suggestions:
- Experimenting on the Phoebe framework with many samples.
- Adding more knowledge of pregnancy study, ultrasound technique, and obstetrics into the Phoebe framework. In other words, the additional knowledge will be modeled as constraints of the SG algorithm.
These suggestions go beyond this research. In our opinion, absolute global optimality cannot be reached because the Phoebe framework focuses on local optimality within specific communities. Essentially, the suggestions only alleviate the weak point of the built-in SG algorithm with respect to global optimality.

Conclusion
According to the experimental results, the Phoebe framework produces optimal formulas with high adequacy and accuracy; please see tables 1, 2, 3, and 4 for more details. However, we also recognize that the weak point of our research is that the built-in SG algorithm can lose some good formulas due to the heuristic conditions. The suggested solution is to add more constraints to such conditions; please read the article "A framework of fetal age and weight estimation" [1, pp. 24-25] for more details.
Our complex formulas are difficult to apply in fast mental calculation; this is the price of their high accuracy. In the future, we will embed these formulas into the software or hardware of medical ultrasound machines so that users can easily read the estimated values produced by the machine. The research is available at http://phoebe.locnguyen.net so that doctors and researchers can use it easily. We do not know whether they have enjoyed our product. However, we presented the Phoebe framework at the Ho Chi Minh City Society of Reproductive Medicine (HOSREM) on November 26, 2016, and many doctors learned of it and showed interest in it.