Web-Based Newborn Screening System for Metabolic Diseases: Machine Learning Versus Clinicians

Background A hospital information system (HIS) that integrates screening data and interpretation of the data is routinely requested by hospitals and parents. However, the accuracy of disease classification may be low because of the disease characteristics and the analytes used for classification. Objective The objective of this study is to describe a system that enhanced the neonatal screening system of the Newborn Screening Center at the National Taiwan University Hospital. The system was designed and deployed according to a service-oriented architecture (SOA) framework under the Web services .NET environment. The system consists of sample collection, testing, diagnosis, evaluation, treatment, and follow-up services among collaborating hospitals. To improve the accuracy of newborn screening, machine learning and optimal feature selection mechanisms were investigated for screening newborns for inborn errors of metabolism. Methods The framework of the Newborn Screening Hospital Information System (NSHIS) used the embedded Health Level Seven (HL7) standards for data exchanges among heterogeneous platforms integrated by Web services in the C# language. In this study, machine learning classification was used to predict phenylketonuria (PKU), hypermethioninemia, and 3-methylcrotonyl-CoA-carboxylase (3-MCC) deficiency. The classification methods used 347,312 newborn dried blood samples collected at the Center between 2006 and 2011. Of these, 220 newborns had values over the diagnostic cutoffs (positive cases) and 1557 had values that were over the screening cutoffs but did not meet the diagnostic cutoffs (suspected cases). The original 35 analytes and the manifested features were ranked based on F score, then combinations of the top 20 ranked features were selected as input features to support vector machine (SVM) classifiers to obtain optimal feature sets. These feature sets were tested using 5-fold cross-validation and optimal models were generated. 
The datasets collected in 2011 were used as prediction cases. Results The feature selection strategies were implemented and the optimal markers for PKU, hypermethioninemia, and 3-MCC deficiency were obtained. The results of the machine learning approach were compared with the cutoff scheme. The number of false positive cases was reduced from 21 to 2 for PKU, from 30 to 10 for hypermethioninemia, and from 209 to 46 for 3-MCC deficiency. Conclusions This SOA Web service–based newborn screening system can accelerate screening procedures effectively and efficiently. An SVM learning methodology for classifying PKU, hypermethioninemia, and 3-MCC deficiency, including optimal feature selection strategies, is presented. By adopting the results of this study, the number of suspected cases could be reduced dramatically.


INTRODUCTION
Newborn screening programs for severe metabolic disorders that hinder an infant's normal physical or mental development are well established [1][2][3]. These metabolic diseases can be treated effectively if therapy begins at an early stage. For most patients who are not diagnosed and treated in time, acute encephalopathic crises can occur during infancy or early childhood.
New and refined screening methodologies based on modern tandem mass spectrometry (MS/MS) of metabolites have been developed for routine deployment [4][5][6][7]. MS/MS allows simultaneous analysis of multiple compounds in a high-throughput process. The functional endpoints of the metabolic cycles, which offer a precise snapshot of the current metabolic state, can be detected and analyzed in a single, small blood sample collected during the first few days of life. Over the last two decades, this technology has been revolutionary and has led to a remarkable expansion in the number of detectable metabolic disorders [7][8]. It opens the way to population-wide testing for a large number of disorders [9].
The National Taiwan University Hospital (NTUH) initiated newborn screening research in 1981 and has carried out the nation's newborn screening for metabolic diseases since July 1985. At present, the NTUH newborn screening laboratory serves approximately one third of the nation's newborns for inborn errors of metabolism. Moreover, the screening rate for inherited metabolic diseases in Taiwan has improved from 6.4% in 1984 to over 99.9% in recent years. Under the current program, all newborns in Taiwan are tested for inherited metabolic disorders [10]. Each year the NTUH Newborn Screening Program tests over 70,000 babies, of whom about 30-40 need urgent assessment and treatment.
Phenylketonuria (PKU) detection is included in the newborn screening panels of the majority of countries that apply tandem mass spectrometry. PKU is determined by an elevated phenylalanine (PHE) concentration and an elevated ratio of PHE to tyrosine (TYR). In general, the higher both values are, the more likely the newborn is to have PKU. If the baby is not screened during the routine newborn screening examination, the disease may present clinically with seizures, albinism (excessively fair hair and skin), and a "musty odor" of the baby's perspiration and urine. In most cases, a follow-up examination should be done at approximately two weeks of age to verify the initial result and uncover any PKU that was initially missed. Untreated children are normal at birth but fail to attain early developmental milestones, develop microcephaly, and demonstrate progressive impairment of cerebral function. Hyperactivity, EEG abnormalities and seizures, and severe learning disabilities are major clinical problems later in the children's lives.
Machine learning techniques offer a promising approach to examining high-dimensional data. Thus, the goal of the paper is to use machine learning techniques and knowledge mining to establish classification models for the screening and diagnosis of metabolic disorders. The models can achieve high discrimination and improve prediction accuracy. The support vector machine (SVM) is a robust technique for data classification. An important concept of SVM is to transform the input space into a higher dimensional feature space in which a linear classifier can be applied. Given a training data set, SVM determines an optimal hyperplane for separating two classes. Although excellent results have been achieved with this classifier in several areas, very few published papers use SVM as the classifier for newborn screening data.
Figure 1. Newborn screening data collection workflow
The remainder of the article is organized as follows. Section 2 presents related work on global newborn screening and the cut-off scheme. Section 3 elaborates on the data and methodology of the study, including detailed descriptions of data collection, the digitization process, and data training and prediction. Section 4 summarizes the experimental results. Finally, Section 5 concludes the article.

A. Newborn Screening Program
The newborn screening program represents one of the major advances in child health of the past century. The program is a system that must function within geographic, economic, and political constraints, and which must smoothly integrate sample collections, laboratory analyses, follow-up diagnoses and treatments [4,11].
In the United States (US), newborn screening programs have been carried out in all fifty states since the 1970s [4], [11][12][13]. However, the programs are state-run, and decisions are left to the individual states regarding the conditions to be screened for, the mechanism for confirmatory testing, and the financing of the programs [13][14][15]. For instance, a total of 31 states required testing for more than 20 disorders as of June 1, 2006 [16]. The Georgia Newborn Screening Program has educational and monitoring mechanisms in place to remain watchful for any signs or symptoms of disorders in its patients [17]. The March of Dimes recommends that all newborns be screened for 29 disorders, including hearing loss. Furthermore, it is estimated that one baby in 10,000 to 25,000 is born with PKU in the US; newborn screening for PKU is required in all 50 states and US territories [1]. In summary, each state in the US requires screening tests, but the specific tests performed vary among the states.
In the Asia Pacific region, most countries are approaching full coverage of newborn screening, although the number of conditions screened varies widely. Among the developing countries, Nepal and India are particularly interested in developing and exploring congenital hypothyroidism (CHT) programs. CHT is a universally important condition and is usually the first condition considered for screening [1,18].
In Europe, the number of disorders screened for by MS/MS ranges from two disorders (PKU and MCADD (medium-chain acyl-CoA dehydrogenase deficiency)) in some countries to 20 in others [3]. Thus, the disorders chosen to be included in European newborn screening programs differ considerably. For example, PKU and MCADD are examined in Switzerland.

B. Cut-off Scheme
The cut-off scheme has been a popular basis for screening decisions [19][20][21]. However, adjusting cut-offs is only one of several strategies for improving the specificity of screening without sacrificing sensitivity. Two-tier testing and the use of multiple markers improve both sensitivity and specificity [22], [23]. For example, steroid profiling using MS/MS or high performance liquid chromatography (HPLC) following an elevated 17-hydroxyprogesterone (17-OHP) result in screening for congenital adrenal hyperplasia (CAH) has been shown to be effective. However, second-tier testing must be used in conjunction with clinical observation of the cases [24]. A three-tiered cut-off scheme was used during the initial 24 months of CAH screening in Bavaria [25]. In general, when setting cut-offs, a balance must be struck between time, money, anxiety caused by false positives, and an acceptable number of missed cases [17].

A. Data Collection
Fig. 1 illustrates the newborn screening data collection workflow of NTUH. The process is divided into several stages, and the functions of each stage are shown in the diagram.

1) Specimen Collection
Initially, the phlebotomy hospitals collect specimens and fill out the infants' demographic data (e.g., mother's name, date of birth, and weight) on the phlebotomy filter paper. The hospitals then deliver the newborns' filter papers to the National Newborn Screening Center at NTUH daily by postal mail. As soon as the center receives the specimens, a laboratory serial number is assigned to each baby's specimen. Meanwhile, the newborns' information is entered into the NTUH database manually.

2) Sample Screening
Secondly, technicians load the specimens into the analytical instrument, i.e., the MS/MS. After analyzing the blood compounds, the system generates the raw analyte concentration data. The data are examined and validated against upper limits for each analyte (also referred to as the "cutoffs"), which are established by statistical methods. If the outcome is negative, the case is stored directly in the database and a report is issued; parents and doctors can review the status online. This completes the newborn screening procedure for negative cases.
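The cutoff check described above can be sketched as follows. This is a minimal illustration, not the NTUH implementation: the function names `flag_analytes` and `screen`, the dictionary layout, and the numeric cutoffs are all hypothetical placeholders.

```python
# Hypothetical upper limits per analyte (illustrative values only, not NTUH cutoffs).
CUTOFFS = {"PHE": 120.0, "TYR": 300.0}


def flag_analytes(sample, cutoffs=CUTOFFS):
    """Return the sorted names of analytes whose concentration exceeds its cutoff."""
    return sorted(
        name for name, limit in cutoffs.items()
        if sample.get(name, 0.0) > limit
    )


def screen(sample, cutoffs=CUTOFFS):
    """Label a sample 'negative' if no analyte is above its cutoff, else 'suspected'."""
    return "suspected" if flag_analytes(sample, cutoffs) else "negative"


print(screen({"PHE": 60.0, "TYR": 80.0}))    # negative
print(screen({"PHE": 250.0, "TYR": 80.0}))   # suspected
```

A negative sample would be stored and reported directly, while a suspected one would move on to the recall stage.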

3) Test Result Recalled
Based on the primary test results, the suspected positive cases are reviewed and verified by laboratory staff. The cases are referred back to the original phlebotomy hospitals, which are responsible for tracking the newborns and obtaining new samples as recalled specimens. The recalled specimens are delivered to the center for additional examination. If the results are positive or suspected again, the case is forwarded to the doctors for further diagnosis. A report is issued online within three days, so the parents of the newborns and the associated hospital can track the results.

4) Diagnosis Confirmation
For positive cases, a referral hospital is recommended and takes charge of following up the cases. The hospital provides preliminary precautionary advice and a comprehensive confirmatory diagnosis for each case. The confirmation reports are sent via e-mail to the National Newborn Screening Center, the Public Health Bureau, and the original phlebotomy hospitals. The case is also registered at the center. The patients then receive dietary therapy or other treatment. This completes the whole newborn screening procedure.

B. Digitization Process
At NTUH, between May 2002 and July 2007, the neonatal screening data were archived both on paper and on CDs as Microsoft Excel files (.xls). These data must be digitized before analysis. The digitization process is shown in Fig. 2.
Initially, technicians manually enter the serial numbers of the blood spot specimens in a 96-well microtiter plate on a report. Each serial number identifies an individual baby's specimen. The report consists of 12 wells per row, 8 rows per report, and 35 analytes per specimen. After MS/MS analysis, the system generates a single spreadsheet displaying the blood compound concentrations in a tabulated format: one plate row per page, for a total of 8 pages per plate, as indicated in Fig. 2. The status report of the MS/MS results, i.e., serial numbers, report dates, and result outcomes, is stored in the NTUH database.
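The pairing of serial numbers with spreadsheet pages can be sketched as below. This is only an illustration under stated assumptions: wells are assumed to be listed in row-major order, and the names `well_position` and `attach_serials` as well as the record layout are hypothetical.

```python
WELLS_PER_ROW = 12   # wells per row, as in the report layout described above
ROWS_PER_PLATE = 8   # rows per plate, i.e. one spreadsheet page per row


def well_position(index):
    """Map a 0-based well index (row-major order, an assumption) to 1-based (page, column)."""
    if not 0 <= index < WELLS_PER_ROW * ROWS_PER_PLATE:
        raise ValueError("index outside a 96-well plate")
    row, col = divmod(index, WELLS_PER_ROW)
    return row + 1, col + 1


def attach_serials(serials, pages):
    """Pair each serial number with its concentration record.

    serials: 96 serial numbers in row-major well order.
    pages:   8 lists (one per spreadsheet page), each holding 12 records.
    """
    merged = {}
    for i, serial in enumerate(serials):
        page, col = well_position(i)
        merged[serial] = pages[page - 1][col - 1]
    return merged


# Toy plate: serial numbers S000..S095 and one fake PHE value per well.
serials = [f"S{i:03d}" for i in range(96)]
pages = [[{"PHE": float(r * 12 + c)} for c in range(12)] for r in range(8)]
merged = attach_serials(serials, pages)
print(well_position(13))   # (2, 2)
print(merged["S013"])      # {'PHE': 13.0}
```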

C. Data Training and Prediction
In the study, a supervised classification data flow is proposed to enhance the accuracy and sensitivity of the newborn screening process, as depicted in Fig. 3. In the diagram, the training data undergo learning to produce the SVM prediction model; the testing data pass through the same preprocessing steps, and the prediction result is obtained from the trained model. Before training or prediction, the dataset is produced by the MS/MS machine and the digitization procedure. Feature selection then identifies the most relevant features, and scaling is applied to avoid bias toward features with large numeric ranges and to improve computing efficiency.
The Fisher score [26] in (1) is used to evaluate the importance $F_{a,b}(k)$ of a specific feature $k$ ($1 \le k \le N$, with $N$ the number of features) for a hyperplane between class $a$ and class $b$, where $\mu_{a,k}$ denotes the mean value of feature $k$ over all training samples in class $a$ and $\sigma^2_{a,k}$ is the variance of feature $k$ over all training samples in class $a$:

$$F_{a,b}(k) = \frac{(\mu_{a,k} - \mu_{b,k})^2}{\sigma^2_{a,k} + \sigma^2_{b,k}} \qquad (1)$$

This equation evaluates both the differentiation capability between the two classes and the stability within each class for a given feature $k$. For each hyperplane, the importance of every feature is calculated first, and the features are then sorted in descending order of importance.
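The ranking step described above can be sketched as follows, assuming the usual two-class Fisher criterion (squared difference of the class means divided by the sum of the class variances). The function names, the use of population variance, and the toy concentrations are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, pvariance


def fisher_score(values_a, values_b):
    """Squared difference of class means over the sum of class variances, for one feature."""
    num = (mean(values_a) - mean(values_b)) ** 2
    den = pvariance(values_a) + pvariance(values_b)
    if den == 0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_features(class_a, class_b):
    """Return feature names sorted by descending Fisher score.

    class_a, class_b: dicts mapping feature name -> list of values in that class.
    """
    scores = {k: fisher_score(class_a[k], class_b[k]) for k in class_a}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (made up): PHE separates the classes well, TYR does not.
positives = {"PHE": [400.0, 420.0, 380.0], "TYR": [60.0, 80.0, 70.0]}
negatives = {"PHE": [50.0, 60.0, 55.0], "TYR": [65.0, 75.0, 85.0]}
print(rank_features(positives, negatives))   # ['PHE', 'TYR']
```

A feature scores highly when the class means are far apart and the values within each class vary little, which matches the two properties the score is meant to capture.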

2) Scaling
The scale class contains y_lower, y_upper, y_min, and y_max, which store the lower and upper bounds of the target range and the minimum and maximum values of the label y, respectively. Scaling maps a value from the range [min, max] to the range [lower, upper]:

$$v' = \text{lower} + (\text{upper} - \text{lower}) \cdot \frac{v - \text{min}}{\text{max} - \text{min}}$$

where v is the original value and v' is the value after scaling.
The arrays feature_max and feature_min store the maximum and minimum concentration values of each metabolite. By default, the concentration values of each metabolite are scaled to [lower, upper] = [-1, +1]. The scale class constructor svm_scaling() receives the dataset for SVM training or SVM prediction and returns the scaled dataset.
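A minimal sketch of this scaling step is given below, assuming the usual linear (min-max) mapping from [min, max] to [lower, upper]. It is not the svm_scaling() class itself: the function names and the handling of constant features are assumptions made for illustration.

```python
def fit_ranges(rows):
    """Return (feature_min, feature_max) computed column-wise from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_value(v, vmin, vmax, lower=-1.0, upper=1.0):
    """Linearly map v from [vmin, vmax] to [lower, upper]."""
    if vmax == vmin:
        return lower  # constant feature: arbitrary choice made for this sketch
    return lower + (upper - lower) * (v - vmin) / (vmax - vmin)


def scale_rows(rows, feature_min, feature_max, lower=-1.0, upper=1.0):
    """Scale every row using ranges that were fitted on the training data."""
    return [
        [scale_value(v, mn, mx, lower, upper)
         for v, mn, mx in zip(row, feature_min, feature_max)]
        for row in rows
    ]


train = [[10.0, 200.0], [30.0, 400.0], [20.0, 300.0]]
fmin, fmax = fit_ranges(train)
print(scale_rows(train, fmin, fmax))            # [[-1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]
print(scale_rows([[40.0, 250.0]], fmin, fmax))  # [[2.0, -0.5]]
```

The second call reuses the training ranges for a new sample, so that training and prediction data share one scale; values outside the training range then fall outside [-1, +1].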

3) Support Vector Machines
The central idea of SVM classification is to use a linear separating hyperplane to create a classifier. The training vectors that determine the separating hyperplane are called "support vectors".
Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, with $x_i \in R^n$ and $y_i \in \{+1, -1\}$, the hyperplane can be expressed as

$$w^T x + b = 0$$

and the decision function is defined as

$$f(x) = \mathrm{sgn}(w^T x + b).$$

SVM seeks the hyperplane with the largest possible margin, which apart from being an intuitive idea has been shown to provide theoretical guarantees in terms of generalization ability. If the data are distributed in a highly nonlinear way, employing only a linear function leaves many training instances on the wrong side of the hyperplane; under-fitting occurs and the decision function does not perform well. Thus, SVM non-linearly transforms the original input into a higher dimensional feature space. More precisely, the training data x is mapped into a (possibly infinite) vector $\phi(x) = (\phi_1(x), \phi_2(x), \dots, \phi_i(x), \dots)$. In this higher dimensional space, the data may become linearly separable, so SVM tries to find a linear separating plane there:

$$\min_{w, b, \xi} \; \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i (w^T \phi(x_i) + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, l. \qquad (5)$$

The parameters $\xi_i$ are called the slack variables, and they ensure that the problem has a solution in case the data are not linearly separable. The objective in (5) contains a penalty term, $C \sum_{i=1}^{l} \xi_i$, where C > 0 is a parameter chosen by the user to assign a penalty to errors. This problem is usually called the primal problem. After the data are mapped into a higher dimensional space, the number of variables (w, b) becomes very large or even infinite. This difficulty is handled by solving the dual problem.
A kernel for a nonlinear SVM projects the samples to a feature space of higher dimension via a nonlinear mapping function. Among the nonlinear kernels, the radial basis function (RBF) is defined as

$$K(x_i, x_j) = \exp(-r \| x_i - x_j \|^2)$$

where r > 0 is a kernel parameter.
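For reference, the dual problem mentioned above takes the following standard form for a soft-margin SVM. This is the textbook formulation, given as a sketch rather than as the exact form used in the study; the symbols $\alpha_i$ (Lagrange multipliers), $y_i \in \{+1, -1\}$ (class labels), and $l$ (number of training samples) are notation assumed here.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{l} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{subject to}\quad & \sum_{i=1}^{l} \alpha_i y_i = 0,
  \qquad 0 \le \alpha_i \le C, \quad i = 1, \dots, l .
\end{aligned}
```

Because the training data enter only through the kernel values $K(x_i, x_j)$, the number of unknowns equals the number of training samples, regardless of the dimension of the feature space.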

4) SVM Prediction
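The final prediction for a hyperplane between two classes is expressed by (6),

$$f(z) = \mathrm{sgn}\left( \sum_{i=1}^{m+n} \alpha_i y_i K(x_i, z) + b \right) \qquad (6)$$

where z is an unknown input vector, $x_i$ are the support vectors, m and n denote the numbers of support vectors for the two classes of the hyperplane, $\alpha_i$ and $y_i$ are the coefficient and class label of support vector $x_i$, b is the bias term, and the kernel function $K(x_i, z)$ is the RBF kernel in this study.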
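The prediction step can be sketched as below, assuming the standard kernel decision function: the sign of a weighted sum of kernel values between the input and each support vector, plus a bias. The support vectors, coefficients, bias, and kernel parameter in the example are made-up values, not a trained model, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, r):
    """RBF kernel: exp(-r * squared Euclidean distance between x and z)."""
    return math.exp(-r * sum((a - b) ** 2 for a, b in zip(x, z)))


def predict(z, support_vectors, coeffs, b, r):
    """Return +1 or -1 from the sign of sum_i coeff_i * K(x_i, z) + b.

    Each coeff_i stands for the product of a support vector's multiplier and its label.
    """
    total = sum(c * rbf_kernel(x, z, r) for x, c in zip(support_vectors, coeffs)) + b
    return 1 if total >= 0 else -1


# Toy model with one support vector per class (illustrative values only).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
coeffs = [1.0, -1.0]
print(predict([0.1, 0.1], support_vectors, coeffs, b=0.0, r=1.0))   # 1
print(predict([0.9, 0.9], support_vectors, coeffs, b=0.0, r=1.0))   # -1
```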

A. Data sets
The experimental dataset was gathered from the Newborn Screening Center of National Taiwan University Hospital between 2006 and 2010. The blood samples, taken within a few days after the newborns' birth, have been analyzed by MS/MS in a high throughput process. The measured metabolic datasets (35 measured metabolites including amino acids and acylcarnitines) have been archived in the NTUH database. In

B. Feature Selection Procedure
Before SVM training, the Fisher score [26] is applied to evaluate the importance of the features. The importance scores between the positive and negative cases are listed in Table I. PHE is the most significant analyte according to the Fisher score, and it is also used in the original NTUH newborn screening cut-off scheme.
After the feature importance is calculated, several feature combinations are examined to find the set that maximizes the accuracy and sensitivity of the trained model. The best combination found is LEU, PHE, TYR, C5, C14OH, and C18:1OH.
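One way to organize such a search is sketched below. It is not the procedure used in the study: the function name `best_feature_subset`, the size limit, and the toy scoring function are hypothetical. In practice `evaluate` would train and cross-validate an SVM on each candidate subset and return its score.

```python
from itertools import combinations


def best_feature_subset(ranked, evaluate, top_n=20, max_size=3):
    """Try every subset (sizes 1..max_size) of the top_n ranked features.

    ranked:   feature names sorted by descending importance.
    evaluate: callable taking a tuple of feature names and returning a score.
    Returns (best_subset, best_score).
    """
    candidates = ranked[:top_n]
    best, best_score = None, float("-inf")
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            score = evaluate(subset)
            if score > best_score:
                best, best_score = subset, score
    return best, best_score


# Hypothetical stand-in for a cross-validated score: each feature adds some
# benefit, and every extra feature costs a small penalty.
BENEFIT = {"PHE": 0.5, "TYR": 0.3, "LEU": 0.1, "C5": 0.01}


def toy_score(subset):
    return sum(BENEFIT[f] for f in subset) - 0.05 * len(subset)


best, score = best_feature_subset(["PHE", "TYR", "LEU", "C5"], toy_score)
print(best)   # ('PHE', 'TYR', 'LEU')
```

The number of subsets grows quickly with `max_size`, so an exhaustive search is practical only for small subset sizes or short candidate lists.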

C. Results
In the experiment, the phenylketonuria (PKU) metabolic disorder is examined first. There are 275,205 newborn samples collected from the Newborn Screening Center of National Taiwan University Hospital. In Table II, the first row presents the results of the traditional cut-off scheme used in the hospital; the second row is the SVM classification without feature selection; the last row gives the results of SVM classification with feature selection. The sensitivity column indicates that SVM with feature selection attains the same sensitivity as the cut-off scheme, i.e., 100%. The specificity column shows that SVM with feature selection achieves 99.98%, compared with the lower value listed for the classical cut-off technique. The SVM without feature selection achieves the highest accuracy in the table; however, its sensitivity is only 82.35%, which implies that the method cannot detect positive cases reliably. Overall, the SVM approach with selected features demonstrates the best discrimination power among the three methodologies while also increasing accuracy relative to the cut-off scheme. The detailed numbers of TN, TP, FN, and FP are given in Table III. In the table, the number of FN cases is 0 for both the cut-off scheme and SVM with feature selection, which implies that both approaches detect all positive cases. However, the SVM without feature selection detects 14 positive cases and produces 3 false negatives. Because it misses positive cases, it is not acceptable for clinicians.
Furthermore, in the table, the number of FP cases is 100 under the cut-off scheme and 35 under SVM with feature selection. That is, 65 more cases are flagged as positive under the cut-off scheme. During the screening process, these cases require follow-up, which consumes additional medical resources.
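The reported figures can be related to the confusion-matrix counts with the standard definitions, sketched below. Only counts stated in the text are used in the example (14 true positives and 3 false negatives for SVM without feature selection, and 100 versus 35 false positives); the function names are chosen for illustration.

```python
def sensitivity(tp, fn):
    """Fraction of true cases that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of healthy cases that are correctly cleared: TN / (TN + FP)."""
    return tn / (tn + fp)


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# SVM without feature selection: 14 detected positives, 3 missed positives.
print(f"{sensitivity(tp=14, fn=3):.2%}")   # 82.35%

# Follow-ups avoided by SVM with feature selection relative to the cut-off scheme.
print(100 - 35)                            # 65
```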
The purpose of the study is to separate apparently healthy individuals who do not have PKU from those who probably do. However, screening programs are, by nature, imperfect. In setting cutoffs, a balance must be struck between time, money, anxiety caused by false positives, and an acceptable number of missed cases. On one hand, laboratory advances in tandem mass spectrometry make it possible to screen newborns for many rare inborn errors of metabolism. This raises many policy issues, including the cost-effectiveness, ethics, quality, and oversight of screening. On the other hand, new techniques in genetic surveillance have facilitated an improved public health approach to the detection of, and interventions offered for, a range of important genetic conditions. Over the past two decades, scientific advances in genetics have been increasing at an explosive rate. As a result, an increasing number of diagnostic, predictive, and carrier tests are available, with a trend toward data mining technologies.

V. CONCLUSION
In the study, a newborn screening approach is proposed. The approach predicts whether a newborn has a metabolic disorder using a Support Vector Machine (SVM) classifier with Fisher score ranked feature selection. With this approach, the prediction accuracy for PKU improves from 99.96% (cut-off value approach) to over 99.98%, and the number of false positive cases is reduced from 100 to 35. Therefore, by adopting the approach, the number of suspected cases is reduced substantially, and medical resources are used more effectively and efficiently.