Current Applications of Artificial Intelligence in the Neonatal Intensive Care Unit

Artificial intelligence (AI) refers to computer algorithms that replicate the cognitive function of humans. Machine learning is widely applicable using structured and unstructured data, while deep learning is derived from the neural networks of the human brain that process and interpret information. During the last decades, AI has been introduced in several aspects of healthcare. In this review, we aim to present the current applications of AI in the neonatal intensive care unit. AI-based models have been applied to neurocritical care, including automated seizure detection algorithms and electroencephalogram-based hypoxic-ischemic encephalopathy severity grading systems. Moreover, AI models evaluating magnetic resonance imaging have advanced the evaluation of the developing neonatal brain and the understanding of how prenatal events affect both structural and functional network topologies. Furthermore, AI algorithms have been applied to predict the development of bronchopulmonary dysplasia and assess the extubation readiness of preterm neonates. Automated models have also been used for the detection of retinopathy of prematurity and the need for treatment. Among others, AI algorithms have been utilized for the detection of sepsis, the need for patent ductus arteriosus treatment, the evaluation of jaundice, and the detection of gastrointestinal morbidities. Finally, AI prediction models have been constructed for the evaluation of the neurodevelopmental outcome and the overall mortality of neonates. Although the application of AI in neonatology is encouraging, further research on AI models is warranted, including retraining clinical trials, validating the outcomes, and addressing serious ethical issues.


Introduction
Artificial intelligence (AI) refers to computer algorithms that replicate the cognitive function of humans, using specified operational models produced from the statistical assessment of big data sets [1]. During the last decades, AI has been applied in several aspects of human life, including the healthcare industry [2]. This accomplishment has been made possible by updated hardware technologies and ever-more-complex computer algorithms for processing and storing massive datasets [3][4][5][6]. Across healthcare fields, however, there seem to be varying degrees of enthusiasm for AI research. Nearly half of the evidence arises from published studies in the adult medical sciences (pathology, oncology, neurology, cardiology, gastroenterology, dermatology, pulmonology, endocrinology, emergency medicine), followed by imaging sciences (cell imaging, radiology), and by studies in surgery, ophthalmology, psychiatry, and pediatrics (Figure 1) [3,6,7].
Reviews of AI applications in neonatal monitoring are scarce, and this review aims to address that gap. Moreover, because novel practices such as AI are often first applied to adult or pediatric populations, this review also aims to motivate neonatologists to seek further information and to cooperate with other scientists in exploring the prospects of AI in this crucial period of life. In this narrative review, we evaluate AI's current applications and advantages in the neonatal intensive care unit (NICU) and explore the perspectives of AI on neonatal care in the future. We therefore examined the existing evidence on AI-based monitoring and diagnostic tools that could support neonatologists in care and follow-up. We explore several AI designs for image, signal, and electronic health record processing, evaluate the benefits and drawbacks of recently developed decision support systems, and shed light on potential future applications for physicians and neonatologists in their routine diagnostic work.
Our study is organized into (1) presenting the basic AI models applied in neonatal care, (2) grouping AI applications that pertain to neonatology into domains, elucidating their sub-domains, and highlighting the key components of the relevant AI models, (3) reviewing and providing a thorough summary of the latest research with a focus on applying AI to all areas of neonatology, and (4) examining and discussing the existing challenges related to AI in neonatology, as well as directions for future study (Figure 2).

Basic Models of Artificial Intelligence
The AI framework is based on machine learning (ML) and deep learning (DL), two subsets of AI that have been widely applied to the healthcare industry [47]. ML refers to the automatic improvement of AI algorithms through experience and vast amounts of historical data, producing models that generate predictions and make judgments without being explicitly programmed. ML uses both unstructured data that are difficult to arrange using predetermined structures (e.g., clinical notes) and structured data that are easily organized into predefined structures. Furthermore, ML models generate software algorithms to develop AI decision-support systems [47]. The majority of these systems are created using standard algorithms, which consistently produce the same outcome for a given input, and thus decision-support systems help healthcare professionals analyze enormous amounts of information [48][49][50].
Unlike this broader definition of ML, the fundamental idea behind DL is derived from the neural networks of the human brain that process and interpret information. To simulate this process, DL is based on representation learning and artificial neural networks (ANNs); when the number of layers is large (i.e., deep), the network can model more intricate links between input and output [51,52]. The ANN is a mathematical model that mimics the composition and operation of biological neural networks. The number and configuration of an ANN's neural layers, as well as the training set, determine its performance [53]. The main subtypes of DL networks are convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs) [54]. CNNs are mostly utilized in computer vision and signal processing applications. The CNN architecture consists of a series of stages, or layers, that make it easier to obtain hierarchical features. Initial layers extract more local features, like corners, edges, and lines, while later layers extract more global features. As features propagate from one layer to the next, their representation becomes richer [55]. In medicine, CNNs are most commonly employed for image processing and detection, especially in radiology, pathology, and dermatology [55]. RNNs are more effective when handling time-series data, such as clinical data or electronic health records (EHRs), and sequential data, such as text and speech [56]. GANs are a subtype of DL model that can be used to create new data similar to existing data [57]. Finally, natural language processing (NLP) is an AI technology that aids computers in comprehending and interpreting human language and in organizing clinical notes and unstructured data, thus enabling better decision-making [58,59]. Figure 3 presents the basic models of AI.
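As an illustration of how an early convolutional layer responds to a local feature such as an edge, the following minimal sketch slides a single kernel over a toy image. It is not taken from any of the cited studies: the function name `conv2d_valid`, the toy image, and the hand-written Sobel-like kernel are assumptions chosen for illustration, whereas a trained CNN would learn its kernels from data.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) without padding.

    As in most DL frameworks, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Toy 5x6 "image" (hypothetical): dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (Sobel-like), used here only for illustration.
vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # non-zero responses appear only where the window spans the edge
```

The resulting feature map is zero over the uniform regions and non-zero only around the dark-to-bright transition, which is the sense in which early layers capture local structure; deeper layers would combine many such maps into more global features.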
BioMedInformatics 2024, 4, FOR PEER REVIEW

Neuromonitoring
Recent decades have seen increased research on the neuromonitoring of critically ill neonates, thanks to advances in AI (Table 1). AI, and especially ML, has made it possible for computer systems to examine and analyze massive amounts of data, including medical patterns, and has mainly been applied to the electroencephalogram (EEG) and magnetic resonance imaging (MRI) [60].

Electroencephalography
Seizures are the most common neurological emergency in the neonatal population, and most likely occur during the first days of life [61]. Seizures are more common in neonates born at less than 30 and more than 36 weeks of gestation, with the frequency of seizures in neonates estimated to be around 8% [61]. Additionally, evidence suggests that treating seizures early on enhances the patient's response to medication [62], while it is well known that recurrent seizures are linked to worse long-term neurodevelopmental outcomes, regardless of the underlying cause [63,64]. Seizures are particularly difficult to diagnose in the neonatal population because, even when they do occur, they can be difficult to distinguish from normal infant movements, or they can be limited to electrographic episodes [65]. Although neonatal seizures need to be treated right away, they can be extremely challenging to recognize, since up to 85% of neonatal seizures may not have any clear clinical symptoms.
In the NICU, EEG has emerged as a crucial component of neurocritical care, as it is essential for identifying neonatal seizures and allows epileptic seizures to be distinguished from nonepileptic episodes [66]. Additionally, EEG monitoring helps reveal the uncoupling of clinical and EEG seizures after antiseizure treatment [67], detecting electrical discharges that may persist after therapy even though the clinical manifestations present before treatment have disappeared. EEG non-invasively records the electrical activity of the cerebral cortex, allowing for the real-time evaluation of cortical background function; however, real-time review and implementation of EEG can be challenging. Moreover, continuous EEG (cEEG) increases the diagnostic and prognostic potential, since it allows the evaluation of the background activity over time [68]. Thus, cEEG monitoring is the recommended standard of care for identifying and treating all seizures quickly [69,70]. Due to the challenges in acquiring traditional EEG, NICUs have adopted a less precise but more straightforward method of EEG monitoring, the amplitude-integrated EEG (aEEG). As opposed to cEEG monitoring, aEEG is a bedside device that shows one or two channels of filtered, smoothed, and quantitatively converted EEG data, with the cortical electrical activity compressed in time and displayed on a semi-logarithmic chart [71,72]. However, aEEG does not have ideal sensitivity, specificity, or interobserver agreement for identifying seizures [73], and thus it is recommended as an adjunct to cEEG monitoring [68,74].
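The processing steps behind an aEEG-style trend (rectification, smoothing, time compression, and a semi-logarithmic amplitude axis) can be sketched as follows. This is a simplified illustration rather than the algorithm of any commercial monitor: the band-pass filtering stage is omitted, the function names and parameters are hypothetical, and the axis mapping (linear up to 10 µV, logarithmic above) is an assumption based on the commonly used display convention.

```python
import math


def moving_average(values, width):
    """Smooth with a trailing moving average of up to `width` samples."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= width:
            running -= values[i - width]
        out.append(running / min(i + 1, width))
    return out


def semilog_axis(amplitude_uv):
    """Map amplitude to display height: linear up to 10 uV, log10 above (assumed)."""
    if amplitude_uv <= 10.0:
        return amplitude_uv
    return 10.0 + 10.0 * math.log10(amplitude_uv / 10.0)


def aeeg_like_trend(eeg_uv, smooth_width, block):
    """Return (lower, upper) display margins for each time-compressed block."""
    # Rectify, then smooth to obtain an amplitude envelope.
    envelope = moving_average([abs(v) for v in eeg_uv], smooth_width)
    bands = []
    # Compress time: one (min, max) band per non-overlapping block of samples.
    for start in range(0, len(envelope) - block + 1, block):
        chunk = envelope[start:start + block]
        bands.append((semilog_axis(min(chunk)), semilog_axis(max(chunk))))
    return bands


# Synthetic signal (hypothetical): low-amplitude activity followed by a high-amplitude run.
eeg = [5, -5] * 50 + [50, -50] * 50
bands = aeeg_like_trend(eeg, smooth_width=4, block=100)
print(bands)
```

Each block of raw samples collapses into a single band, which is why hours of recording fit on one screen, and also why brief or low-amplitude seizures can be lost in the compression.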
During the past few decades, AI research, and particularly DL, has advanced the development of automatic seizure detection algorithms [75]. These algorithms exhibit remarkable seizure detection accuracy, comparable to that of human specialists [76]. In 1992, Liu et al. proposed a computerized detection system for neonatal seizures, and thereafter, numerous methods have been documented, refined, and verified [8]. The performance of the initial automatic seizure detection algorithms was suboptimal for therapeutic use, as they had been created by modifying algorithms intended for adult patients [8,9]; however, to date many seizure detection algorithms have been developed, mainly for full-term but also for preterm neonates [77]. The development of these algorithms requires the labeling of seizures by several specialists as well as enough data for training, validation, and testing.
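When data are divided for training, validation, and testing, recordings from the same neonate are usually kept in a single partition so that performance is not inflated by the model recognizing individual patients. The sketch below shows one way to do this; the function name, the split fractions, and the patient identifiers are hypothetical and are not drawn from the cited studies.

```python
import random


def split_by_patient(patient_ids, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole patients (not single recordings) to train/validation/test."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)  # reproducible shuffle
    n_train = int(round(fractions[0] * len(unique)))
    n_val = int(round(fractions[1] * len(unique)))
    return {
        "train": set(unique[:n_train]),
        "validation": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),
    }


# Hypothetical cohort: 20 neonates, each contributing three EEG recordings.
recording_patient_ids = [f"neonate_{i:02d}" for i in range(20) for _ in range(3)]
groups = split_by_patient(recording_patient_ids)
for name, members in groups.items():
    print(name, len(members))
```

Each recording is then routed to the partition of its patient, so that no neonate contributes data to both the training and the test set.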
In 2020, a randomized clinical trial assessing the effect of ML on the real-time identification of neonatal seizures in a NICU was published [78]. According to that report, more seizures were recognized in real time when AI algorithms were applied in the NICU [78]. Following extensive training and offline analysis, the accuracy of the recognition of electrographic seizures both with and without the automatic seizure detection algorithms was tested in a multicenter clinical trial, suggesting that the algorithm could serve as a bedside tool in clinical practice [79,80]. The model greatly enhanced the recognition of seizure hours, even though the set aim of improving the detection of individual neonates with seizures was not fulfilled.
In addition to monitoring and treating newborn seizures, EEG is also a valuable diagnostic tool for neonatal encephalopathy, particularly hypoxic-ischemic encephalopathy (HIE). AI research is being conducted to create algorithms, many of which use DL techniques, that can evaluate brain maturation, estimate sleep stages [81], and grade background EEG patterns in HIE [82]. Automated EEG interpretation based on ML technology has recently shown good performance in detecting HIE severity and can be helpful in the early severity grading of neonatal HIE [83,84]. One example of such advanced signal processing is the use of convolutional neural network structures, which can self-extract convolutional features from raw EEGs [82]. In addition, the possible application of AI in predictive modeling for electrographic seizures in newborns with HIE was examined by Pavel et al., with the goal of early detection of the infants most at risk of recurrent seizures [85]. ML algorithms were created for clinical characteristics and for both qualitative and quantitative EEG characteristics. Notably, the predictive value of these models increased when clinical data were combined with either the automated quantitative EEG analysis or the qualitative analysis carried out by a skilled neurophysiologist. These studies highlight the possibility of using ML in evaluating the EEG background of neonates with HIE.

Magnetic Resonance Imaging
The application of AI to enhance the utility of, and inference from, brain MRI has advanced significantly during the last few years. Technical advancements in AI techniques include methods to reduce movement artifact effects and boost information yield, as well as advancements in tissue classification [86]. These have enabled a deeper evaluation of the developing brain and a new understanding of the effects of prenatal events on structural and functional network topologies [87].
One of the regions in the neonatal brain where myelination starts is the posterior limb of the internal capsule (PLIC). Crucially, the neurological outcomes of both term and preterm newborns depend on the proper and timely maturation of the PLIC. Abnormalities in the PLIC detected on MRI have been linked to hemiplegia and worse neurodevelopmental outcomes [88]. Over the past few decades, there has been a noticeable rise in the prevalence of cerebral palsy to over 2.0 per 1000 live births, which is inversely proportional to gestational age and carries significant lifetime burdens [89,90]. An ML algorithm for the automated segmentation and quantification of the PLIC in preterm newborns undergoing MRI was proposed in a recent work [91], in which the authors demonstrated good accuracy for the ML model when compared to expert analysis, indicating the successful application of their algorithm to a large dataset. Although promising, this approach still needs to be evaluated in clinical settings.
Identifying neuroanatomic phenotypes and predicting the outcome are the major areas in the clinical domain where AI is facilitating innovation. Preterm neonates are characterized by a specific phenotype that includes abnormal brain development, cerebral palsy, autism spectrum disorder, attention deficit hyperactivity disorder, psychiatric illness, and issues with language, behavior, and socioemotional functions [92]. Abnormalities of structural and functional networks are frequent in preterm neonates, as shown by structural, diffusion, and functional MRI [93]. Models that combine data from two or more imaging modalities into a single framework can reveal previously unknown patterns of neuroanatomic variants in preterm neonates that are related to cognitive and motor outcomes [94]. Diffusion tensor metrics, neurite orientation dispersion and density imaging measurements, and regional volumes are among the several forms of MRI data that are integrated into a single model to compute morphometric similarity networks [95]. This kind of research helps identify the neural roots of cognition and behavior, identify the networks that most contribute to atypical brain development, and examine the drivers of brain dysmaturation and resilience.
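A morphometric similarity network can be thought of as a region-by-region correlation matrix built from several MRI-derived features per region. The following minimal sketch assumes the commonly described recipe (standardize each feature across regions, then correlate the feature vectors of every pair of regions); the feature values are invented for illustration, and the cited work [95] may differ in its details.

```python
import math


def zscore_columns(matrix):
    """Standardize each feature (column) across regions (rows)."""
    n = len(matrix)
    scaled_cols = []
    for col in zip(*matrix):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        scaled_cols.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*scaled_cols)]


def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def morphometric_similarity(region_features):
    """Return the region-by-region similarity (correlation) matrix."""
    z = zscore_columns(region_features)
    n = len(z)
    return [[pearson(z[i], z[j]) for j in range(n)] for i in range(n)]


# Hypothetical values for 4 regions; columns stand for regional volume,
# fractional anisotropy, mean diffusivity, and neurite density index.
regions = [
    [12.1, 0.21, 1.45, 0.32],
    [11.8, 0.23, 1.40, 0.35],
    [6.3, 0.35, 1.10, 0.52],
    [6.0, 0.33, 1.15, 0.50],
]
msn = morphometric_similarity(regions)
for row in msn:
    print([round(v, 2) for v in row])
```

Regions with similar multimodal profiles receive a high similarity, and the resulting matrix can then be analyzed as a network or supplied as input to a predictive model.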
Current research also aims to compare traditional computer vision approaches with efficient networks that generate reliable and accurate segmentation. To evaluate methods for segmenting newborn tissue, T1-weighted (T1W) and T2-weighted (T2W) images were provided with manually segmented structures; segmenting myelinated from unmyelinated white matter is, nevertheless, still challenging [96]. The limited amount of high-quality labeled data must also be acknowledged as a key limitation when comparing earlier attempts at newborn brain segmentation [97].

Neurodevelopmental Outcome
ML techniques have been widely used for the neurodevelopmental evaluation and follow-up of preterm neonates (Table 2). Numerous studies used ML techniques to examine brain connections [40,98,99,100], brain structure analysis, and brain segmentation in preterm neonates [45,101]. Evidence suggests an association between preterm birth and lower brain volume, cortical folding, axonal integrity, and microstructural connectivity [41,102]. Additional effects of prematurity on the developing connectome have been found in studies examining functional markers of brain maturation [40,103].
Neurocognitive assessments are among the most significant domains of neurodevelopmental outcomes at two years of age. Previous studies assessed how the brain's morphological alterations relate to neurocognitive outcomes [39,43,44] and the prediction of brain age [104]. It has been demonstrated that multivariate models combining near-term structural MRI findings and white matter microstructure on diffusion tensor imaging may help identify preterm neonates at risk for language impairment and guide early intervention [43,44]. Moreover, to predict neurodevelopmental impairment at two years of age, a self-training deep neural network model has been suggested, using MRI data obtained in very preterm neonates at term-equivalent age [31]. In addition, a study that used ML techniques to assess the impact of PPAR gene activity on brain development found a significant correlation between aberrant brain connectivity and the role of PPAR gene signaling in aberrant white matter development [105].
In previous studies, ML models have been used to evaluate the association between language outcome and near-term MRI findings. By examining MRI characteristics and perinatal clinical data, Valavani et al. employed ML to predict language skills at two years of corrected age in preterm neonates [42]. Language delay could be accurately predicted by delayed myelination patterns and specific clinical characteristics. The authors concluded that ML models could be useful for healthcare services and enhance the long-term outcomes of preterm neonates. Furthermore, in a recent study, Balta et al. proposed AI-based automated monitoring of newborns' general movements, a crucial screening test for detecting neuromotor problems in children [33]. The authors created an automated model to analyze infants' overall movements by processing videos taken with a simple camera at home. Certain patterns of spontaneous movements, such as the absence of fidgety movements or the presence of predominantly contracted coordinated movements, were particularly indicative in predicting cerebral palsy in infants between 3 and 5 months of age [33]. The results demonstrated that gross motor metrics may be meaningfully estimated and potentially used for the early identification of movement disorders.
ML, machine learning; SNP, single-nucleotide polymorphism; AUROC, area under the receiver operating characteristic curve; MRI, magnetic resonance imaging.

Respiratory System
One of the main causes of infant mortality and morbidity in preterm deliveries is bronchopulmonary dysplasia (BPD). Although several biomarkers have been associated with the emergence of respiratory distress syndrome (RDS), there are currently no meaningful prenatal diagnostic tests for BPD [16]. In a previous study, Ahmed et al. evaluated an ML technique, also suitable for the analysis of other biological materials, and created a helpful bedside point-of-care test approach for neonatal RDS [10]. According to the authors' findings, following clinical validation, ML-guided devices that can measure RDS biomarkers in real time may be used to direct therapies for preterm infants exhibiting respiratory symptoms. Moreover, Raimondi et al. concentrated on AI-assisted analysis of lung ultrasonography and its capacity to correlate with respiratory status in critically ill neonates with RDS [11]. The authors constructed a dataset of scans for texture analysis, and an ML model established a correlation between the oxygenation status, the ultrasound findings, and the mean grayscale intensity. They enrolled a cohort of neonates of different origins and varying degrees of respiratory distress, and they demonstrated a significant correlation between blood gas indices and the grayscale ML analysis [11]; however, the relatively small sample size, the heterogeneous etiology of the respiratory distress, and the variable postnatal age suggested that further research on this topic with larger datasets is warranted.
Regarding BPD, Dai et al. investigated the combination of genetic and clinical factors, where exome sequencing was carried out for preterm neonates and integrated with clinical aspects [12]. The authors demonstrated that by using ML for the genomic analysis they could predict the development of BPD with an accuracy of 90% [12]. Also, the combined analysis of gastric aspirate obtained after birth and clinical information could predict BPD development with a sensitivity of 88% [16]. In addition, Leigh et al., in a retrospective analysis of perinatal and respiratory factors in a sample of preterm neonates, created an ML algorithm that, after training and testing, could predict BPD-free survival with good accuracy [14]. An AI approach using DL and image segmentation has been proposed that can predict the severity of BPD by analyzing the segmentation of the lungs in chest X-rays taken on the 28th day of oxygen delivery [17]. The benefits of this algorithm included non-invasiveness, speed, and independence from the experience of neonatologists, and it demonstrated strong predictive performance.
Moreover, research on BPD with ML predictive models has shown that long-term invasive ventilation is one of the most significant risk factors for BPD and longer hospital stays. ML models using long-term invasive ventilation data could predict extubation failure with significant accuracy [106][107][108]. Risk stratification for BPD is a specific area of interest, aiming to identify infants who may benefit from preventive measures such as corticosteroids or treatment for specific morbidities such as patent ductus arteriosus (PDA). The BPD Outcome Estimator is a predictive tool approved by the US National Institute of Child Health and Human Development that is useful in directing steroid treatment and family counseling [13]. The estimator was initially limited to White, Black, or Hispanic neonates; however, Patel et al. recently created a web application based on an ML system for extremely preterm neonates of Asian descent [15]. Nonetheless, the study's conclusions were limited because the method was tested on a small dataset, requiring further comprehensive and prospective validation before being used in clinical practice.
Apnea of prematurity, another common morbidity in preterm neonates, is either obstructive (caused by airway obstruction), central (caused by cessation of respiratory drive), or mixed (a combination of both). Bedside monitors are programmed to sound an alarm when they detect decreased respiratory effort, measured as a decrease in thoracic motion [109]. A substantial number of false positive episodes have been observed in clinical tests, indicating that this approach identifies central apneas with suboptimal accuracy [110]. Varisco et al. created an ML-based improved apnea detection model to automatically identify real apnea using data from the electrocardiographic monitoring of neonates [111]. The authors concluded that the AI algorithm resulted in better detection of apneas, with fewer false alarms, than traditional approaches, and they also showed that breathing patterns were altered more often in neonates with more frequent central apneas [111]. Although AI may drastically alter routine clinical practice, given that alarm fatigue is a growing problem in NICUs that puts neonates at risk from missed alarms, the lack of external validation and the small sample size represent serious flaws in the suggested methodology. Table 3 presents examples of the application of AI in neonatal respiratory diseases.
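The conventional monitor logic described above can be approximated by a simple threshold rule: flag any period in which the thoracic motion amplitude stays below a cut-off for long enough. The sketch below is a simplified baseline and not the algorithm of any specific monitor or of Varisco et al.; the function name, the amplitude threshold, and the 20-second duration (a commonly used definition of apnea) are assumptions.

```python
def detect_low_effort_episodes(amplitude, sample_rate_hz, threshold, min_duration_s=20.0):
    """Return (start_s, end_s) spans where amplitude stays below `threshold`
    for at least `min_duration_s` seconds."""
    episodes = []
    start = None  # index where the current low-amplitude run began
    for i, value in enumerate(amplitude):
        if value < threshold:
            if start is None:
                start = i
        else:
            if start is not None and (i - start) / sample_rate_hz >= min_duration_s:
                episodes.append((start / sample_rate_hz, i / sample_rate_hz))
            start = None
    # Handle a run that lasts until the end of the recording.
    if start is not None and (len(amplitude) - start) / sample_rate_hz >= min_duration_s:
        episodes.append((start / sample_rate_hz, len(amplitude) / sample_rate_hz))
    return episodes


# Synthetic breathing-amplitude trace sampled at 1 Hz (hypothetical values):
# normal effort, a 25 s pause, normal effort, a 5 s pause, normal effort.
trace = [1.0] * 30 + [0.05] * 25 + [1.0] * 10 + [0.05] * 5 + [1.0] * 10
episodes = detect_low_effort_episodes(trace, sample_rate_hz=1.0, threshold=0.2)
print(episodes)
```

Because such a rule looks only at signal amplitude, shallow breathing or a poorly placed sensor can trigger it as readily as a true central apnea, which is one plausible source of the false alarms that ML-based detectors aim to reduce.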

Ophthalmology
ML models have also been applied in retinopathy of prematurity (ROP), which is a severe complication of prematurity and a major cause of childhood blindness in high- and middle-income countries (Table 4). ROP affects mainly extremely preterm (less than 28 weeks), very preterm (28-32 weeks), or very low-birthweight (<1500 g) neonates [23]. Telemedicine and AI are being considered as potential diagnostic tools for ROP, given the dearth of ophthalmologists who can treat neonates with ROP. Gaussian mixture models are among several ML techniques used to diagnose and categorize ROP from retinal fundus images [22,23]. In a previous study, the i-ROP system was shown to have 95% accuracy in classifying pre-plus and plus disease. This performance was significantly better than that achieved by nonexperts (81%) and comparable to that achieved by experts (92% to 96%) [22]. Furthermore, a DL automated score model was generated in a recent multicenter trial to identify one of the features of the affected retina [28]. This study showed how a comprehensive DL screening platform may enhance screening accessibility and objective ROP diagnosis. In another large-scale multicenter trial, a different group of scientists created a DL method for predicting ROP and its severity [30]. Retinal images from the initial ROP screening and neonatal clinical risk variables were obtained to develop an AI predictive algorithm. When compared to the traditional ROP score, the DL-based system demonstrated comparable accuracy, and it was more effective in identifying and interpreting abnormal signs than classical ophthalmoscopy.
Computer-based image analysis system (i-ROP); retina image: When compared to the reference standard, the i-ROP system classified pre-plus and plus disease with 95% accuracy. This was comparable to the performance of the 3 individual experts (96%, 94%, 92%), and significantly higher than the mean performance of 31 nonexperts (81%).

Retina image
The diagnosis of plus disease (as opposed to pre-plus disease or normal) had an average AUROC of 0.98, whereas the diagnosis of normal (as opposed to pre-plus or plus disease) had an average AUROC of 0.94. The method achieved 93% sensitivity and 94% specificity for plus disease detection. The sensitivity and specificity for identifying pre-plus disease or worse were 100% and 94%, respectively

Taylor et al. [29]
An algorithm assessing plus disease and its usefulness for objectively tracking the progression of ROP

Retina image
The AUROC for detection of treatment-requiring retinopathy of prematurity was 0.98, with 100% sensitivity and 78% specificity

ROP, retinopathy of prematurity; DL, deep learning; AUROC, area under the receiver operating characteristic curve; OC-Net, occurrence network; SE-Net, severity network; CNN, convolutional neural network.
Moreover, previous studies have compared telemedicine with binocular indirect ophthalmoscopy, demonstrating that both techniques are equally sensitive in detecting zone disease, plus disease, and ROP, although binocular indirect ophthalmoscopy was more accurate in recognizing zone III and stage 3 ROP [24,27]. In addition, using DL algorithms, the accuracy of ROP examination was 94% for normal diagnosis and 98% for plus disease diagnosis, outperforming ROP experts [25]. Finally, DL algorithms have been constructed to estimate the clinical progression of ROP by assigning vascular severity scores [29] or to detect disease requiring therapy with an accuracy of 98% [26]. Overall, introducing AI into ROP screening programs might improve access to care for secondary ROP prevention [26]; however, despite the encouraging results, more extensive external validation using additional multicenter datasets is necessary. Additionally, the development of more advanced ML algorithms may provide more significant prognostic information regarding the accurate staging, zone, and disease.
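Because the studies above report sensitivity, specificity, and accuracy side by side, a small worked example may help clarify how the three relate. The following is a minimal sketch on made-up labels (1 = treatment-requiring ROP, 0 = not); the data and the helper names are hypothetical and do not reproduce any cited result.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # diseased eyes flagged by the model
    fp: int  # healthy eyes flagged by the model
    tn: int  # healthy eyes correctly passed
    fn: int  # diseased eyes missed by the model


def confusion_counts(y_true, y_pred):
    """Tally the four cells of a binary confusion matrix."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def sensitivity(c):
    return c.tp / (c.tp + c.fn)


def specificity(c):
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical screening results for ten eyes.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(f"sensitivity={sensitivity(counts):.2f}, "
      f"specificity={specificity(counts):.2f}, "
      f"accuracy={accuracy(counts):.2f}")
```

In this toy case no diseased eye is missed, yet two healthy eyes are flagged, so a screening tool can reach full sensitivity while its specificity remains modest. A single accuracy figure hides that trade-off.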

Gastrointestinal System
Recently, an AI algorithm was created based on a large dataset of the clinical characteristics of neonates who developed intestinal perforation [112] (Table 5). The suggested algorithm evaluated various clinical data, including vital signs, radiologic findings, biomarkers, and laboratory results, and predicted intestinal perforation in preterm neonates earlier and more accurately than all other traditional ML methods [112]. Furthermore, regarding nutrition, a previous study in England demonstrated that ML techniques can be used to evaluate nutritional practices, which were found to be associated with body weight at discharge and the development of BPD [113]. Finally, Han et al. recently examined the potential application of AI to predict postnatal growth failure. Using a large dataset of very low birth weight neonates from several NICUs, ML models were created using a variety of methodologies, showing a strong predictive performance [114]. Nevertheless, the study's findings were limited since it lacked crucial information about enteral and parenteral feeding.

Sepsis
Early- and late-onset neonatal sepsis are major causes of infant mortality and morbidity [115]. Diagnosing neonatal sepsis and deciding when to start antibiotics is challenging in clinical practice, which emphasizes the need for a comprehensive approach. Previous studies have explored the role of heart rate variability in predicting early-onset sepsis, with an accuracy of 64-94% [20]. Regarding the detection of late-onset sepsis, ML decision algorithms have utilized clinical and laboratory biomarkers, obtaining optimal accuracy and a mean precision of 0.82 at 3 h before the onset of sepsis [21] (Table 5).

Patent Ductus Arteriosus
The ductus arteriosus, which is patent during intrauterine life, may have significant hemodynamic consequences in preterm neonates when it remains patent after birth, and is associated with higher rates of morbidity and mortality. Therefore, it should be assessed whether closing the PDA could increase survival chances relative to the risk of side effects [116]. ML techniques have been developed for the detection of PDA from electronic health records [19] and auscultation records [18] (Table 5). These approaches achieved an accuracy of 76% for the prediction of PDA in very low birth weight infants based on the analysis of 47 perinatal factors using 5 different ML techniques [19], and 74% for the analysis of 250 auscultation records [18].

Dermatology
Infantile hemangiomas (IH) may present at birth and usually grow quickly between the ages of one and three months, so it is critical to diagnose the condition at an early age to avoid complications [117]. In a recent work by Zhang et al., a CNN was used to identify IH using clinical photos, reporting a diagnostic accuracy of 91.7%, which was even higher when the analysis was restricted to the facial region [118]. This study showed that AI algorithms may be used for non-standardized photos, indicating their relevance to the real-world clinical context [118]. Future research on IH diagnosis will need to develop algorithms that can distinguish between different diseases instead of using a binary classifier, in addition to the capacity to categorize IH risk.
Although there is limited research on AI's application to pediatric dermatology, studies have examined adult illnesses that frequently affect pediatric patients. Atopic dermatitis is a recurrent condition that usually starts early in life [119]. A CNN was recently created by Guimaraes et al. to examine multiphoton tomography data for atopic dermatitis, with a diagnostic accuracy of 97% [120]. Furthermore, for the diagnosis of atopic dermatitis, De Guzman et al. created a multi-model, multi-level approach that produced a higher average confidence level than a single-model method (68.37% vs. 63.01%, respectively) [121]. In 2017, Gustafson et al. used an ML-based phenotyping method to identify patients with atopic dermatitis. The system achieved a high positive predictive value and sensitivity by combining code information with the electronic health record (EHR) collection. These findings show how ML and natural language processing can be used for EHR-based phenotyping [122]. Most current research uses photos from adult databases, in which patient age is not clearly distinguished. This may introduce bias when algorithms are applied to age groups other than those for which they were intended. In addition, Han et al. used a method based on deep neural networks to classify extremely rare skin lesions and distinguish between eczema and other infectious skin disorders. The authors also demonstrated that distinguishing between inflammatory and infectious causes could help with treatment options [123]. Moreover, a support-vector-machine-based image processing technique was developed for hand eczema segmentation and reported better results than other sophisticated approaches that were also tested [124] (Table 6).

Table 6. Examples of the current evidence of artificial intelligence application in neonatal dermatology.

Miscellaneous

Vital Signs Monitoring
In previous studies, ML models have been developed to analyze physiologic data that are electronically captured as signal data in order to identify artifact patterns [125], predict neonatal morbidity [126], or identify late-onset sepsis [21]. An ML algorithm using vital signs recorded electronically within the first three hours of life, including the heart rate and respiration rate of preterm neonates with a birth weight ≤2000 g and gestational age ≤34 weeks, predicted overall morbidity with an accuracy of 91% [126]. Furthermore, Lyra et al. developed DL-based techniques that could result in a reliable, real-time assessment of crucial indicators, such as changes in body temperature [127]. Although several factors during the recording made the analysis difficult, the authors demonstrated the viability of using inexpensive, embedded graphics processing units to monitor neonates' temperatures in real time. More research is warranted to broaden the application of this technique in clinical settings [127] (Table 7).

Neonatal Jaundice
The application of ML and DL models was explored in a previous study investigating whether photos taken with a smartphone camera could be used to identify neonatal jaundice in term and late preterm neonates. The authors used data from pictures of the skin and eyes to train a neural network to identify jaundice [128]. Furthermore, Guardalia et al. used an ML approach to analyze clinical data from a large neonatal population and develop a risk assessment tool for neonatal jaundice that did not rely on bilirubin readings and that performed well in the risk categorization of newborn jaundice [129] (Table 7).

Mortality
Even with the recent advances in neonatal care, preterm neonates are still very vulnerable to death because of their immature organ systems [130]. ML models have been developed for the prediction of neonatal mortality by exploring causative factors [32,38] (Table 8). A recent review including term and preterm neonates between the gestational ages of 22 and 40 weeks reported that neural networks, random forests, and logistic regression were the models most commonly developed by the investigators [131]. Among the included studies, only two completed external validation, five published calibration plots, five reported sensitivity and specificity of their models, which ranged from 63 to 80% and 78 to 98%, respectively, and eight reported accuracy, which ranged from 58.3 to 97.0% [131]. Despite including 17 features, the best overall model was a linear regression analysis [131]. Recent studies exploring the application of AI models in extremely low birth weight and preterm neonatal populations reported an accuracy of 68.9-93.3% [34,35]. Among the several limitations of these studies was the lack of vital parameters to depict dynamic changes, while gestational age, birth weight, and Apgar scores were the most significant variables in the models [36,37]. These limitations suggest that further implementation, calibration, and external validation of AI healthcare applications are warranted in future studies.
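Since calibration is reported far less often than discrimination in the studies above, a brief illustration of what a calibration check involves may be useful. The sketch below bins predicted risks and compares the mean predicted risk with the observed event rate in each bin, which is the information a calibration plot displays. The function name, the equal-width binning, and the toy data are assumptions made for illustration.

```python
def calibration_table(y_true, probs, n_bins=5):
    """Group predictions into equal-width risk bins.

    Returns one row per non-empty bin:
    (bin index, number of cases, mean predicted risk, observed event rate).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    bins = [[] for _ in range(n_bins)]
    for outcome, prob in zip(y_true, probs):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("predicted risks must lie in [0, 1]")
        index = min(int(prob * n_bins), n_bins - 1)  # prob == 1.0 goes in the last bin
        bins[index].append((outcome, prob))
    rows = []
    for index, members in enumerate(bins):
        if not members:
            continue
        mean_predicted = sum(p for _, p in members) / len(members)
        observed_rate = sum(y for y, _ in members) / len(members)
        rows.append((index, len(members), mean_predicted, observed_rate))
    return rows


# Hypothetical predicted mortality risks and observed outcomes (1 = died).
probs = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90]
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
rows = calibration_table(y_true, probs, n_bins=2)
for index, n, mean_predicted, observed_rate in rows:
    print(f"bin {index}: n={n}, predicted={mean_predicted:.2f}, observed={observed_rate:.2f}")
```

In a well-calibrated model the two columns agree closely in every bin. Here the low-risk bin predicts a lower risk than is observed, the kind of mismatch that accuracy or AUROC alone would not reveal.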

Challenges, Limitations, and Future Perspectives of Artificial Intelligence in Neonatology
AI is now established as a useful component in several parts of neonatal care, helping physicians to provide improved, more effective, and safer care (Table 9). However, specific issues need to be addressed before AI models are widely applied. First, healthcare providers need to improve their digital literacy so that they can comprehend the fundamental principles and limitations of AI. That would help them evaluate newly created AI tools and focus on their appropriate and safe application in clinical settings. Also, developing and implementing AI tools requires cross-disciplinary, worldwide collaborations involving data scientists, computer scientists, healthcare providers, attorneys, and legislators. Additional drawbacks of AI include the lack of larger datasets to train the models, the heterogeneity of the data, generalizability problems, the lack of evidence-based guidelines for some diseases affecting neonates, and the cost. Applying AI to newborn care also involves addressing critical challenges such as the model's interpretability, the necessity of external validation to improve generalizability, and the necessity of appropriate evaluation of performance (Table 2).
Finally, there are serious ethical issues to be considered. Important decisions in neonatology are often accompanied by a complex and difficult ethical component, and multidisciplinary methods are necessary for advancement [132]. Informed consent, bias, safety, privacy of the patients, and allocation are among the ethical issues with AI applications in healthcare [133]. The use of AI in neonatology has become more challenging due to the necessary transparency, viability limitations, life-sustaining therapies, and various international restrictions [134]. To date, there have been no reports on how an ethics framework would be applied in neonatology.

Challenges of AI
Areas of Improvement

Quality of the dataset
AI tools require high-quality data to be trained. Studies should address limitations including small sample sizes, improper management of missing information, and heterogeneity evaluation in various demographic subsets

Model performance evaluation
Model performance should be continually evaluated on the entire dataset. Apart from the AUROC, additional performance metrics, such as the precision-recall curve, specificity/sensitivity, and calibration metrics should be assessed

Clinical impact and external validation
External validation is crucial because the tool's performance may degrade in a different dataset or in clinical practice owing to overfitting of the training data. Also, the effectiveness of AI should be evaluated in terms of calibration and discrimination quality as well as patient outcomes and the clinical workflow

Comprehending
Bed-side models should enhance intelligence, interpretability, and transparency

Guidelines for critical evaluation, regulation, and oversight
Attention to methodological, critical appraisal, and medicolegal problems, together with the necessary monitoring, is required to guarantee the model's safe and effective usage

Ethics
Informed consent, bias, patient privacy, and allocation are among the ethical issues with health AI, and negotiating their solutions can be challenging. Important decisions in neonatology are often accompanied by a complex and difficult ethical component, and multidisciplinary methods are necessary for advancement

AI, artificial intelligence; AUROC, area under the receiver operating characteristic curve.
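The performance metrics named in the table above can be computed with a few lines of code. The following is a minimal sketch in plain Python, assuming binary outcomes and predicted probabilities: the AUROC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, precision and recall are taken at a single threshold, and the Brier score serves as one simple calibration-related metric. The function names and the toy data are hypothetical; in practice an established library would normally be used.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def precision_recall(y_true, scores, threshold):
    """Precision and recall when scores at or above the threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes and predicted probabilities for eight neonates.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1]
area = auroc(y_true, scores)
precision, recall = precision_recall(y_true, scores, threshold=0.5)
brier = brier_score(y_true, scores)
print(f"AUROC={area:.3f}, precision={precision:.3f}, recall={recall:.3f}, Brier={brier:.3f}")
```

Reporting these quantities together, and repeating the calculation on an external dataset, gives a fuller picture than the AUROC alone, because discrimination, threshold-dependent error rates, and the quality of the probability estimates can diverge.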

Conclusions
AI is becoming increasingly important in healthcare services as contemporary culture moves toward automated decision support systems. The main advantage of using AI in healthcare is its ability to evaluate large volumes of medical data from multidisciplinary studies. This type of data is too complex for medical professionals to study quickly enough to reach a diagnosis and determine a treatment plan. When trained with the right data, AI models function like human neurons and can solve problems quickly and accurately. Finding the appropriate treatment strategy requires accuracy and time, especially in intensive care units. When integrating AI models into NICU clinical practices, including treatment and transport, trust is a crucial component. AI-based solutions can be used in NICUs mainly to confirm the current treatment plans rather than to implement their own recommendations. The current evidence regarding the application of AI in neonatology is encouraging; however, further research is warranted, including retraining clinical trials and validating the outcomes, to make AI algorithms more useful in the future.

Figure 1. Studies on artificial intelligence by medical specialty. Based on evidence from references [6,7].

Figure 2. Overview of the study organization.

Figure 3. Basic models of artificial intelligence.

Table 1. Examples of the current evidence of artificial intelligence application in neuromonitoring in neonatology.

Table 2. Examples of the current evidence of artificial intelligence application in neonatal neurodevelopmental outcome.

Table 3. Examples of the current evidence of artificial intelligence application in neonatal respiratory diseases.

Table 4. Examples of the current evidence of artificial intelligence application in neonatal ophthalmology.

Table 5. Examples of the current evidence of artificial intelligence application in neonatal gastrointestinal diseases, sepsis, and patent ductus arteriosus.

Table 7. Examples of the current evidence of artificial intelligence application in neonatal miscellaneous domains.
DL, deep learning; ML, machine learning; AUROC, area under the receiver operating characteristic curve.

Table 8. Examples of the current evidence of artificial intelligence application in neonatal mortality.
The model consisted of three variables: birth weight, Apgar score at 5 min of age, and gestational age. This model had an AUROC of 76.9%, while birth weight and gestational age had an AUROC of 73.1% and 71.3%
ML, machine learning; ANN, artificial neural networks; AUROC, area under the receiver operating characteristic curve; RF, random forest; SVM, support vector machine.

Table 9. Challenges of artificial intelligence in neonatology.