We congratulate Balada et al.1 for pursuing important transcriptomic research in neonatal encephalopathy (NE). Using real-time polymerase chain reaction, they examined the messenger RNA (mRNA) expression of 23 candidate genes in whole blood from 24 babies with NE and 34 control babies (1 healthy; 33 with mild polycythaemia), sampled at <6 h (n = 15 NE vs. 2 controls), 12 h (n = 17 NE vs. 1 control), 24 h (n = 18 NE vs. 2 controls), 48 h (n = 14 NE vs. 13 controls), 72 h (n = 13 NE vs. 9 controls), and 96 h (n = 4 NE vs. 7 controls) of age. Six genes (MMP9, PPARG, IL8, HSPA1A, TLR8, and CCR5) showed a discriminant graphical pattern between babies with NE and control infants, and hence were selected for subsequent statistical modeling.

While this is useful information, we would like to highlight some limitations of these observations and of the methodology. First, the use of a preselected panel of just 23 genes may lead to erroneous conclusions, particularly in preliminary studies. The results of microarray-based gene expression studies are often not reproducible, and when the analysis is limited to a small set of preselected genes, the significance of the observed differences is even more difficult to interpret.2 Second, only two controls were recruited within 6 h and only one at 12 h, which limited the contribution of the first two time points to the identification of the differentially expressed genes used in the analysis. While we acknowledge that recruitment of control infants represents a significant challenge for neonatal studies, the identification of a gene expression profile within 6 h of birth is pivotal, since the therapeutic window for neonatal neuroprotective interventions is narrow and treatments need to be initiated within hours after birth to have a chance of success. Finally, this is not the first report on the use of gene expression in NE, as the authors have claimed. We have previously reported differences in the gene expression profiles of babies with NE, healthy controls, and babies with sepsis, using a next-generation sequencing approach.3,4

To illustrate this issue, we examined the differential expression of the six genes reported by Balada et al.1 in our previously published dataset3 of 12 encephalopathic babies and 6 time-matched healthy term controls. We used linear mixed-effect models to compare the expression of the six genes (MMP9, IL8, HSPA1A, CCR5, PPARG, TLR8) in the two groups during the first 11 h of age. A separate model was developed for each gene, adjusted for gender and maternal age. Expression was measured as the logarithm of fragments per kilobase of transcript per million mapped reads (log FPKM). In agreement with Balada et al.,1 our results confirmed a higher expression of matrix metallopeptidase 9 (MMP9) and peroxisome proliferator-activated receptor gamma (PPARG) in neonatal encephalopathic patients than in the control group during the first 11 h of age (MMP9: +1.07 log FPKM, p = 0.0511, 95% confidence interval (CI) [0.16, 1.99]; PPARG: +1.65 log FPKM, p = 0.0278, 95% CI [0.37, 2.87]). However, the differences in expression of the remaining four genes were not validated. Figure 1 depicts the expression of the six genes over the first 11 h of life in each group.

Fig. 1: Expression of the six selected genes analyzed by linear mixed-effect models (LMMs) in controls and NE patients during the first 11 h of life.

The LMMs show the relationship between age in hours and gene expression values, expressed as the logarithm of fragments per kilobase of transcript per million mapped reads. The 95% confidence intervals are shown as gray bands on each side of the regression lines. Blue lines indicate control infants and red lines indicate NE infants.
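For readers less familiar with the expression unit used here, the following is a minimal sketch of how an FPKM value and its logarithm can be computed from raw counts. The function names, the natural-log base, and the pseudocount of 1 are illustrative assumptions and are not taken from our analysis pipeline.

```python
import math


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000                 # transcript length in kb
    millions = total_mapped_fragments / 1_000_000      # library size in millions
    return fragments / (kilobases * millions)


def log_fpkm(value: float, pseudocount: float = 1.0) -> float:
    """Natural log of FPKM; the pseudocount (an assumption) avoids log(0)."""
    return math.log(value + pseudocount)


# Hypothetical gene: 500 fragments, 2,000 bp transcript, 20 million mapped fragments
example = fpkm(500, 2_000, 20_000_000)   # 500 / (2 kb * 20 M) = 12.5
example_log = log_fpkm(example)
```

Normalizing by both transcript length and library size makes values comparable across genes and samples, which is why between-group differences are reported on the log FPKM scale.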

We must emphasize that this analysis does not suggest any inaccuracy in the analysis reported by Balada et al.,1 and merely illustrates a pitfall of their approach to biomarker discovery. Gene preselection is a controversial issue in biological experiments and may result in premature conclusions. Although transcriptomic signatures have great potential for developing personalized neuroprotection, they are still at an early stage of development, with only a few studies in NE so far.3,5 Hence, we recommend a cautious and rigorous approach before basing biomarker identification on preselected genes, which can lead to bias and to type I error. Moving away from hypothesis testing, or using it as a validation step following genome-wide research, may allow us to select an appropriate and reproducible combination of genes involved in NE.6
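The concern about type I error can be made concrete with a short calculation. The sketch below is purely hypothetical: it assumes that each of 23 candidate genes is tested independently at a significance level of 0.05, which is not how the genes were selected in the study under discussion, and the function names are our own.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold that keeps the family-wise rate at alpha."""
    return alpha / n_tests


# Hypothetical panel of 23 genes, each tested at alpha = 0.05
fwer_23 = familywise_error_rate(23)        # roughly 0.69
threshold_23 = bonferroni_threshold(23)    # roughly 0.0022
```

Under these assumptions, at least one spurious association would be expected in most such panels unless the per-gene threshold is tightened or the findings are validated in an independent dataset.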

The importance of choosing the most appropriate approach for biomarker discovery has also been highlighted recently in microRNA (miRNA) studies. Circulating miRNAs, which regulate gene expression, are often identified as candidate biomarkers. However, small sample sizes and the use of low- vs. high-throughput sequencing technologies may lead to results that are not subsequently validated.7,8 This issue was apparent in a cord blood miRNA study, which assessed miRNA profiles in blood plasma from 38 mothers and their newborns. The authors found a low correlation between cord and maternal samples based on the identified miRNA signature. However, the sample size was too small for both the training and the validation of these results in a robust predictive model.9

Genome-wide profiling of gene expression levels can be performed without a prior hypothesis, ensuring that all genes are considered equally. This would also allow the discovery of novel genes involved in disease onset and outcome prediction, and would prevent conclusions from being drawn solely on the basis of a narrowed-down set of genes. The paradigm of hypothesis-generating research should complement hypothesis-driven methodologies and facilitate discoveries that have not been possible through their exclusive use.