Plasma cell-free RNA signatures of inflammatory syndromes in children

Significance This study provides a proof of concept that circulating RNA in blood plasma can be used to differentiate multiple inflammatory syndromes in children that are difficult to diagnose clinically, as well as provide evidence that circulating RNA in blood plasma can also be used to characterize organ damage using the same assay.

Inflammatory syndromes, including those caused by infection, are a major cause of hospital admissions among children and are often misdiagnosed because of a lack of advanced molecular diagnostic tools.In this study, we explored the utility of circulating cellfree RNA (cfRNA) in plasma as an analyte for the differential diagnosis and characterization of pediatric inflammatory syndromes.We profiled cfRNA in 370 plasma samples from pediatric patients with a range of inflammatory conditions, including Kawasaki disease (KD), multisystem inflammatory syndrome in children (MIS-C), viral infections, and bacterial infections.We developed machine learning models based on these cfRNA profiles, which effectively differentiated KD from MIS-C-two conditions presenting with overlapping symptoms-with high performance [test area under the curve = 0.98].We further extended this methodology into a multiclass machine learning framework that achieved 80% accuracy in distinguishing among KD, MIS-C, viral, and bacterial infections.We further demonstrated that cfRNA profiles can be used to quantify injury to specific tissues and organs, including the liver, heart, endothelium, nervous system, and the upper respiratory tract.Overall, this study identified cfRNA as a versatile analyte for the differential diagnosis and characterization of a wide range of pediatric inflammatory syndromes.

cell-free RNA | pediatrics | inflammation | diagnostics | machine learning
The differential diagnosis of inflammatory syndromes in children is complex owing to their overlapping clinical manifestations, nonspecific symptoms, and developmental age-related barriers to communication.These challenges often result in delayed or inac curate diagnoses, thereby impeding effective clinical management and increasing the risk of long-term adverse health effects.A key example is Kawasaki disease (KD), an inflam matory syndrome of unknown etiology that primarily affects children under 5 y of age.KD is often misdiagnosed because significant clinical signs overlap between KD and other inflammatory and/or infectious conditions.Accurate diagnosis is critical, as KD patients who do not receive intravenous immunoglobulin (IVIG) early in the course of illness have a substantially increased risk of developing coronary artery aneurysms, making KD the leading cause of acquired heart disease in children.Thus, there is a clear need for accurate molecular tests for inflammatory conditions such as KD to inform timely and appropriate treatment.
Currently, the differential diagnosis of pediatric inflammatory syndromes relies on clinical assessment of signs and symptoms and results from a broad array of laboratory tests.Culture-based and molecular assays are routinely used to identify viral and bacterial pathogens, but these tests do not interrogate the host response and hence are unable to differentiate between infectious and noninfectious conditions.Serologic metabolic and antigen biomarkers used for diagnosis often lack specificity ( 1 , 2 ).To address these limi tations, recent studies have explored the use of whole blood RNA transcriptome profiling of the human host response to assess disease severity and differentiate among inflammatory conditions including KD, multisysteminflammatory syndrome in children (MIS-C), viral, and bacterial infection ( 3 -6 ).However, while the whole blood profile is indicative of the host immune response, it provides little information regarding the extent of inflammationrelated cell injury or death in solid organ tissues.In contrast, cell-free nucleic acids in plasma, including cell-free DNA (cfDNA) and cell-free RNA (cfRNA) are promising analytes for evaluating inflammation as they are released by dead or dying cells originating from both the bloodstream and solid tissues ( 7 ).Many recent studies have explored the use of cell-free nucleic acids for monitoring of pregnancy ( 8 , 9 ), cancer ( 10 , 11 ), OPEN ACCESS transplantation ( 12 , 13 ), and infection ( 5 , 14 -23 ), yet the poten tial of cell-free nucleic acids for the differential diagnosis of inflam matory syndromes remains largely unexplored.
In this study, we applied plasma cfRNA profiling by RNA sequencing (RNA-seq) to compare host immune and cellular injury responses associated with four different inflammatory and/ or infectious syndromes in children.We identified shared signa tures across multiple conditions, highlighting the importance of incorporating multiple comparison groups for the development of disease-specific biomarkers.Using cfRNA profiles, we built a machine learning model with high accuracy in differentiating between MIS-C and KD, two pediatric inflammatory syndromes with overlapping clinical presentations.We then expanded this methodology to construct a multiclass diagnostic classifier capable of distinguishing among KD, MIS-C, viral, and bacterial infec tions.In addition, we demonstrate that the cfRNA profile can be correlated with markers of tissue damage and may be able to dif ferentiate among different viral infections.We propose an appli cation of cfRNA profiling as a decision support tool for differential diagnosis of inflammatory syndromes in the clinical setting.

Results
Clinical Cohort.We collected and analyzed 370 plasma samples from pediatric patients with inflammatory and infectious conditions at four hospitals in the United States, including Rady Children's Hospital San Diego (UCSD), Emory, Children's National Hospital (CNH), and University of California San Francisco (UCSF) (Fig. 1A).This sample set included patients diagnosed with KD, MIS-C, viral infection, bacterial infection, and other hospitalized pediatric controls, some but not all with inflammatory disease (for example, arthralgia, Crohn's disease flare, parenchymal lung disease, chronic lung disease, toxic shock syndrome, and postvaccine myocarditis), as well as healthy children (Fig. 1A and SI Appendix, Table S1).Included cases of bacterial and viral infections were heterogeneous with respect to infection site and pathogen (Dataset S1).Age, sex, race, severity, and inflammatory status were recorded for each patient (Fig. 1 B-E).All patient samples were collected within 4 d of hospitalization during the acute phase of disease (Materials and Methods).RNA next-generation sequencing was used to quantify the genes and cell types of origin of cfRNA in each sample, with an average of 20.7 million reads sequenced per sample (Materials and Methods).No batch effect was observed between samples from different hospitals (Materials and Methods and SI Appendix, Fig. S1 A and B).Machine learning models were built to identify biosignatures associated with different diseases.
cfRNA Signatures of Disease.Our initial focus was on characteriz ing changes in cfRNA profiles in plasma that are common across the various inflammatory conditions.We performed pairwise differential gene abundance analysis between the healthy control group and each disease group (Materials and Methods, Fig. 2A, and Dataset S2).This analysis identified differentially abundant genes (DAGs) for all comparisons [Benjamini-Hochberg (BH) adjusted P-value < 0.05].The smallest number of DAGs (n = 2,686) was identified when comparing healthy controls to patients in the other hospitalized control group, likely due to the smaller sample size and higher heterogeneity of this sample group.The greatest number of DAGs (n = 6,606) was identified when comparing healthy controls to patients with KD (Fig. 2A).Further analysis of the DAGs for all conditions revealed a significant number of shared genes (1,871 DAGs, Fig. 2A).Chief among these were histone protein coding genes that were elevated for all disease groups, indicating that an increase in histone transcripts is a universal indicator of inflammation (SI Appendix, Fig. S2A).The analysis also identified immune related transcripts Myeloperoxidase (MPO), Elastase, Neutrophil Expressed (ELANE), CD53, and C-X-C Motif Chemokine Receptor 2 (CXCR2) elevated in each disease group, homeostasis related transcripts CDK19, ANAPC5, and 32 mitochondrial protein coding RNAs in the control group (SI Appendix, Fig. S2B).Pathway analysis of the 1,871 overlapping DAGs revealed an enrichment of neutrophil and cell replication transcripts related to inflammation (Fig. 2B and Dataset S3).These findings point to shared signatures of inflammation among disease groups and emphasize the need for intergroup comparisons to develop disease-specific biosignatures.
cfRNA Can Differentiate MIS-C and KD.KD and MIS-C share many clinical characteristics: They are highly inflammatory, present with endothelial dysfunction, and have multiple overlapping signs, including skin rashes, mucosal involvement, and fever.There are currently no molecular tests to distinguish between KD and MIS-C.We therefore assessed whether cfRNA could be used to differentiate these severe pediatric inflammatory syndromes.We divided the KD (n = 101) and MIS-C (n = 97) samples into training (60%), validation (20%), and test (20%) sets, ensuring a roughly equal representation of hospital of origin and disease subclassification across all three sets (Fig. 3A).We used the training data for feature selection and to train machine learning models.We then used the    In our initial analysis using the training set, we identified 1,242 DAGs between KD and MIS-C (DESeq2, BH adjusted P -value < 0.01 and base mean > 10; Fig. 3B and Dataset S4 ).We refined this gene list based on adjusted P -value, base mean, individual gene AUC, and fold change, resulting in a final tally of 132 genes for model input ( Fig. 3C ).We then trained 14 machine learning clas sification models, evaluating their performance on the validation set (Materials and Methods ).The generalized linear model with elastic net regularization (GLMNET) model with least absolute shrinkage and selection operator (LASSO) regression exhibited the highest validation set AUC and was selected as our final model ( Fig. 3D ).A unique feature of the GLMNET LASSO algorithm is the feature selection step, which selected 25 genes for the mod eling (SI Appendix, Table S2 ).Using this trained model, we tested the classification performance using the test set [receiver operating characteristic (ROC)-AUC train = 1.00, validation = 0.98, and test = 0.98] ( Fig. 3E ).We also observed similar distributions in the classification scores across sample sets, further confirming that there was little to no overfitting of the training and test sets ( Fig. 3F ).
We compared the KD vs. MIS-C gene panel to previously reported differentially expressed genes (DEGs) from a study of cultured endothelial cells treated with KD and MIS-C sera ( 24 ).We found significant overlap in genes; nine of the 25 genes in our panel were also identified as DEGs in the previously reported study (nominal P -value < 0.05).Of the overlapping genes, those elevated in acute KD are associated with autophagy (ANXA6, CD44, and PDCD4) and endothelial-mesenchymal transition (EndoMT) (STK17B and VIM), processes linked to endothelial cell function in inflammatory conditions ( 25 -29 ).This provides evidence that the key differences in endothelial dysfunction and cardiac out comes between MIS-C and KD may be linked to differences in endothelial autophagy and activation of EndoMT pathways.
Multidisease Classification Using cfRNA.We next asked whether cfRNA could be used in the more challenging scenario of multidisease classification.For this, we developed a machine learning framework that combines the outputs of one-vs.-onemodels using a random forest (RF) multiclass algorithm (Materials and Methods and Fig. 4A).We first split our dataset into training (70%) and test (30%) sets, with roughly comparable proportions of samples from patients with KD, MIS-C, viral infection, or bacterial infection, while also stratifying the groups evenly with respect to hospital of origin and disease subclassification.We performed differential abundance analysis for each pairwise comparison using only the training data (Dataset S5).Next, we trained individual one-vs.oneGLMNET models with LASSO regression for each sample group (MIS-C, KD, viral infection, or bacterial infection), using the top 100 genes identified in the differential abundance analysis.These genes were selected based on gene abundance AUC and level of significance (Materials and Methods).Each of the one-vs.-onemodels demonstrated high performance with individual model scores classifying samples with high accuracy (test AUC: min = 0.86, max = 0.99; test accuracy: min = 0.83, max = 0.92; Fig. 4B and SI Appendix, Table S3).The union of genes used by each GLMNET model with LASSO regression generated a 109-gene panel (Dataset S6).To combine the outputs of the individual models, we trained a multiclass RF classifier using the classification scores from the one-vs.-onemodels.Using this framework, our multiclass machine learning model achieved high accuracy in both the training and test sets (accuracy, train = 100% and test = 80%).The bacterial infection group had the highest rate of misclassified samples, likely due to the smaller sample size and heterogeneity of both pathogen and site of infection (SI Appendix, Table S3 and Dataset S1).
The high performance of the multiclass machine learning model on a relatively small number of genes points to the potential utility of cfRNA in differentiating complex inflammatory conditions in a clinical setting.
Characterization of Disease Using cfRNA.We further investigated whether cfRNA could be employed not only for classification but also for disease characterization.Since inflammation and/ or infection can impact multiple organ systems, understanding organ-specific damage and function is crucial for guiding clinical management.Previous work from our group and others has shown that quantifying the cfRNA cell-type-of-origin (CTO) is a viable noninvasive method to assess cell, tissue, and organspecific injury (17,30).Here, we employed deconvolution of the cell types of origin of cfRNA using BayesPrism and the Tabula Sapiens human cell atlas as a reference.The CTO estimates were compared with known biomarkers and other indicators of specific organ injury or dysfunction (Materials and Methods and Fig. 5A).We first compared hepatocyte and intrahepatic cholangiocyte contributions to the cfRNA in plasma to alanine transaminase measurements (ALT, n = 141).We observed significantly elevated levels of cfRNA from hepatocytes and intrahepatic cholangiocytes in patients with high ALT (ALT > 100 IU/L) compared to patients with normal ALT (ALT < 40 IU/L, Fig. 5A).To explore cardiac function and damage, samples were categorized as having either normal or abnormal cardiac function (Materials and Methods).In the abnormal group, we observed increased levels of cfRNA from cardiac muscle cells and pericytes, likely indicative of increased cardiac cell injury or death.Interestingly, this group also exhibited elevated cfRNA levels from kidney epithelial cells, intrahepatic cholangiocytes, club cells and type I pneumocytes, and bronchial epithelium, suggesting other end organ damage associated with impaired cardiac function (Fig. 5A).Of note, patients with abnormal cardiac function did not have elevated levels of smooth or skeletal muscle cell derived cfRNA (SI Appendix, Fig. S3A).Last, to evaluate lung function and damage we focused on samples from healthy individuals and viral infection patients with COVID-19, further stratifying the COVID-19 samples by disease severity (Materials and Methods).We observed elevated levels of cfRNA from club cells and type I pneumocytes in moderate COVID-19 cases compared to healthy individuals, with a greater increase for severe cases (Fig. 5A).We observed similar trends in the levels of bronchial epithelium derived cfRNA; however, the difference was not statistically significant.
We next assessed whether cfRNA could be used to further stratify sample groups by distinguishing between COVID-19 infection from Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and infections from other viral pathogens [influenza, respiratory syncytial virus (RSV), Epstein-Barr virus (EBV), adenovirus, etc.].We sepa rated viral infection samples into COVID-19 and non-COVID-19 viral infections and randomly split the data into training (70%) and testing (30%) datasets.Next, we trained a GLMNET with LASSO regression model to differentiate between these groups (Materials and Methods ).The trained model had high performance on the training and test sets despite including only seven genes (train AUC = 0.99 and test AUC = 0.93), demonstrating the potential of cfRNA for res piratory viral species differentiation (SI Appendix, Fig. S3 B and C ).
cfRNA Profiling as a Clinical Decision Support tool.We next simulated a cfRNA "report" for each patient in the multiclass test set (Dataset S7) and discuss four patient cases in detail to illustrate how cfRNA can be integrated to support clinical decision-making.Each report includes diagnostic predictions from the multiclass classification algorithm and predicted organ involvement levels as z-scores of CTO fractions in samples from patients with disease compared to healthy patients.
The first case is a 13-y-old male with very-high-risk B cell acute lymphoblastic leukemia in delayed intensification chemotherapy who presented with a 1-d history of fever ( Fig. 5B ).His laboratory assessments were notable for leukopenia, anemia, thrombocyto penia, elevated C-reactive protein (CRP), mildly elevated ALT, normal kidney function, and developed hypotension following admission.He had known household contacts with COVID-19, and his nasopharyngeal PCR was positive for SARS-CoV-2, but he did not develop respiratory symptoms.The final diagnosis was COVID-19 in an immunocompromised host.Despite the immu nocompromised state of the patient and lack of respiratory symp toms common in COVID-19, the cfRNA algorithm correctly identified the patient as having a viral infection.Furthermore, the elevated levels of liver and heart derived cfRNA are interesting given the clinical manifestations of mildly elevated ALT and hypotension.
In the second case, the cfRNA model correctly predicted the patient as having MIS-C, but the classifier score for MIS-C and viral infection were very comparable, producing what could be considered a "borderline" result ( Fig. 5C ).The patient is a 4-y-old male who presented with a 1-wk history of abdominal pain, vom iting, and watery diarrhea, with subsequent development of fever, conjunctival injection, and swelling of the hands and feet.The patient had normal white blood cell counts, with lymphopenia, thrombocytopenia, hyponatremia, acidosis, and elevated brain natriuretic peptide (BNP), CRP, and erythrocyte sedimentation rate, with normal liver and kidney function.He was SARS-CoV-2 PCR negative but SARS-CoV-2 immunoglobulin G positive.No additional viral testing was performed.His course was complicated by hypotension, but an echocardiogram was grossly normal.The final diagnosis was MIS-C, which was the top prediction from the cfRNA model.Interestingly, viral infection was the second most probable diagnosis from the cfRNA model.MIS-C is derived from a viral infection (SARS-CoV-2), and MIS-C and viral infection can be difficult to differentiate.There are at least three possible interpretations for these results: 1) the model is correctly identi fying MIS-C along with a viral signature from SARS-CoV-2, 2) the patient is misdiagnosed with MIS-C and has only a viral infec tion, or 3) the detection of a viral signature is due to errors in the model.We also observed elevated liver cfRNA despite the patient having normal liver enzymes along with elevated heart tissue derived cfRNA, which is consistent with the elevated BNP.
In the third case, the model incorrectly predicted the diagnosis in an immunocompromised patient, but the correct diagnosis of bacterial infection was the second most likely prediction ( Fig. 5D ).The patient is a 16-mo-old male with a history of Wiskott-Aldrich syndrome requiring a bone marrow transplant.He was admitted with fever, fatigue, pancytopenia with severe thrombocytopenia, elevated CRP, and blood cultures positive for Klebsiella pneumoniae .The source of the infection is unknown but was believed to be secondary to bacterial gut translocation or central line infection.His pancytopenia and thrombocytopenia were attributed to his history of Wiskott-Aldrich syndrome.He was treated with cefepime and responded clinically to a full course of antibiotics.The final diagnosis was bacterial infection, which was the second ranked prediction by the cfRNA model.Interestingly, the patient presented with some symptoms characteristic of MIS-C, the apparently erroneous top prediction made by the cfRNA model; specifically with severe thrombocytopenia and elevated CRP, which are part of the MIS-C case definition ( 31 ).
In the last case, the model predicted an incorrect diagnosis, but the results of the multiclass model would have been informative in providing the correct diagnosis ( Fig. 5E ).The patient presented with fever during the COVID-19 pandemic and was initially diagnosed with MIS-C.However, a family member of the patient had recently been diagnosed with KD.Given the known genetic predisposition associated with KD, a reevaluation of the original case was conducted.The patient was subsequently found to meet the American Heart Association's case definition for KD and had initial laboratory values and a subsequent clinical course that were consistent with KD (including periungual desquamation).The patient was reclassified accordingly.Although the multiclass model did not predict the proper diagnosis of KD, the classifier scores for KD, bacterial infection, and viral infection were very similar, while the classifier score for MIS-C was much lower.Furthermore, the MIS-C vs. KD pairwise classifier accurately predicted KD.While the final diagnosis of the multiclass model was incorrect, the multiclass model and MIS-C vs. KD classifier results could have aided the clinicians in determining that the patient did not have MIS-C upon initial admission.

Discussion
In this work, we established a proof of concept that cfRNA can be used to differentiate among inflammatory syndromes in chil dren across both one-on-one and multiclass comparisons.Interest in the use of blood-borne nucleic acids for diagnosis of inflam matory conditions is driven by continual advancements in sequencing technology, the ease of drawing blood, and the critical need for better diagnostic tools.Coote et al. recently introduced whole blood RNA multiclassification models for pediatric illnesses, distinguishing 6 broad and 18 granular categories.However, whole blood RNA biosignatures are primarily derived from immune cells, and do not provide information on cell, tissue, and organ damage.In contrast, cfRNA signals derive from dead and dying cells in blood and tissues, thus providing insight into immune dynamics and underlying tissue involvement ( 8 -10 , 17 -19 , 30 , 32 , 33 ).Kalantar et al. used a combination of cfRNA, cfDNA, and whole blood RNA for differentiating sepsis and other causes of critical illness ( 5 ).However, the approach used by Kalantar et al. lacks consolidation into a unified multiclass model, requires the use of two different analytes (whole blood and plasma), and did not analyze cfRNA to characterize tissue and organ injury.Our work expands on these recent studies by presenting a multi class framework using only cfRNA to differentiate, diagnose, and better characterize inflammatory syndromes in children.
The data from this study provide opportunities to better under stand cfRNA profiles in health and disease.We observed cfRNA signatures that are common to all inflammatory conditions, including elevated histone related RNAs, neutrophil extracellular trap (NET) components MPO and ELANE ( 34 ), as well as immune markers interleukin enhancer binding factor 2 (ILF2), interferon gamma inducible protein 16 (IFI16), CD53, and CXCR2.NETs are com posed of DNA, histones, and other proteins, and act to trap and neutralize pathogens while minimizing host cell damage ( 35 ).Our observation of cfRNA signatures of NET formation is consistent with the reported role of NETs in KD and MIS-C ( 36 , 37 ).
To test the ability of cfRNA profiles to distinguish among pedi atric inflammatory syndromes, we compared cfRNA profiles for patients diagnosed with KD and MIS-C, conditions that are very similar clinically.There is a need for improved molecular tools to discriminate between these two conditions as exemplified by a recent study by Day-Lewis et al. which reported significant overlap in signs and symptoms between KD and MIS-C (based on the 2023 case definition), with an estimated false positive rate of 8% ( 31 ).The cfRNA signature presented here has near-perfect accuracy and there fore has high potential for translation into a useful clinical molecular test.Furthermore, the identified gene signature provides mechanistic insight into KD and MIS-C.Beyond KD and MIS-C, we show that cfRNA can similarly differentiate between COVID-19 and other viral infections, opening the door for future studies to create more granular classification algorithms that can differentiate among path ogens at the species level based on cfRNA host profiling.
In this manuscript, we present multiple gene panels that accurately distinguish similar pediatric inflammatory conditions.Genes exhib iting high diagnostic relevance likely reflect the underlying molecular differences between conditions.For instance, the elevated expression of eukaryotic translation elongation factor 2 (EEF2) in KD suggest potential links to senescence and exposure to bacterial toxins, which is compelling given KD is thought to be triggered by an inhaled toxin or pathogen ( 38 , 39 ).Conversely, the elevated expression of FKBP5 in MIS-C implicates a unique neuronal immune-stress response, which is relevant given the neuronal involvement reported for MIS-C ( 40 , 41 ).Further analysis uncovered markers related to autophagy and EndoMT in the KD vs. MIS-C panel, aligning with previous findings comparing the transcriptome of endothelial cells treated with sera from KD and MIS-C patients.These observations may provide insights into the higher rates of acute cardiac involvement in MIS-C and the long-term arterial wall remodeling observed in KD.Dysregulation of autophagy in MIS-C could potentially lead to endothelial dysfunction, resulting in impaired blood flow to the heart and contributing to the higher rates of cardiac complications observed in this condition.Moreover, a lack of EndoMT pathway activation in MIS-C may account for lower rates of long-term remodeling com pared to KD, despite the higher rates of acute cardiac involvement.
The multiclass gene panel can provide insights into key differ ences between conditions.For example, AKAP12, an A-kinase anchoring protein transcribed in endothelial cells ( 42 -44 ), was more abundant in KD and MIS-C as compared to viral and bacterial infections, and selected by each of the 1v1 models designed to dif ferentiate KD and MIS-C from viral and bacterial infections.Additionally, BNIP3L was elevated in bacterial infection, potentially reflecting its role in mitophagy, while DNMT1, IFI27, SLFN5, and AHNAK were elevated in viral infection, and PABPC4 was elevated in KD.
The case studies focusing on both correct and incorrect predic tions made by the classification model highlight the potential utility of cfRNA profiling in the clinical setting.They also under score the importance of inclusion of as many "real-life" cases as possible during the development of classification models to max imize the specificity of their predictions.In particular, immuno compromised patients exhibit altered immune and tissue/organ damage responses, and more data will likely need to be collected and classifications models tailored to optimize performance of cfRNA profiling in this subgroup.However, we are encouraged that our results suggest that cell, tissue, and/or organ injury based on cfRNA levels can be detected even for those cases for which the model disease classification is incorrect.
Here, we present a multiclass model for differential diagnosis of inflammatory syndromes using plasma cfRNA.The final model includes just 109 genes, suggesting that translating the model to a multiplexed qRT-PCR based platform with a rapid turnaround time of a few hours is likely practical.Furthermore, one of the most compelling aspects of cfRNA profiling is the ability to quan tify the extent of organ involvement.We demonstrate this concept with simultaneous quantification of injury to multiple organ sys tems (liver, cardiac, lung, endothelium).Available clinical tests for evaluating tissue injury (for example, alanine aminotransferase levels in the liver) enabled us to confirm the accuracy and cfRNA-based detection of organ injury.Even more compelling is the potential clinical utility of cfRNA in quantifying injury to tissues where current tests are not available or lack adequate sen sitivity, such as endothelial and neuronal injury.In the future, simultaneous diagnosis and characterization of tissue injury may be critical in informing clinicians on the optimal management and treatment of their patients with inflammatory syndromes of unknown etiology.

Materials and Methods
Ethics Statement.The UCSF Institutional Review Board (IRB) (#21-33403), San Francisco, CA; Emory University IRB (STUDY00000723), Atlanta, GA; Children's National Medical Center IRB (Pro00010632), Washington, DC; and Cornell University IRB for Human Participants (2012010003), New York, NY, each approved the protocols for this study.All samples and patient information were deidentified for analysis and shared with collaborating institutions.At Emory University, the IRB-approved protocol was a prospective enrollment study under which parents provided consent and children assent as appropriate for age.At Children's National Medical Center and UCSF, the IRB protocols were "no subject contact" sample biobanking protocols under which consent was not obtained and data were extracted from medical charts.At UCSD, the IRB reviewed and approved collection and sharing of samples and data (IRB #140220).Signed consent and assent were provided by the parent(s) and pediatric patient, respectively.Sample Acquisition.Samples were acquired from UCSF previously described (17).Briefly, hospitalized pediatric patients were identified as having COVID-19 by testing positive with SARS-CoV-2 real-time PCR (RT-PCR).Residual whole blood samples were collected in EDTA lavender top tubes and diluted 1:1: in DNA/RNA shield (Zymo Research).remaining blood was centrifuged at 2,500 rpm for 15 min and the available plasma was retained.All samples were stored at −80 °C freezer until used.Samples were acquired from Emory and Children's Healthcare Atlanta as described (17).Briefly, pediatric patients were as having COVID-19 by SARS-CoV-2 RT-PCR and as having MIS-C if they met the CDC case definition.Controls were healthy outpatients with no known history of COVID-who volunteered for specimen collection.Whole blood was collected in EDTA lavender top tubes and aliquoted for plasma extraction via centrifugation at 2,500 rpm for 15 min.Samples were stored −80 °C and shipped on ice to either UCSF or Cornell for analysis.Samples were acquired from UCSD prior to any treatment in all subjects in EDTA lavender top tubes and centrifuged at 2,000 g for 10 min.Plasma was collected and stored at −80 °C.Samples were acquired from CNH as previously described (17).Briefly, pediatric patients were classified as having MIS-C if they met the CDC case definition.Whole blood samples were collected and centrifuged at 1,300× G for 5 min at room temperature.Plasma was aliquoted into a cryovial and frozen at −80 °C.
Clinical Data.Patients were stratified as previously described (17).For the purposes of this study, patients were classified as having MIS-C by multidisciplinary teams that adjudicated whether a patient met the CDC case definition of MIS-C.COVID-19 was defined as any patient with PCR-confirmed SARS-CoV-2 infection within the preceding 14 d who did not also meet the MIS-C case definition.KD patients at UCSD met the AHA definition for complete or incomplete KD.Viral and bacterial infection patients enrolled at UCSD were adjudicated and a final diagnosis assigned by the research team (one ED clinician and one pediatric infectious disease expert) 2 to 3 mo after the acute illness to allow time for serologies, recurrence, and clinical recovery to be assessed.Patients with a self-limited illness that resolved without treatment and for whom viral studies were either negative or not done were classified as having a "viral syndrome."Clinical data were abstracted from the medical record and submitted into a REDCap databases housed at UCSF or UCSD.
Sample Processing and Sequencing.Samples were processed described previously (17).Briefly, samples were received on dry ice, RNA was extracted, and libraries were prepared and sequenced on a NextSeq or NovaSeq Illumina sequencer.Reads were trimmed to 61 bp, and sequencing data were processed using a custom bioinformatics pipeline which included quality filtering and trimming, alignment to the human GRCh38 reference genome, and counting of gene features.
Sample Quality Filtering.Using the sequencing data, quality control was performed by analyzing DNA contamination, rRNA contamination, total counts, and RNA degradation.DNA contamination was estimated by calculating the ratio of reads mapping to introns and exons.rRNA contamination was measured using SAMtools (v1.14).Total counts were calculated using featureCounts 30 (v2.0.0).Degradation was estimated by calculating the 5′-3′ bias using Qualimap 31 (v2.2.1).Samples were removed from analysis if either the intron to exon ratio was greater than 3, if a sample had less than 75,000 total feature counts, or if the 5′-3′ read alignment ratio bias was greater than 2.
Differential Abundance Analysis.Gene transcript abundances were compared using a negative binomial model implemented using the DESeq2 R package (45).Gene transcript base mean abundance, adjusted P-value, and log2 fold change were taken from the DESeq2 Results output.Gene transcript AUC was calculated using VST transformed counts and the pROC R package (46).Sample Partitioning.Samples were partitioned for machine learning applications taking into consideration diagnosis, hospital of origin, and disease subclassification.MIS-C and COVID-19 samples were subclassified by severity, as previously defined (17).KD samples were subclassified by phenotypic subclusters, as previously defined (47).Non-COVID-19 Viral and bacterial infection samples were evenly partitioned by diagnosis and hospital of origin only.
Machine Learning: MIS-C vs. KD.Samples were partitioned into training, validation, and test sets at a ratio of 60:20:20, partitioning evenly based on factors such as diagnosis, severity/subgroup, and hospital of origin.Subsequently, differential abundance analysis was conducted on the training data.Genes were filtered based on specific criteria (adjusted P-value < 0.01, base mean abundance > 100, gene transcript AUC > 0.65, and absolute log2 fold change > 0.25).Filtering criteria were chosen with the intention selecting <150 genes.We tested multiple thresholds using our training and validation sample sets and found that the model results were not sensitive to the thresholds chosen.The test set was not used for determining thresholds for gene filtering.
Raw counts for the training, validation, and test sets were individually normalized using variance stabilizing transformation, and subsets were created based on the chosen transcript features.Variance stabilization transformation was performed using the DESeq2 package, and the dispersion function from the training set was used to transform the test and validation sets.Fourteen machine learning classification algorithms were employed using the R package Caret (10.18637/jss.v028.i05),including generalized linear models (GLM) with Ridge and LASSO feature selection (GLMNETRIDGE and GLMNETLASSO), support vector machines with linear and radial basis function kernel (SVMLin and SVMRAD), RF, RF ExtraTrees (EXTRATREES), neural networks, linear discriminant analysis, nearest shrunken centroids (PAM), C5.0 (C5), k-nearest neighbors, naive bayes (NB), CART (RPART), and GLM.Training was performed using fivefold cross-validation and grid search hyperparameter tuning.
For each model, classification score thresholds were determined using Youden's index on the training data.The trained models were then employed to predict labels for the validation set, and performance was assessed using the area under the ROC-AUC.The model achieving the highest AUC on the validation set was selected as the final model and subsequently applied to the test set, which had not been utilized in any phase of model training or selection.Prediction on both the validation and test sets utilized the Youden's index threshold derived from the training set.
Machine Learning: Multiclassification.Samples were partitioned into train and test sets at a ratio of 70:30, considering factors such as diagnosis, severity/subgroup, and hospital of origin.Raw counts for the training and test sets were individually normalized using variance stabilizing transformation, as implemented using the DESeq2 package, and the dispersion function from the training set was used to transform the test set.One-vs.-oneGLMNET LASSO models were trained for each pairwise comparison of samples groups (e.g., KD vs. Viral Infection) using the top 150 significant features (adjusted P-value <0.05, base mean abundance > 50, and absolute log2 fold change > 1) ordered by gene transcript AUC calculated using the training data.Trained models were used to calculate classification scores for all samples in the train and test dataset, resulting in six classification scores for each sample.Classification scores were used to train a multiclass RF algorithm which assigned probability scores for each condition.The final condition with the highest probability score from the RF was assigned as the prediction for each sample.
CTO Analysis.Cell type deconvolution was performed using BayesPrism (v1.1) (48) with the Tabula Sapiens single-cell RNA-seq atlas (Release 1) (33) as a reference.Cells from the Tabula Sapiens atlas were grouped as previously described by Vorperian et al. (30).Cell types with more than 100,000 unique molecular identifiers (UMIs) were included in the reference and subsampled to 300 cells using ScanPy (v1.8.1) (49).Deconvolution values were scaled from 0 to 1 for each cell type, and medians were calculated for plotting.
For liver damage assessment, samples were separated as having normal or high ALT measurements (normal: ALT < 40 high: ALT 100 IU/L).The ALT measurements taken from the same blood draw as the plasma for cfRNA processing.For cardiac function, patients were categorized as abnormal if they had abnormal EKG/ECG and/or echocardiogram results.EKG/ECG and echocardiogram results were categorized as abnormal in the context of the patient narrative and final interpretation of the studies.For endothelial damage, samples were separated as either having KD/MIS-C or bacterial/viral infection.COVID-19 patients were determined to have moderate or severe disease using the following criteria: Moderate: The patient must have been hospitalized due to COVID-19 respiratory disease and/or any systemic/nonrespiratory symptoms attributed to COVID-19 (e.g., neonatal fever, dehydration, new diagnosis diabetes, acute appendicitis, necrosis of extremities, diarrhea, encephalopathy, renal insuffimild coagulation abnormalities, etc.).Severe: The patient must have been hospitalized for COVID-19 with either high-flow oxygen requirement [high-flow nasal cannula, continuous positive airway pressure, bilevel positive airway pressure, intubation with mechanical ventilation, or extracorporeal membranous oxygenation] and/or evidence of endorgan failure [acute renal failure requiring dialysis, coagulation abnormalities resulting in bleeding or stroke, diabetic ketoacidosis, hemodynamic instability requiring vasopressors], and/or dying from COVID-19.These patients were almost always admitted to the intensive care unit (ICU).
Machine Learning: COVID-19 vs. Non-COVID-19 Viral Infection.Training was done using the same method as the KD vs. MIS-C model.Briefly, viral infection samples were partitioned into train and test sets at a ratio of 70:30, considering factors such as COVID-19 status, severity/subgroup, and hospital of origin.Subsequently, differential abundance analysis was conducted on the training data.Genes were filtered based on specific criteria (adjusted P-value < 0.01, base mean abundance > 100, gene transcript AUC > 0.65, and absolute log2 fold change > 0.25), and the top 150 genes, as ordered by gene transcript AUC, were selected as inputs for machine learning analysis.
Raw counts for the train and test sets were individually normalized using variance stabilizing transformation, and subsets were created based on the chosen transcript features.A GLMNET with LASSO regression was trained on the training set using fivefold cross-validation and grid search hyperparameter tuning.Classification score thresholds were determined using Youden's index on the training data.The trained models were then employed to predict labels for the test set using the Youden's index threshold derived from the training set.
Quantification and Statistical Analyses.The programming language R (v4.1.0)was utilized for all statistical analyses.Statistical significance was assessed through two-sided Wilcoxon signed-rank tests and Mann-Whitney U tests, unless specified otherwise.Machine learning algorithms were trained using the Caret R package, and pipelines were run using the Snakemake workflow management system (50,51).In boxplots, boxes denote the 25th and 75th percentiles, the band within the box signifies the median, and whiskers extend to 1.5 times the interquartile range of the hinge.The alignment of all sequencing data was performed against the GRCh38 Gencode v38 Primary Assembly, with feature counting conducted using the GRCh38 Gencode v38 Primary Assembly Annotation (52).
Data, Materials, and Software Availability.Raw sequencing data in this study cannot be deposited in a public repository due to patient privacy concerns and lack of consent for a subset of the patient samples.Instead, deidentified RNA-seq count matrices have been uploaded to the National Center for Biotechnology Information Gene Expression Omnibus database and will be publicly available upon publication (GSE255555) (53).All code has been deposited on GitHub and will be available upon publication.
ACKNOWLEDGMENTS.We thank the patients and their families for their help to further our understanding of pediatric inflammatory conditions.We would like to acknowledge staff members at the UCSF Clinical Laboratories and the UCSF Clinical Microbiology Laboratories for their help in identifying and retrieving patient whole blood samples.We thank the Cornell Genomics Center and the UCSF Center for Advanced Technology for help with sequencing libraries.At Emory, we thank Nadine Baida, Amrita Banerjee, Julia Bartol, Kushmita Bhakta, Caroline Ciric, Khalel De Castro, Khadijah Francois, Theda Gibson, Laila Hussaini, Grace Li, Wensheng Li, Austin Lu, Lisa Macoy, Clair Martin, Molly Morrison, Amy Muchinsky, Heather Nurse, Maria A. Perez, Etza Peters, Brianna Rice, Anna Siaw-Anim, Kathy Stephens, Elizabeth Grace Taylor, Ashley Tippett, Katie Zaks, and the Children's Healthcare of Atlanta Research Laboratory for their contributions to specimen and data collection.).The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Fig. 1 .
Fig. 1.Sample overview.(A) Sample counts and distribution of hospital of origin for each disease group."Other" indicates other hospitalized pediatric controls (B) Age distribution, (C) C-reactive protein (CRP) levels, (D) sex distribution, and (E) ICU status distribution for each sample group.

Fig. 2 .
Fig. 2. Shared cfRNA signatures among different inflammation syndromes.(A) Differential abundance analysis using DESeq2 was performed between healthy controls and each other group individually.Vertical columns indicate the number of overlapping genes that are significantly differentially abundant between groups and controls (BH adjusted P-values < 0.05).Dots below bars indicate the groups being intersected.Horizontal columns indicate the total number of DAGs between groups and controls.(B) Significantly enriched pathways in the set of genes found to be differentially abundant between healthy controls and each disease group.Average P-value and fold change used for pathway analysis (Qiagen, IPA).

Fig. 3 .
Fig. 3. cfRNA distinguishes KD and MIS-C.(A) Overview of sample set and modeling scheme.(B) Volcano plot of differentially abundant transcripts between MIS-C and KD.Analysis was performed using the training dataset (DESeq2).(C) Adjusted P-value, base mean, and gene ROC AUC distributions for all significant genes from the training KD vs. MIS-C comparison.The red dashed line represents the threshold cutoff used for filtering prior to model training.The green shaded area indicates genes used.(D) ROC AUC values for the 14 machine learning classification models applied to training and validation sets.(E) ROC curves of the training, validation, and test sets using the GLMNET with LASSO regression algorithm.(F) Violin plots of the classifier scores from the GLMNET with LASSO regression algorithm.

Fig. 4 .
Fig. 4. Multiclass classification of pediatric disease using cfRNA.(A) Overview of machine learning framework used for multiclass classification.(B) ROC-AUC plots for each one-vs.-onemodel trained, along with train (Top) and test (Bottom) classifier score distributions.(C) Confusion matrix of reference and predicted diagnoses for train and test samples.Color indicates fraction of samples in each category.

Fig. 5 .
Fig.5.cfRNA as a clinical decision support tool.(A) Median scaled cell type of origin fractions for samples separated by liver damage, cardiac function, COVID-19 severity, and endothelial damage, and healthy controls (Materials and Methods).Stars indicate statistically significant differences between groups in comparison (Wilcoxon rank-sum test, BH adjusted P-value < 0.05).(B-E) Case studies of patients with example clinical decision support tool results from the multiclass algorithm, along with measurements of endothelium, heart, liver, lung, and neuronal damage from the deconvolution data.Sample shown in red and healthy donor samples in gray.Z-scores calculated relative to healthy donor sample distributions.Endothelium refers to endothelial cell, heart to cardiac muscle cells, liver to hepatocyte, lung to club cell and type I pneumocyte, and neuronal to Schwann cell cfRNA cell type of origin fractions.