Reduced B cell frequencies in cord blood of HIV-exposed uninfected infants: an immunological and transcriptomic analysis

Introduction In the course of immune development, HIV-exposed uninfected (HEU) infants exhibit abnormal immune function and increased infectious morbidity compared to HIV-unexposed uninfected (HUU) infants. Yet the specific functional phenotypes and regulatory mechanisms associated with in-utero HIV and/or ART exposure remain largely obscure. Methods We used flow cytometry and RNA-seq to perform immunological and transcriptomic profiling of cord blood from 9 HEU mother-infant pairs and 24 HUU pairs. In addition, we compared the cord blood dataset with the maternal venous blood dataset to characterize effects unique to in-utero HIV and/or ART exposure. Results Flow cytometry immunophenotyping revealed that the level of B lymphocyte subsets was significantly decreased in HEU cord blood compared to HUU cord blood (P < 0.001). Expression profiling-based cell abundance assessment, using the CIBERSORT and ssGSEA algorithms, showed a significantly reduced abundance of naive B cells in HEU cord blood (both P < 0.05), supporting the altered composition of B lymphocyte subsets in HEU. Functional enrichment analysis demonstrated suppressed innate immune responses and impaired immune regulatory function of B cells in HEU cord blood. Furthermore, through differential expression analysis, co-expression network analysis using WGCNA, and feature selection using LASSO, we identified a 4-gene signature associated with HEU status. This signature effectively assesses B cell levels in cord blood, enabling discrimination between HEU and HUU infants. Discussion Our study provides the first comprehensive immunological and transcriptomic characterization of HEU cord blood. Additionally, we establish a 4-gene-based classifier that holds potential for predicting immunological abnormalities in HEU infants.

1 Supplemental Methods

Immunological Profiling
A total of 56 blood samples, comprising 9 HEU mother-child pairs and 19 HUU mother-child pairs, were successfully analyzed to construct immune profiles using flow cytometry. These profiles included the composition of 17 immune cell subsets, encompassing mononuclear cells (MONO), dendritic cells (DC), T cells, B cells, and natural killer cells (NK). T cell subsets were categorized into T helper cells (TH), cytotoxic T cells (CTL) and natural killer T cells (NKT). TH were subdivided into regulatory T cells (Treg), naïve CD4+ T cells (TH-N), central memory CD4+ T cells (TH-CM), effector CD4+ T cells (TH-E) and effector memory CD4+ T cells (TH-EM), and CTL were subdivided into naïve CD8+ T cells (TC-N), central memory CD8+ T cells (TC-CM), effector CD8+ T cells (TC-E) and effector memory CD8+ T cells (TC-EM).
Antibody expression across these 17 immune cell types is summarized in supplement Table 4. Flow cytometry was employed to determine the relative abundance of these cell subsets, providing a comprehensive overview of the immunological landscape in HEU and HUU subjects. The blood samples were stained with a panel of fluorochrome-conjugated antibodies (supplement Table 3) specific to surface markers characteristic of the immune cell subsets listed above. Data were acquired on a flow cytometer and analyzed using specialized software to gate and quantify the populations of interest (supplement Figure 2). To assess differences in cell type composition between the HEU and HUU groups, we employed the Wilcoxon rank-sum test. This non-parametric test was chosen because it does not assume normally distributed data and remains valid for small sample sizes, making it suitable for comparing immune cell frequencies. Differences were considered statistically significant at a P value < 0.05. This approach enabled us to identify specific immune cell subsets that were differentially represented between the HEU and HUU cohorts.
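As an illustration of the group comparison described above, the following minimal sketch implements a two-sided Wilcoxon rank-sum test with the normal approximation. It is not the code used in the study: the function name `rank_sum_test` and the example frequencies are hypothetical, and the choice of midranks for ties without a continuity correction is an assumption. For very small groups an exact test may be preferable.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Ties receive midranks and the variance is tie-corrected; no continuity
    correction is applied (an assumption of this sketch).
    Returns (U statistic for x, two-sided P value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0          # average of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)  # rank sum of x
    u = w - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided tail of the normal
    return u, p

# Hypothetical B cell frequencies (% of lymphocytes), for illustration only.
heu = [4.1, 5.0, 3.8, 4.6]
huu = [9.2, 8.7, 10.4, 7.9, 9.9]
u_stat, p_value = rank_sum_test(heu, huu)
significant = p_value < 0.05
```

Here every hypothetical HEU value lies below every HUU value, so the U statistic for the HEU group is 0 and the comparison is flagged as significant at the 0.05 level.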

RNA-seq Data Processing
Bulk RNA-seq was conducted on 38 blood samples from 4 HEU mother-child pairs and 15 HUU mother-child pairs (supplement Table 5). The RNA-seq data processing workflow involved several key steps to ensure accurate alignment, quantification, and normalization of the transcriptomic data. Raw sequencing reads were first quality-checked and trimmed to remove adapters and low-quality bases using fastp (v0.23.2) (1). The high-quality reads were then aligned to the human reference genome hg38 (Genome Reference Consortium GRCh38) using HISAT2 (v2.2.1) (2). Following alignment, gene counts for each sample were generated using featureCounts (v2.0.1) (3). To facilitate comparison between samples, gene counts were normalized to transcripts per million (TPM). TPM normalization accounts for both sequencing depth and gene length, allowing gene expression levels to be compared across samples. This dataset was subsequently used for downstream analyses, including differential expression analysis, gene ontology (GO) enrichment analysis, and gene set enrichment analysis (GSEA).
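The TPM normalization step can be sketched as follows. This is a minimal illustration and not the study's pipeline: the function name `counts_to_tpm`, the gene names, and the gene lengths are hypothetical, and the choice of length definition (for example, total exon length per gene) is an assumption, since the text does not state it.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw gene counts for one sample to transcripts per million.

    counts     : dict gene -> read count
    lengths_bp : dict gene -> gene length in base pairs (assumed definition)
    """
    # Step 1: correct for gene length (reads per kilobase).
    rates = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    # Step 2: correct for sequencing depth so each sample sums to one million.
    return {g: r / total * 1e6 for g, r in rates.items()}

# Hypothetical values, for illustration only.
counts = {"geneA": 100, "geneB": 200, "geneC": 0}
lengths = {"geneA": 1000, "geneB": 4000, "geneC": 500}
tpm = counts_to_tpm(counts, lengths)
```

In this example geneB has twice the raw count of geneA but is four times longer, so its TPM is half that of geneA; the values in each sample always sum to one million.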

Estimation of Immune Cell Abundance
Normalized gene expression profiles were employed to estimate immune cell abundance. CIBERSORT (https://cibersort.stanford.edu/) was used to determine the proportion of immune cells based on the LM22 gene signature (4). In parallel, immune cell composition was assessed quantitatively using single-sample Gene Set Enrichment Analysis (ssGSEA) based on marker genes representing 28 specific immune cell types (5), implemented via the R package GSVA (6). The two methods are complementary: CIBERSORT estimates immune cell proportions in each sample from reference gene expression profiles, whereas ssGSEA provides relative abundance scores from marker gene sets alone, without predefined reference profiles, and is applicable across a wide range of data types. Integrating the proportion estimates from CIBERSORT with the enrichment scores from ssGSEA gives a more comprehensive picture of immune cell composition and function.

Differential Expression Analysis
Gene count files were used for differential expression analysis between umbilical cord blood and maternal venous blood samples from both HEU and HUU subjects. This analysis was carried out using the R package DESeq2 (v1.39.8), which employs a model based on the negative binomial distribution (7). DESeq2 normalizes the data by estimating size factors that control for differences in sequencing depth and RNA composition between samples. Genes exhibiting a log2 fold change (log2FC) greater than or equal to log2(1.5) and a P value < 0.05 were considered differentially expressed. The log2FC threshold of log2(1.5) (approximately 0.585) corresponds to a 1.5-fold change in gene expression, which is generally considered meaningful in the context of gene regulation. The P value threshold of 0.05 limits the type I error rate for each gene tested.
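The thresholding rule can be sketched as below. This is an illustration only: the function name `classify` is hypothetical, and applying the fold-change cutoff to the absolute log2FC, so that both up- and down-regulated genes are retained, is an assumption of this sketch rather than something the text states.

```python
import math

LOG2FC_CUTOFF = math.log2(1.5)   # about 0.585, i.e. a 1.5-fold change
P_CUTOFF = 0.05

def classify(log2fc, pvalue):
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    Assumption: the cutoff applies to |log2FC|, so a gene that is at least
    1.5-fold lower is called 'down'.
    """
    if pvalue >= P_CUTOFF or abs(log2fc) < LOG2FC_CUTOFF:
        return "ns"
    return "up" if log2fc > 0 else "down"

# Hypothetical DESeq2-style results: (gene, log2FC, P value).
results = [("geneA", 0.60, 0.010), ("geneB", -1.00, 0.030),
           ("geneC", 0.50, 0.001), ("geneD", 2.00, 0.200)]
labels = {gene: classify(lfc, p) for gene, lfc, p in results}
```

With these made-up values, geneA and geneB pass both cutoffs, geneC fails the fold-change cutoff despite its small P value, and geneD fails the P value cutoff despite its large fold change.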

Functional Enrichment Analysis
Gene Ontology (GO) enrichment analysis and gene set enrichment analysis (GSEA) based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) were conducted using the R package clusterProfiler (v4.9.0) (8). The enrichGO function in clusterProfiler was employed, using a hypergeometric test to determine the overrepresentation of specific GO terms among the DEGs compared to what would be expected by chance. Adjustments for multiple testing were made using the Benjamini-Hochberg procedure to control the false discovery rate (FDR), with significantly enriched GO terms identified at an adjusted P value threshold of < 0.05. The gseKEGG function in clusterProfiler was used to perform GSEA, calculating an enrichment score (ES) for each gene set to reflect the degree of overrepresentation at the top or bottom of the ranked gene list. The statistical significance of the ES was assessed using a permutation test, with gene sets considered significantly enriched at an adjusted P value < 0.05 and an FDR < 0.25. This approach identified key KEGG pathways that exhibit coordinated differences between HEU and HUU groups.
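The two statistical ingredients of the over-representation analysis, the hypergeometric test and the Benjamini-Hochberg adjustment, can be sketched in a few lines. This is a self-contained illustration and not the clusterProfiler implementation; the function names and the example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): probability of seeing at least k term-annotated genes
    among n DEGs drawn from N background genes, K of which carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 200 DEGs, 10,000 background genes,
# a GO term annotating 100 genes, 8 of which are DEGs.
p_term = hypergeom_upper_tail(8, 200, 100, 10000)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Under the null, about 2 of the 200 DEGs would be expected to carry the term in this made-up example, so observing 8 gives a small P value; the adjusted values are then compared with the 0.05 threshold.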

Weighted Gene Co-expression Network Analysis
Weighted gene co-expression network analysis (WGCNA) was employed to identify co-expression modules using the R package WGCNA (v1.72-1) (9). The analysis involved several steps to ensure the robustness and relevance of the detected co-expression modules. First, the expression matrices were normalized to transcripts per million (TPM) to facilitate comparison between samples. We then selected 10,150 genes for network construction according to mean and variance criteria (MEAN > 1, VAR > lower quartile). These criteria ensured the inclusion of genes with sufficient expression levels and variability, which are essential for reliable network construction.
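The gene filter can be sketched as follows. This is a minimal illustration under stated assumptions: "lower quartile" is taken to mean the first quartile of the per-gene variances across all genes, computed with the inclusive (linear interpolation) method, and the function name `select_genes` and the expression values are hypothetical.

```python
import statistics

def select_genes(expr, min_mean=1.0):
    """Keep genes with mean TPM > min_mean and variance above the first
    quartile of all per-gene variances (assumed reading of the criteria).

    expr : dict gene -> list of TPM values across samples
    """
    means = {g: statistics.fmean(v) for g, v in expr.items()}
    variances = {g: statistics.variance(v) for g, v in expr.items()}
    q1 = statistics.quantiles(list(variances.values()), n=4,
                              method="inclusive")[0]
    return [g for g in expr if means[g] > min_mean and variances[g] > q1]

# Hypothetical TPM values, for illustration only.
expr = {
    "g1": [10, 20, 30],      # expressed and variable: kept
    "g2": [0.1, 0.2, 0.3],   # mean below 1: dropped
    "g3": [5, 5, 5.1],       # nearly constant: dropped
    "g4": [2, 8, 14],        # kept
    "g5": [50, 10, 90],      # kept
}
kept = select_genes(expr)
```

The filter removes genes that are barely expressed or nearly constant across samples, since neither contributes usable correlation signal to the network.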
To build the co-expression network, we applied WGCNA with the soft threshold power set to 18. This value was chosen because it achieved a scale independence of 0.76 and a mean connectivity of 30.53, which are indicative of a robust network. The sensitivity for module detection was set to a medium level of 2.0, balancing the detection of distinct modules against module robustness. The cut height for merging modules was set to 0.1, allowing closely related modules to be combined and thereby reducing redundancy. Additionally, we set the minimum module size to 150 to focus on biologically significant modules while filtering out smaller, potentially less relevant ones.
Finally, modules whose eigengenes (the first principal component of the module expression data) were correlated above 0.9 were merged, consistent with the merge cut height of 0.1 (1 - 0.9). This step ensured that highly similar modules were consolidated, enhancing the biological interpretability and stability of the identified co-expression modules. This approach allowed us to detect robust co-expression modules, which were further analyzed to uncover their biological significance and potential roles in the immunological differences observed between HEU and HUU subjects.
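Two of the parameters above can be made concrete with a short sketch: the soft threshold power, which raises pairwise correlations to the 18th power to form the adjacency, and the merge criterion, which combines modules whose eigengene dissimilarity (1 minus correlation) falls below 0.1. This is an illustration and not the WGCNA package code: an unsigned network is assumed because the text does not state the network type, the pairwise merge check simplifies the hierarchical clustering that WGCNA performs on eigengenes, and all names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def soft_threshold_adjacency(expr, power=18):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power (assumed network type)."""
    genes = list(expr)
    return {gi: {gj: 1.0 if gi == gj
                 else abs(pearson(expr[gi], expr[gj])) ** power
                 for gj in genes}
            for gi in genes}

def should_merge(eigengene_a, eigengene_b, cut_height=0.1):
    """True when eigengene dissimilarity (1 - correlation) is below the cut,
    i.e. when the eigengenes correlate above 1 - cut_height."""
    return 1.0 - pearson(eigengene_a, eigengene_b) < cut_height

# Hypothetical expression profiles across four samples.
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.5], "c": [4, 1, 3, 2]}
adj = soft_threshold_adjacency(expr)
```

A high power such as 18 keeps strong correlations close to 1 while pushing moderate ones toward 0: in this example the pair a and b (correlation near 0.998) retains an adjacency above 0.9, whereas the pair a and c (correlation of -0.4) drops below one in a million.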