Introduction

Various immunologic and virologic factors influence perinatal transmission, but their relative contributions are difficult to assess, suggesting that transmission is multifactorial.1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 Nevertheless, the use of antiretroviral drugs, based on early results from the PACTG 076 Study,15 has substantially reduced the number of perinatally acquired AIDS cases in the US. A significant challenge, however, still remains in Africa and other resource-limited settings, where approximately 700 000 new infections are estimated to have occurred among children in 2003 (WHO). Southern Africa has the highest prevalence of human immunodeficiency virus type 1 (HIV-1) infection in the world (http://www.unaids.org). The predominant HIV-1 subtype in this region is subtype C (HIV-1C).16 A serosurveillance study conducted in 2003 among pregnant women in Botswana indicated that 37.4% were infected, and that 49.7% of women aged 25–29 years were infected.17

The role of maternal viral load as a strong predictor of perinatal transmission has been well established, although we and others have observed substantial overlap in the detectable range of plasma and genital-fluid viral loads between women who transmit virus and those who do not.18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 In many cases, no clear viral load threshold has been identified above which perinatal transmission always occurs, nor has a threshold been consistently identified below which transmission does not occur.19, 27, 32, 34

Relatively few studies have examined the role of maternal genetic factors in transmission, and these have generally been limited to coreceptor/ligand polymorphisms.11, 35, 36 Notably, previous studies have implicated the chemokine/coreceptor pairs SDF1/CXCR4 and RANTES/CCR5 in HIV-1 disease progression, and polymorphisms in SDF1 and CCR5 have been associated with perinatal transmission.36, 37, 38 Cellular immune response factors, specifically interleukin (IL)-10 and interferon γ (IFN-γ), have also been hypothesized to influence perinatal transmission.39 These genetic studies have tended to focus on a small number of candidate genes and have not taken advantage of the genome-wide survey approach made possible by gene-expression profiling. Such profiles have been used successfully to investigate HIV infection in US subjects.40, 41 However, no such data exist for African subjects, who account for most infections worldwide, nor is it known whether gene-expression profiles are associated with transmission outcome.

Human immunodeficiency virus infection has been associated with a complex set of host–virus interactions that variably influence the overall host response.42, 43 The considerable genetic variation within the human population allows for differential host effects on viral replication and immune response.44, 45, 46 To date, many polymorphic genes that influence HIV infection and disease progression, termed AIDS restriction genes (ARGs), have been described, predominantly in US-based populations. They include CCR5,47, 48, 49 CCR2,50, 51 CCL5/RANTES,52 CXCL12/SDF-1,53 CXCR6,54 CCL2/MCP-1, CCL7/MCP-3, CCL11/Eotaxin,55 IL-10,56 IFN-γ,57 HLA58, 59, 60 and KIR3DS1,61 and, more recently, the RNA editing gene APOBEC3G62 (see the review by O'Brien and Nelson63).

The present study was undertaken to survey a broad array of gene probes (22 277) in order to identify genes that were differentially expressed during HIV-1C infection and to determine whether the expression differences were also associated with perinatal transmission outcome. We examined the relation between gene-expression profiles and mother-to-infant transmission of HIV-1C among women and infants who were cross-sectionally identified in Botswana.

Results

Sampling for microarray analysis was representative of initial population

Table 1 compares the viral load statistics for the specimen groups used for microarray analysis with those of the larger, previously described data set on maternal HIV-1 infection and transmitter status.18 The groups tested had similar viral loads and were similar in age and nationality (all citizens of Botswana). The specimens consisted of HIV+ mothers (n=25), comprising transmitter mothers (TRs, n=11) and nontransmitter mothers (NTRs, n=14), and a control population of HIV− mothers (n=20). TRs, NTRs and seronegative controls did not differ significantly with regard to location, age, clinical status or condition, parity, Cesarean section experience, breast-feeding practice or prevalence of sexually transmitted diseases (Montano et al18 and data not shown). The mean log10 plasma viral loads for the microarray subsets were 4.22±0.89 for NTRs and 4.70±0.68 for TRs. This sampling was representative of the larger, previously described data set,18 in which the mean log10 viral loads were 4.34±0.81 for NTRs and 4.78±0.69 for TRs.

Table 1 Descriptive statistics

Maternal gene expression profiles reflect infection and transmission status

A statistical method implemented in the program BADGE64, 65, 66 was used to identify differentially expressed genes associated with infection or maternal transmitter status in four grouped comparisons: HIV− vs HIV+ (group 1); HIV− vs TR (group 2); HIV− vs NTR (group 3) and HIV NTR vs TR (group 4). Differentially expressed genes within these groups were then subjected to gene category over-representation analysis (termed ‘enrichment’) using a standalone version of EASE67 that contains annotated gene-sets from the Gene Ontology (GO) database (see Methods) and additional literature-query-based gene-sets, as described.68 Enriched gene categories were further annotated into biological themes and plotted for each comparison group according to their enrichment significance scores (Figure 1; all categories shown were significant at P<0.05). See also Supplementary Figures 1 and 2 for the complete list of significant categories with associated dendrograms (http://idisk.mac.com/montyandalan-Public/HIV-neg-pos-analvsis/web-supplement/index.html).

Figure 1

Summary of enriched gene categories in groups 1–4. Adjusted Fisher's exact probabilities are shown for the top 10 enriched biological categories within each comparison of mothers in Botswana (groups 1–4). All categories were significant at P<0.05 (see Rarick et al105 and Methods section).

Expression profiles for maternal HIV-1C infection differed from those of the seronegative reference subjects (group 1 comparison) in enriched gene-sets associated with IMMUNE RESPONSE (P=1.38E-09) and with RNA (mRNA metabolism, including processing and editing; P=3.46E-06). The immune response categories included many genes that overlapped with other significant categories, including ANTIVIRAL (P=0.0001) and INTERFERON (P=0.0038). The profile of enriched biological categories also differed significantly between TR and NTR mothers when each was compared with HIV− controls. For example, the enriched RNA categories predominated in HIV-1C TR mothers and markedly distinguished this group from the control subjects (group 2 comparison, P=4.70E-06). By contrast, the enriched IMMUNE RESPONSE categories predominated in HIV-1C NTR subjects compared with controls (group 3 comparison, P=1.41E-12). The comparison of HIV-1C TRs with HIV-1C NTRs (group 4 comparison) resembled the comparison of HIV-1C TRs with HIV− subjects, although the significance of the RNA categories was reduced, possibly because of the smaller sample size (compare P=0.0003 for group 4 with P=3.46E-06 for the group 1 comparison). As described in the Methods section, significance was estimated in two steps: first, the probability of differential expression was determined for each gene in each group comparison; second, the ‘enrichment’ of differentially expressed gene-sets (biological categories) was evaluated using a modified Fisher's exact test. The results for categories significant at P<0.05 in the group 1–4 comparisons are shown graphically in Figure 1.

Human immunodeficiency virus-1C infection increased expression of immune response genes associated with Toll-like receptor and interferon γ pathways and RNA processing genes associated with interferon response

Based on the significance scores for the biological categories identified in each comparison group (Figure 1), and to visualize trends in gene expression, the profiles for all differentially expressed gene-sets were systematically converted into heatmaps, and representative gene-expression data from selected heatmaps were displayed as boxplots (for the complete set of heatmaps for each comparison group, see Supplementary Figure 3: http://idisk.mac.com/montyandalan-Public/HIV-neg-pos-analysis/web-supplement/index.html). As IMMUNE RESPONSE and RNA categories were prominent in the HIV+ vs HIV− comparison (group 1, see Figure 1), representative heatmaps for selected gene-sets within these categories (IMMUNE RESPONSE, INTERFERON and mRNA METABOLISM) are shown (Figure 2a–c). Many genes within the IMMUNE RESPONSE and INTERFERON categories were upregulated, whereas most genes within the mRNA METABOLISM category were downregulated in association with HIV-1 infection. Representative boxplots for specific genes in these categories are shown with their probe IDs and significance estimates (posterior probabilities; see Methods and Figure 2d), including notable members of the interferon pathway (STAT1, IRF7), the Toll-like receptor (TLR) pathway (MYD88, RANTES) and the RNA processing and editing pathway (ADAR, APOBEC3G) (Figure 2d and data not shown).

Figure 2

(a–d) Group 1 comparisons (HIV− vs HIV+). Heatmaps for Gene Ontology (GO) categories: (a) Immune response; (b) Interferon; (c) mRNA metabolism. (d) Boxplots for representative genes modulated in association with HIV-1C infection among mothers in Botswana. Probability scores represent the posterior probability of differential expression between the two conditions (HIV+, HIV−) under a Bayesian model, together with an odds ratio (OR). Note that in the heatmaps, red indicates downregulation and green indicates upregulation. See the supplementary figures for heatmap details and full annotation.

Transmitter mothers displayed a broad reduction in RNA processing genes, except for antiviral RNA associated genes

The most prominent categories identified in the group 2 comparison (TR vs HIV− control subjects) were RNA-associated gene-sets (e.g. mRNA metabolism, P=1.36E-07), with most genes downregulated. Most genes in this category were also downregulated in NTR subjects compared with HIV− subjects. A notable exception in this comparison was a subset of upregulated genes associated with RNA binding activity (P=2.3E-09) and the interferon-induced antiviral response. These interferon-induced RNA response genes included OAS1-3, OASL, ADAR and MX1. The OAS genes encode enzymes essential to the innate immune response to viral infection: they synthesize 2′–5′-linked oligoadenylates that activate latent RNase L, resulting in viral RNA degradation and inhibition of viral replication. These specific genes also tended to be upregulated in NTR subjects compared with HIV− controls (see Supplementary Figure 3, heatmap for group 3, RNA binding category http://idisk.mac.com/montyandalan-Public/HIV-neg-pos-analysis/web-supplement/index.html), but as a category they did not reach significance in the group 3 comparison, due in part to the smaller number of genes within each category and the smaller expression difference relative to HIV− control subjects.

Nontransmitter mothers displayed a more robust immune response profile than transmitter mothers, particularly in genes associated with antiviral and interferon activity

The most prominent biological categories identified in the group 3 comparison (NTR vs HIV− subjects) were IMMUNE RESPONSE (P=2.78E-10) and ANTIVIRAL (P=5.28E-06), with most genes upregulated. Notably, no RNA processing categories reached significance in the group 3 comparison, owing to insufficient differential expression of RNA gene-sets between the seronegative controls and NTRs (in contrast with the group 2 (TR vs HIV−) comparison; see Figure 1). The induced ANTIVIRAL genes included CCL5 and several interferon-induced antiviral RNA response genes, including MX1, OAS1-3, IFI35, IFI44 and PRKR. Representative upregulated group 3 genes in IMMUNE RESPONSE (P=2.78E-10) included innate response and RNA editing genes such as IRF7, CCR5, MYD88, ADAR and CCL4. The tendency of most genes in this category to be upregulated in group 3 accounted for its higher significance compared with the group 2 comparison (TR vs HIV−), in which IMMUNE RESPONSE reached only P=6.16E-06.

Nontransmitter mothers displayed altered expression of RNA processing and splicing associated genes and displayed two expression subclusters that differed by viral load

In addition to comparing NTR and TR profiles with a negative control population, we compared them directly with each other. Although the sample size was small (14 NTR vs 11 TR), significant categories were also identified in the TR vs NTR (group 4) comparison, implicating RNA-associated gene-sets representing RNA processing and splicing. Although these gene-sets clustered together, the expression profile for the NTR subjects (when compared with TR subjects) contained two subsets of specimens. Figure 3a shows these NTR subsets, termed NTR-hi and NTR-lo, which corresponded to distinct expression trends in the heatmap for the RNA PROCESSING category. The subsets differed in viral load, with one subset exhibiting a relatively low viral load (NTR-lo, mean VL=3.88 log10) and the other a significantly higher viral load (NTR-hi, mean VL=4.84 log10) (Figure 3b). The NTR-hi expression profile resembled that of TRs (mean VL=4.70 log10) in many (but not all) gene-sets evaluated in the group 4 comparison. Therefore, gene-expression levels for some categories within NTRs appeared to differ in association with viral burden, and these categories tended to be related to RNA processing functions. In contrast, the TR specimens did not appear to contain expression subsets that differed significantly by viral burden (data not shown).
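Because viral loads are summarized as means of log10-transformed values, the difference between the NTR-hi and NTR-lo means translates into a multiplicative difference in geometric-mean viral load. The worked arithmetic below uses only the two means reported above; it compares geometric means, not arithmetic means, of the underlying viral loads.

```latex
\Delta = \overline{\log_{10}\mathrm{VL}}_{\text{NTR-hi}} - \overline{\log_{10}\mathrm{VL}}_{\text{NTR-lo}}
       = 4.84 - 3.88 = 0.96,
\qquad
\frac{\mathrm{GM}(\mathrm{VL})_{\text{NTR-hi}}}{\mathrm{GM}(\mathrm{VL})_{\text{NTR-lo}}} = 10^{\Delta} = 10^{0.96} \approx 9.1
```

In other words, the NTR-hi subset carried roughly nine times the geometric-mean plasma viral load of the NTR-lo subset.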

Figure 3

(a, b) Identification of group 4 genes associated with viral load. (a) Heatmap for the RNA processing gene set, with the HIV-1C NTR viral load subsets (Lo, Hi) indicated below. (b) Plasma viral load boxplots for HIV-1C TR subjects, HIV-1C NTR subjects, and the NTR-hi and NTR-lo clusters. Plasma viral load is log10-transformed. P-values are shown for both an equal-variance t-test and an odds ratio.

Correlation between increased maternal gene expression and reduced viral burden

The observation of NTR subsets associated with viral load prompted us to evaluate directly the gene-sets that were differentially expressed in association with viral load. Significant enrichment of gene-sets was observed for both the group 1 and group 3 comparisons. For the group 1 comparison (HIV-1+ vs HIV−), representative categories associated with viral load were generally consistent with a broad innate response (see Supplementary Figure 4 for all categories with P<0.05 and associated heatmaps http://idisk.mac.com/montyandalan-Public/HIV-neg-pos-analysis/web-supplement/index.html) and included gene-sets associated with interferon signaling (e.g. HEMATOPOIETIN/INTERFERON, P=0.0002) and subcategories of IMMUNE RESPONSE (defense response, P=0.0043; response to biotic stimulus, P=0.0067). Notably, the genes within these categories did not entirely overlap with those identified on the basis of serostatus and transmitter status. For the group 3 comparison (NTR vs HIV−), gene-sets enriched in association with viral load also supported a role for the innate response (e.g. HEMATOPOIETIN/INTERFERON, P=0.0013). No enriched gene-sets were detected for the group 2 and 4 comparisons when evaluated directly in relation to viral load, which may be due, in part, to the limited sample sizes available for those comparisons.

Specific gene validation by real-time polymerase chain reaction

To validate the microarray results, selected genes within the enriched IMMUNE RESPONSE categories were quantified by real-time reverse transcriptase-polymerase chain reaction. Specimens were chosen on the basis of RNA availability and comprised 6 HIV− and 19 HIV+ specimens, which partially overlapped with those used in the microarray assessment (four of six HIV− and nine of 19 HIV+). Results were normalized to endogenous 18S rRNA levels to control for RNA quantity and are shown in Figure 4. Log-transformed differences between HIV− and HIV+ levels were evaluated by t-test and were highly significant: ADAR (P=0.0001), APOBEC3G (P=0.0078), MX1 (P<0.0001), IRF7 (P<0.0001), MYD88 (P=0.0002), RANTES (P<0.0001) and STAT1 (P=0.0001).

Figure 4

Real-time polymerase chain reaction validation of host response genes. Box-and-whisker plots are shown for the quantitation of selected genes in HIV− and HIV+ mothers in Botswana. All P-values were <0.05 (see text).

Enriched category validation with independent human immunodeficiency virus and severe acute respiratory syndrome infection data sets

Because existing ethical guidelines prevented us from identifying a second population of drug-naive subjects in Botswana, we chose to validate the host profiles observed in this study independently, by asking whether the enriched biological categories representing the HIV signature in peripheral blood mononuclear cells (PBMCs) were comparable to other PBMC-based profiles of infection. To this end, we identified two studies: one comparing PBMCs from HIV-1-infected subjects with those from healthy controls in a US Army cohort40 and a second evaluating the PBMC host response to an unrelated pathogen, the severe acute respiratory syndrome (SARS) coronavirus.69 Although the HIV-1 retrovirus and the SARS coronavirus are both plus-strand RNA viruses, their modes of infection, viral life cycles and pathogenic sequelae are distinct (for reviews see Montano and Williamson70 and Ziebuhr71). All three data sets were analyzed using the same list of overlapping gene probes (8793), namely our HIV infection data set (set 1), the US Army HIV infection data set (set 2) and the SARS infection data set (set 3). As shown in Figure 5, the IMMUNE RESPONSE categories for HIV infection in both the Botswana and US Army data sets included the same top four categories (the top 20 are shown), in contrast with the RNA-associated categories that predominated in the SARS infection data set. This supports the view that HIV infection in distinct data sets elicited many (but not all) of the same enriched gene-set categories, in contrast to the host response to a distinct viral pathogen (i.e. SARS).

Figure 5

Enriched category validation in an independent HIV-1 data set, but not in SARS. Shown are the top 20 biological categories enriched in two independent HIV infection data sets and in a SARS infection data set. Overlapping gene probes (8793) were identified for our HIV+/HIV− Botswana data set, a US Army HIV+/HIV− data set and a SARS+/SARS− data set. The top 20 categories representing enriched gene sets are shown for the Botswana data set (set 1: 25 HIV+ vs 20 HIV− controls), the US Army data set (set 2: 22 HIV+ vs 12 HIV− controls, GEO series GSE2171) and the SARS data set (set 3: 8 SARS+ vs 4 SARS− controls, GEO series GSE1739). Note that the 8793-probe focus chip used in sets 2 and 3 represents a subset of the 22 277 gene probes initially evaluated in this study, as shown in Figure 2.

Discussion

We identified patterns of expression for specific gene-sets that were related to infection status, transmission outcome and viral burden. In this study, HIV-1C infection appeared to be associated with the differential expression of multiple gene-sets representing a broad innate response, characterized by activation of the TLR, interferon and antiviral RNA response pathways. These response pathways are functionally related.72, 73 Recent studies indicate that TLRs trigger interferon-associated genes (e.g. IFN-α/β), providing a link between the innate and adaptive immune responses to infection,74, 75 with subsequent effects on the expression of co-stimulatory molecules (e.g. CD80/86, class II),76 CTL/CD8+ T-cell effector activity77, 78, 79 and antigen presentation.80

Our data indicate that HIV-1C infection was associated with differential expression of innate response genes in the TLR pathway, including IL-1A, MYD88, RIP2/RIPK2, IRAK3/IRAKM, TRIF/TICAM1, NFKB2 and IP-10/CXCL10. Upregulation of the adaptor proteins MYD88 and TRIF suggests that both MYD88-dependent and MYD88-independent (TRIF-mediated) pathways are engaged in HIV-1C infection. Members of the TLR family of receptors mediate the innate immune response to a broad range of microbial ligands via activation of members of the REL, IRF and STAT transcription factor families and their respective target genes. MYD88-dependent effectors include proinflammatory and chemotactic cytokines, whereas MYD88-independent effectors are associated with type I/II interferons and stimulation of the JAK-STAT pathway. Studies in vitro have shown that HIV-1 RNA can activate TLR signaling81 and that microbial TLR engagement can activate HIV-1 transcription82, 83 and proinflammatory chemokine release.84

HIV-1C infection was also associated with the differential expression of interferon-stimulated genes (e.g. STAT1, TRIM22, MX1, ISGF3G, IRF2, IRF7, IFI27, CXCR3 and PRKR). Interferons are a family of proteins produced in response to viral infection (notably by RNA viruses) and/or microbial activation through TLRs and various other cytokine signaling pathways, including RNA degradation and editing responses. Interferon ligand–receptor interactions stimulate JAK-STAT signaling that induces various IRFs, which in turn upregulate host chemotactic effector genes (e.g. IP-10, MIG, I-TAC, MCP-1) and multiple antiviral RNA response effectors.85 STATs are activated by multiple cytokines and interferons (e.g. IFN-α/β, IFN-γ).86 Multiple STATs are activated by HIV-1 infection in vitro87 and by chronic HIV-1 infection in vivo.88 Mechanistic studies in vitro also implicate IRFs in HIV-1 expression through the HIV-1 long terminal repeat (LTR)89, 90, 91 and the HIV-1 transactivator protein pTAT,92 suggesting that host induction of interferon and antiviral RNA responses may benefit the virus by influencing replication. Interferon-stimulated genes have also been detected in the peripheral blood during acute infection with the SIV/HIV-1 chimeric virus SHIV89.6,93 and in lymph node biopsies from HIV-1-infected subjects.41

Our data also indicated differential expression of interferon-associated antiviral RNA response genes (e.g. MX1, PKR, OAS, ADAR, APOBEC3G). Many of these genes are associated with the type I/II interferon response (for a review see Samuel94) and may influence transmission.95 Activation of these genes is often associated with the interferon-induced response to RNA viral infection. However, the upregulation of the RNA editing gene APOBEC3G that we observed (see Figures 2 and 4) has not been previously reported. Notably, an examination of predicted regulatory elements within the APOBEC3G promoter region suggests the presence of multiple IRF binding sites (data not shown). Initial in vitro studies of HIV-1-infected cell lines did not show activation of APOBEC3G expression, possibly reflecting differences between cell types or between in vitro and in vivo conditions. Interestingly, APOBEC3G has been associated with G-to-A hypermutation of viral and cellular coding sequences and restricts viral replication in the absence of the HIV-1 vif gene.96 Hypermutation of transmitted HIV sequences, implicating RNA editing activity, has been noted among newborns in Tanzania.97 The activation of APOBEC3G and other antiviral RNA response genes may in part represent an ancient innate response to invading viral RNA that is engaged in addition to adaptive immunity.98

Also implicated in these profiles was a subset of genes coding for SR proteins associated with RNA splicing, notably in the TR subset compared with HIV− subjects and in the two NTR subsets. Some of these proteins have been previously associated with HIV-1 RNA splice-site selection in vitro,99 and activation of different SR proteins has also been described during HIV infection in vitro100, 101 and in vivo.102 The relative abundance of SR proteins may shape the pattern of viral and host gene expression, thereby influencing local viral production and potentially the likelihood of transmission. Interferon-induced RNA editing and degradation have been linked to SR protein activity.103 It is unclear to what extent the differential expression of antiviral response genes represents a host limitation on viral replication or instead promotes viral replication. The mechanisms that promote and regulate HIV-1 RNA processing and alternative splicing need to be studied directly in relation to viral replication to better understand their potential role in transmission outcome. We speculate that during infection in vivo there may be increased antiviral and RNA editing gene expression that, in turn, influences the ratio of various splicing factors (by influencing SR protein activity), thereby shifting the host and viral transcriptome in favor of viral replication. The outcome of this process may contribute to increased transmission likelihood.

In this study, we also examined the role of viral load in explaining the differential expression of gene-sets. Interestingly, most genes identified in association with HIV-1 infection and/or transmission were not associated with viral load, although a subset of immune response/interferon genes correlated with viral burden, and NTRs exhibited two subsets with RNA processing gene expression that differed in relation to viral load. Collectively, this may suggest that infection induced a broad innate immune response that was sensitive to infection but predominantly insensitive to viral burden in the peripheral blood. Alternatively, innate response genes and viral burden may be more closely related in specific cell subsets or in other biological compartments, a relationship that would not be apparent in PBMCs, which comprise mixed cell subsets. Overall, the findings in this study document that HIV-1C infection and transmission status were associated with the expression of different functional groups of genes that form a bridge between the innate and adaptive immune responses. Our findings point to specific gene-sets associated with the innate immune response, RNA processing and splicing, and the antiviral RNA response. Furthermore, the presence of common features of the host response to HIV-1 infection in different settings, despite ethnographic, gender and viral subtype differences, and in contrast with other infections (see Figure 5), seems promising. These data support the potential of gene profiling to detect and characterize pathogen-specific host responses. Direct evaluation of the roles of specific genes in local infection, and monitoring of their expression in at-risk subjects, may help both to understand pathogen-specific host responses and to intervene in the viral–host mechanisms engaged during HIV infection and perinatal transmission.

Methods

Human subjects

The study population consisted of a cross-sectional group of 20 HIV− Botswana mothers (mean age 27 years, range 18–38) and 25 HIV-1+ Botswana mothers (mean age 25 years, range 17–44) living at four different study sites. We have previously described the viral burden of the HIV-1+ subjects in relation to transmission outcome.18 The HIV+ mothers who participated were not identified for the study until their infants were 2–5 months old. Specimens from HIV-1-seronegative subjects were collected through co-enrollment in an ongoing substudy in 2003–2004 using the same specimen collection method (see below and Montano et al.18). TRs, NTRs and seronegative controls did not differ significantly with regard to age, clinical status or condition, parity, Cesarean section experience, breast-feeding practice or prevalence of sexually transmitted diseases (Montano et al.18 and data not shown). All subjects were asymptomatic, but CD4 counts were unavailable at the time of specimen collection. Informed consent was obtained from all participating subjects. The Botswana Health Ministry and the institutional review boards of the Harvard School of Public Health and the Boston University School of Medicine approved this study.

Specimen collection and RNA processing

All PBMCs were collected from mothers and infants as previously described.18 Approximately 1 ml of blood was placed into a cryovial tube containing 4 ml of RNA/DNA stabilizing reagent (Roche), inverted and stored at −80°C. Peripheral blood mononuclear cells were processed to obtain total nucleic acid using a modified protocol of the mRNA Isolation Kit for Blood (Roche) and then processed for RNA isolation using the RNeasy extraction kit (Qiagen, Valencia, CA, USA). Trace DNA was removed from the RNA samples using the DNA-free kit (Ambion, Austin, TX, USA), and RNA concentrations were determined using the NanoDrop-1000 (NanoDrop Technologies, Rockland, DE, USA). Five micrograms of each total RNA specimen was provided to the Boston University Microarray Facility for labeling, amplification and hybridization to a U133A 2.0 chip from Affymetrix (Santa Clara, CA, USA). Hybridization signals were read using an Affymetrix GeneChip Scanner 3000 and processed with GCOS v1.2.1 software, and raw intensity values were scaled to a target intensity of 500. The 22 277 gene probes were filtered to retain those with present calls in at least 25% of the arrays, yielding data for 11 705 gene probes. The specimens used for microarray analysis were a representative random sample, selected solely on the basis of specimen availability and RNA quantity.
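As a rough illustration of the two preprocessing steps just described (global scaling to a target intensity of 500 and the 25% present-call filter), the Python sketch below assumes MAS5-style scaling by a 2% trimmed mean and represents detection calls as 'P'/'M'/'A' strings. The function names and toy probe IDs are hypothetical; this is a sketch under those assumptions, not the GCOS implementation.

```python
from statistics import mean


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return mean(kept)


def scale_to_target(signals, target=500.0, trim=0.02):
    """Global scaling (assumed MAS5-style): rescale one array's probe signals
    so that their trimmed mean equals `target`."""
    factor = target / trimmed_mean(signals, trim)
    return [s * factor for s in signals]


def filter_present(calls_by_probe, min_fraction=0.25):
    """Keep probes whose fraction of 'P' (present) calls across arrays
    is at least `min_fraction`; 'M' (marginal) and 'A' (absent) do not count."""
    kept = []
    for probe_id, calls in calls_by_probe.items():
        present = sum(1 for c in calls if c == "P")
        if present / len(calls) >= min_fraction:
            kept.append(probe_id)
    return kept


# Toy example with four arrays (hypothetical probe IDs and calls).
calls = {
    "probe_a": ["P", "A", "A", "A"],  # 1/4 present -> kept
    "probe_b": ["A", "A", "A", "M"],  # 0/4 present -> dropped
    "probe_c": ["P", "P", "A", "A"],  # 2/4 present -> kept
}
kept_probes = filter_present(calls)
```

Scaling is applied per array before filtering; the filter then operates across arrays, which is why it takes the full matrix of calls rather than one array at a time.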

Real-time polymerase chain reaction validation

Human ADAR, APOBEC3G, MX1, MYD88, STAT1, RANTES, IRF7 and 18S rRNA (endogenous control) were measured using Assays-on-Demand (AoD) from Applied Biosystems Inc. (Foster City, CA, USA). Fifty nanograms of RNA from each sample was used per reaction, in duplicate, and values were normalized to each sample's corresponding 18S fluorescence value. Normalized values were log-transformed (because some values were skewed), and P-values were then evaluated using a two-sample Student's t-test.
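A minimal sketch of the normalization and test described above, assuming that each target gene and the 18S control are available as per-sample quantities with two technical replicates: replicates are averaged, the target is divided by 18S, values are log10-transformed, and a pooled-variance (Student's) t statistic is computed. Function names and the toy numbers are hypothetical. For a P-value, `scipy.stats.ttest_ind`, whose default `equal_var=True` gives the Student's test, could be applied to the same log-transformed values.

```python
import math
from statistics import mean, variance


def normalize_to_18s(target_reps, rrna_reps):
    """Per-sample ratio of the mean target signal to the mean 18S signal.

    Each argument is a list of per-sample replicate lists, e.g. [[t1, t2], ...].
    """
    return [mean(t) / mean(r) for t, r in zip(target_reps, rrna_reps)]


def log10_all(values):
    """Log-transform normalized values to reduce skew."""
    return [math.log10(v) for v in values]


def students_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / se, na + nb - 2


# Toy duplicate readings (hypothetical numbers, not study data).
ones = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
neg = log10_all(normalize_to_18s([[2.0, 2.2], [1.8, 2.0], [2.1, 1.9]], ones))
pos = log10_all(normalize_to_18s([[8.0, 8.4], [9.0, 8.6], [7.8, 8.2]], ones))
t_stat, df = students_t(pos, neg)
```

Testing on the log scale compares geometric means, which is why the transformation helps when a few samples have very high expression.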

Enriched category validation

Two independent PBMC profile data sets were identified in the Gene Expression Omnibus (GEO) (GSE2171 and GSE1739), representing an HIV-1 infection series and a SARS infection series, respectively. Because both of these data sets used a smaller focus chip (8793 gene probes) than the chip used in this study (22 277 gene probes), we identified the overlapping gene probes (8793) present in our HIV+/HIV− Botswana data set, the US Army HIV+/HIV− data set (GSE2171) and the SARS+/SARS− data set (GSE1739). BADGE and EASE analyses were conducted on each infection/control series for each of the three data sets, as described in the Statistical analysis section below. The top 20 categories representing enriched gene sets are shown; the entire list of categories significant at P<0.05 (adjusted Fisher's exact; 200 categories) is available in the web supplement. The Affymetrix data sets can be accessed under GEO accession number GSE4124.
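The cross-platform comparison rests on restricting every data set to the probe IDs shared by all chips before any statistics are run. A short Python sketch, with hypothetical function names and toy probe IDs (real lists would come from each platform's annotation):

```python
def common_probes(*probe_lists):
    """Return probe IDs present in every list, preserving the order of the first list."""
    shared = set(probe_lists[0]).intersection(*probe_lists[1:])
    return [p for p in probe_lists[0] if p in shared]


def restrict(expression, probes):
    """Subset a {probe_id: values} mapping to the given probes."""
    return {p: expression[p] for p in probes}


# Toy probe lists (hypothetical IDs).
botswana = ["p1", "p2", "p3", "p4"]
us_army = ["p2", "p3", "p5"]
sars = ["p3", "p2"]
shared = common_probes(botswana, us_army, sars)
```

Running the differential-expression and enrichment steps on the same restricted probe universe keeps the category background comparable across the three data sets.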

Statistical analysis

The array data sets were analyzed for differential expression based on HIV status (seronegative, seropositive) and transmitter status (TR, NTR) using BADGE (Bayesian Analysis of Differential Gene Expression) version 1.0, a program implementing a Bayesian approach to identify differentially expressed genes across experimental conditions.64, 65, 66 The statistical procedure is described in detail in the Supplemental Methods; the algorithm is also described in Klings et al64 and Sebastiani et al66, 104 and is available from http://people.bu.edu/sebas/software.htm. The differential expression of each gene between two conditions is estimated by the fold change and measured by the probability that the fold change exceeds a fixed threshold, conditional on the data. To compute this probability, BADGE uses model averaging to gain robustness to model misspecification. The current implementation of BADGE uses two models for the gene expression data, log-normal and gamma distributions; combining both models gives BADGE greater robustness and reproducibility than simpler analyses. In the analysis we used an expected false positive rate of 1% and selected genes whose expression changed by at least 1.5-fold.
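To make the quantity described above more concrete, here is a minimal Monte Carlo sketch in Python of a posterior probability that the fold change between two conditions exceeds a threshold. It is a simplified stand-in, not BADGE: it uses only a log-normal model (no gamma model and no model averaging), assumes the standard noninformative prior p(μ, σ²) ∝ 1/σ², defines fold change as the ratio of arithmetic means exp(μ + σ²/2), and treats "exceeds" as two-sided (up or down by at least the threshold), which is our assumption. Function names are hypothetical.

```python
import math
import random
from statistics import mean, variance


def draw_log_mean(log_values, rng):
    """One posterior draw of log E[X] = mu + sigma^2 / 2 for log-normal data,
    under the noninformative prior p(mu, sigma^2) proportional to 1 / sigma^2."""
    n = len(log_values)
    chi2 = 2.0 * rng.gammavariate((n - 1) / 2.0, 1.0)  # chi-square draw, n - 1 df
    sigma2 = (n - 1) * variance(log_values) / chi2      # scaled inverse chi-square
    mu = rng.gauss(mean(log_values), math.sqrt(sigma2 / n))
    return mu + sigma2 / 2.0


def prob_fold_change(x, y, threshold=1.5, draws=20000, seed=1):
    """Monte Carlo estimate of P(|log(mean_y / mean_x)| > log(threshold) | data).

    Working on the log scale avoids overflow when exponentiating draws.
    """
    rng = random.Random(seed)
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    cutoff = math.log(threshold)
    hits = sum(
        abs(draw_log_mean(ly, rng) - draw_log_mean(lx, rng)) > cutoff
        for _ in range(draws)
    )
    return hits / draws


# Toy intensities for one probe in two conditions (hypothetical numbers).
control = [100, 110, 90, 105, 95]
case = [300, 320, 280, 310, 290]
p_changed = prob_fold_change(control, case)
```

A gene would then be called differentially expressed when this probability is high enough; how that cutoff is tied to the 1% expected false positive rate in BADGE is not reproduced here.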

Once differentially expressed genes were identified by BADGE, biologically enriched categories were identified as recently described105 using a stand-alone version of the EASE statistical software.67 This program computes a modified Fisher's exact probability for observing the frequency of a biological category among genes associated with a variable (infection, transmission), compared with the likelihood of observing that category by chance given the total number of gene probes in the data set. The reported adjusted score represents the upper bound of the distribution of jackknife Fisher's exact probabilities for observing an enriched biological category. For more detail, see Hosack et al.67
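As an illustration of the kind of score described above, the Python sketch below computes a one-sided Fisher's exact (hypergeometric) probability for a 2×2 table and an EASE-style conservative variant. The penalty follows one common reading of the EASE score: one gene is removed from the count of differentially expressed genes in the category before testing, which yields the largest (upper-bound) jackknife probability. The table layout and function names are our assumptions, not the EASE source code.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P(X >= a) for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = differentially expressed (DE) probes in the category,
    b = DE probes not in the category, c = non-DE probes in the category,
    d = non-DE probes not in the category.
    """
    row1 = a + b          # number of DE probes
    col1 = a + c          # number of probes in the category
    n = a + b + c + d     # all probes on the (filtered) array
    upper = min(row1, col1)
    hits = sum(comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1))
    return hits / comb(n, row1)


def ease_score(a, b, c, d):
    """EASE-style penalized score: subtract one DE probe from the category count
    before testing, which can only make the probability larger (more conservative)."""
    return fisher_greater(max(a - 1, 0), b, c, d)


# Tiny worked example: 3 of 3 DE probes fall in a category of 3 probes, out of 6.
p_fisher = fisher_greater(3, 0, 0, 3)  # 1 / C(6, 3) = 0.05
p_ease = ease_score(3, 0, 0, 3)        # 1 / C(5, 2) = 0.10
```

The penalty matters most for categories supported by only a handful of genes, where a single gene can otherwise drive an apparently significant enrichment.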