Utility of the Blood for Gene Expression Profiling and Biomarker Discovery in Chronic Fatigue Syndrome

Chronic fatigue syndrome (CFS) is a debilitating illness lacking consistent anatomic lesions and eluding conventional laboratory diagnosis. Demonstration of the utility of the blood for gene expression profiling and biomarker discovery would have implications for understanding the pathophysiology of CFS. The objective of this study was to determine if gene expression profiles of peripheral blood mononuclear cells (PBMCs) could distinguish between subjects with CFS and healthy controls. Total RNA from PBMCs of five CFS cases and seventeen controls was labeled and hybridized to 1764 genes on filter arrays. Gene intensity values were analyzed by various classification algorithms and nonparametric statistical methods. The classification algorithms grouped the majority of the CFS cases together, and distinguished them from the healthy controls. Eight genes were differentially expressed in both an age-matched case-control analysis and when comparing all CFS cases to all controls. Several of the differentially expressed genes are associated with immunologic functions (e.g., CMRF35 antigen, IL-8, HD protein) and implicate immune dysfunction in the pathophysiology of CFS. These results successfully demonstrate the utility of the blood for gene expression profiling to distinguish subjects with CFS from healthy controls and for identifying genes that could serve as CFS biomarkers.


Introduction
Chronic fatigue syndrome (CFS) is an illness characterized by debilitating fatigue, impaired concentration and memory, sleep disturbances, and pain [8] and it affects approximately 500 per 100,000 adults in the United States [15]. CFS presents a unique challenge for health care providers, public health officials, and patients because the diagnosis is based on self-reported symptoms and requires exclusion of medical or psychiatric diseases that could potentially explain the illness. Once all other medically explainable illnesses have been excluded, an individual with CFS is diagnosed by having 6 months or greater persistent or relapsing fatigue and at least four concurrent symptoms including: impaired memory, sore throat, tender lymph nodes, muscle pain, joint pain, new headaches, unrefreshing sleep, and post-exertional malaise [8]. Some features of CFS resemble diseases associated with chronic infection [3,6,13,17,23], immunologic perturbation [16,18,22], and neuroendocrine disorders [12] but to date, no one etiology or pathology has been defined.
One reason that CFS remains an enigma to both medical and research communities is lack of a known or accessible anatomic lesion. We hypothesized that peripheral blood would serve as a representative sample of the systemic state allowing for evaluation and profiling of multiple pathologic and physiologic pathways. Peripheral blood is an easy and noninvasive sample to collect. Demonstration of the utility of the blood for profiling and biomarker discovery in CFS has implications for other medically unexplained diseases. Our purpose was to demonstrate that peripheral blood could serve as a sample for gene expression profiling. We selected a small but well-characterized group of CFS cases and controls and examined the expression of 1764 genes. The approach successfully distinguished the majority of CFS cases from controls and demonstrated the utility of peripheral blood for identifying biomarkers as well as shedding light on etiologic pathways.

Study subjects
Subjects and study design have been presented in detail elsewhere [18,20]. In brief, the original study was a case/control design that included subjects with CFS defined by the 1988 case definition [14] and age-, race-, and sex-matched controls selected randomly from the Atlanta, Georgia metropolitan area. Five CFS cases and seventeen controls had remaining cryo-preserved PBMC aliquots available for gene expression analysis with cDNA arrays; all these subjects were white women aged 24 through 51. The five CFS cases had been ill from 2.5 to 6.5 years. Human experimentation guidelines of the U.S. Department of Health and Human Services were followed in the conduct of this study. All study participants were volunteers who gave informed consent.

Specimens
Specimen collection has also been described [18]. Briefly, cases and controls had blood samples collected before 10 a.m. in vacutainers containing citric acid. PBMCs were isolated on lymphocyte separation medium (Organon Teknika, Durham, NC), dispensed into 5 million cell aliquots and stored in liquid nitrogen under conditions to maintain viability.

RNA preparations
Total RNA was extracted from the cryo-preserved PBMC aliquots from the remaining five CFS cases and seventeen controls by the modified guanidinium thiocyanate method [4]. On average, 5 to 10 µg of total RNA was obtained from each 5 million PBMC aliquot. Residual DNA was removed with DNase digestion, 1 U DNase I (GenHunter Corp., Nashville, TN) per 10 µg total RNA for 15 minutes at room temperature, followed by phenol-chloroform extraction and ethanol precipitation. We usually recovered 80% of the original total RNA preparation. All RNA samples were examined for RNA integrity and absence of DNA by denaturing agarose gel electrophoresis, and were quantified by UV-spectrophotometry.

Probe synthesis
Digoxigenin-labeled double-stranded cDNA probes were synthesized using the SMART PCR cDNA Synthesis Kit and the Advantage cDNA PCR Kit (CLONTECH, Palo Alto, CA) as previously described [25]. One µg total RNA for each sample was also included in the labeling reaction to monitor for any DNA carry over after extraction (no reverse transcriptase control). One µl of the 100 µl PCR product from each sample and corresponding DNA carry over control were evaluated for dig-11-dUTP incorporation by agarose gel electrophoresis. Fifty µl of the labeled PCR product was used as the probe in the hybridization.

Hybridization and chemiluminescent detection
Samples were hybridized to the Atlas Human cDNA Expression Array with 588 genes (CLONTECH) and the Atlas Human 1.2 Array II with 1176 genes (CLONTECH) under previously described conditions [25].

Analysis
The chemiluminescent signal was detected with a one-hour exposure to film. Films were scanned on a flatbed scanner and Tagged Image File Format (TIFF) images were generated. Image files were loaded into BioNumerics (Applied Maths, Kortrijk, Belgium) for signal intensity quantification. Due to the variability in background and noise level between films, intensity values for each image were normalized based on the negative and positive controls on each filter. Setting the lowest intensity blank control on each filter to zero, and the highest intensity positive control to one hundred, the BioNumerics program proportionally scaled the intensities between 0 and 100. The program subsequently quantified all signal intensities between 0 and 100. The threshold value for positive signals for each image was determined as the average of the three lowest intensity negative controls plus five standard deviations. Intensity values below this threshold were set to 0.01. Three hundred fifty-one genes were negative in 18 (80%) of the 22 subjects and were excluded from further analysis. Subjects were grouped based on disease status (CFS versus healthy) and further stratified by age. The overall agreement for each comparison was evaluated using the Pearson correlation coefficient. The criterion for differential gene expression was greater than four-fold variation in gene intensity. The variation in the expression of each gene was further examined with the nonparametric Wilcoxon test. All data analysis including cluster analysis and multidimensional scaling was performed using BioNumerics (Applied Maths), Microsoft Excel (Microsoft Office XP, Microsoft Corporation), and Statistica (StatSoft, Tulsa, OK).
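The normalization, thresholding, and gene-filtering steps described above can be illustrated with a minimal Python sketch. It is not the BioNumerics implementation, which is not described here. All function names are hypothetical, and several details are assumptions: scaled values are clamped to the 0 to 100 range, the "five standard deviations" are taken to be the sample standard deviation of the same three lowest negative controls, and a gene is excluded when it is negative in at least 18 subjects.

```python
from statistics import mean, stdev

def normalize_filter(raw, blank_controls, positive_controls):
    """Scale one filter's raw spot intensities to the 0..100 range.

    The lowest blank control maps to 0 and the highest positive
    control maps to 100. Clamping to [0, 100] is an assumption.
    """
    low = min(blank_controls)
    high = max(positive_controls)
    scale = 100.0 / (high - low)
    return {gene: min(100.0, max(0.0, (value - low) * scale))
            for gene, value in raw.items()}

def positive_threshold(scaled_negatives, n_lowest=3, n_sd=5.0):
    """Mean of the three lowest negative controls plus five SDs.

    Assumption: the SD is computed from those same three controls.
    """
    lowest = sorted(scaled_negatives)[:n_lowest]
    return mean(lowest) + n_sd * stdev(lowest)

def apply_threshold(scaled, threshold, floor=0.01):
    """Set every intensity below the threshold to the floor value."""
    return {gene: (value if value >= threshold else floor)
            for gene, value in scaled.items()}

def drop_mostly_negative(table, floor=0.01, min_negative=18):
    """Exclude genes that are negative (floored) in >= min_negative subjects.

    table maps gene -> list of per-subject thresholded intensities.
    """
    return {gene: values for gene, values in table.items()
            if sum(1 for v in values if v == floor) < min_negative}
```

Each filter is normalized and thresholded on its own controls before the per-subject values are combined into one table for gene filtering, which mirrors the per-image processing described in the text.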

Results
We used hierarchical clustering and multidimensional scaling to determine if the PBMC expression profiles of the CFS cases were distinguishable from healthy controls. Unsupervised cluster analysis yielded two major branches in the dendrogram (Fig. 1). One branch contained four (80%) of five CFS cases and five healthy controls. The other branch contained twelve (71%) of seventeen healthy controls and one CFS case. Multidimensional scaling of the intensity values illustrated that the majority of CFS cases (red spheres) grouped together and were distant and distinct from most healthy controls (green spheres) (Fig. 1). The five controls that clustered with the four CFS cases in the bottom branch were the age-matched controls for three of these cases (shown with identifying numbers 55, 69 and 78). Since all samples were from white females, age may be a significant variable in clustering the profiles. However, not all age-matched controls clustered with their corresponding case [e.g., CFS case 57 (27 years old) did not cluster with her matched control 26 (25 years old)].
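As an illustration of the unsupervised clustering step, the following Python sketch joins subjects bottom-up until two branches remain. The similarity measure and linkage rule used in BioNumerics are not specified here, so the Pearson-based distance (1 minus the correlation) and the average linkage are assumptions, and the function names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length intensity profiles.

    Assumes neither profile is constant (nonzero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering of subjects.

    profiles maps subject id -> list of gene intensities.
    Distance between subjects is 1 - Pearson r (an assumption);
    distance between clusters is the mean pairwise distance.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[(a, b)]
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters into one branch.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Stopping at two clusters corresponds to cutting the dendrogram at its two major branches. Recording the merge order and distances instead would give the full tree.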
The intensity values of each of the genes from the CFS group were compared to the corresponding gene in the control group using the Wilcoxon rank sum test. Nineteen genes were identified as different between these two groups (Table 1). The mean intensity values for each significant gene for the CFS group and the control group are shown to give a sense of the difference. To control for age, we repeated the analysis using all CFS cases and two age-similar controls for each case. In the age-matched analysis, nine genes were different between the CFS group and the control group (Table 1). Seven genes were common to both analyses (highlighted in grey). Both comparisons had genes that were unique to each group.
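The per-gene comparison can be illustrated with a small Python sketch of a two-sided Wilcoxon rank sum test. The software's exact computation and the significance cutoff are not stated here, so this sketch makes its own choices: it uses midranks for ties and obtains an exact p-value by enumerating every way of assigning the pooled ranks to the case group, which is feasible for five cases and seventeen controls (26,334 assignments per gene). The function names are hypothetical.

```python
from itertools import combinations

def midranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_exact(cases, controls):
    """Return (rank sum of cases, exact two-sided p-value)."""
    pooled = list(cases) + list(controls)
    ranks = midranks(pooled)
    n = len(cases)
    observed = sum(ranks[:n])
    expected = n * (len(pooled) + 1) / 2.0
    observed_dev = abs(observed - expected)
    hits = total = 0
    # Count group assignments at least as extreme as the observed one.
    for subset in combinations(range(len(pooled)), n):
        total += 1
        if abs(sum(ranks[i] for i in subset) - expected) >= observed_dev - 1e-9:
            hits += 1
    return observed, hits / total
```

Running this test once per gene over many hundreds of genes raises a multiple-comparison question that the sketch does not address.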
Discriminant analysis was also used to compare gene expression of the CFS cases to the healthy control group to detect the most discriminating variables (genes). Four genes were most discriminating in both the total and age-matched analysis: CDK-interacting protein, transcription factor ETR101, CMRF35 and ICAM2 (Fig. 2). The comparison of the CFS group to all seventeen controls detected four additional discriminating genes (Fig. 2a) while the comparison of the five CFS cases to the ten age-matched controls detected seven discriminating genes (Fig. 2b). Both discriminant and nonparametric methods detected some of the same genes as differentially expressed between the CFS group and the control group. It is notable that the CMRF35 antigen precursor gene was detected by all methods in both the age-matched and non-age-matched comparisons.

Discussion
We measured gene expression of PBMCs from CFS cases and controls. We chose to profile the blood because it is a sample that reflects many ongoing systemic pathophysiologic processes. Cluster analysis of the PBMC expression profiles separated the majority of CFS cases from the majority of controls. Similar gene expression profiling studies have been used to identify prognostic biomarkers in a variety of malignancies (e.g., lymphoma, prostate, breast and cervix) [1,5,21]. Each of these studies measured the gene expression of tissues taken directly from lesions and compared profiles to those of normal tissue. Our study demonstrates the utility of the blood for gene expression profiling on an illness without a known lesion. A similar approach may be useful in studying diseases with lesions that are difficult to sample as well as other unexplained illnesses that lack anatomic lesions.
Both cluster analysis and multidimensional scaling grouped and separated CFS from controls. The method of cluster analysis we employed uses a joining or tree clustering algorithm that forms branches based on similarity or distance. In Fig. 1, five controls clustered with the CFS cases. These included the age-matched controls for three of the CFS cases. This indicated that in addition to disease status, age could affect the expression of certain genes and consequently influence how individuals cluster. We controlled for other variables that could impact PBMC gene expression such as sex, race and time of sample collection. It is important to consider and attempt to control for as many variables as possible when using classification algorithms such as clustering. It is also important to validate results using more than one classification algorithm.
mined. This could point to heterogeneity of CFS and the occurrence of CFS subgroups (i.e., disease stratification) [24]. Recognizing that CFS may not be one entity, the current case definition encourages investigators to stratify cases based on mode of onset, as well as other potential distinguishing characteristics [8].
Differentially expressed genes were identified between CFS cases versus all controls or age-matched controls using the Wilcoxon test and discriminant analysis. The non-parametric analysis identified seven genes that distinguished the CFS case group from both age-matched and unmatched control groups. Of these seven genes, four (CMRF35, HD protein, phospholipase A2 and ZNF145) were more highly expressed in CFS cases than in controls. It is noteworthy that one gene, the CMRF35 antigen precursor, was detected as differentially expressed by all analytical approaches. This gene encodes a cell membrane antigen that is a member of the immunoglobulin superfamily. The CMRF35 gene is expressed as a receptor or ligand to varying degrees in subsets of immune cells (T, B, natural killer, and myeloid cells) [7]. As an immunoglobulin superfamily antigen, the CMRF35 precursor plays a role in regulating lytic and cytokine expression capabilities of immune cells and is thought to control interactions between T cells and antigen presenting cells or target (virus-infected or mutated) cells that have to be killed [10]. Interestingly, immune function studies from this same group of CFS case and control samples have shown a significant difference in natural killer cell surface markers with CFS cases having less expression of CD2 than controls [18]. None of the genes identified in this analysis have been previously characterized in CFS cases. These will form the basis for developing and testing novel hypotheses of CFS pathogenesis. In addition to the CMRF35 antigen, several of the differentially expressed genes detected here suggest an immunological basis for CFS. Interleukin-8 is a potent proinflammatory chemotactic factor and is down-regulated in the CFS group compared to controls. The CMRF35 antigen was highly expressed in the CFS group and as discussed above is thought to regulate (inhibit) cytokine expression [2].
Integrins are essential for adhesion and function in cell migration, cell proliferation and differentiation. This gene was not as highly expressed in the CFS group compared to the control group. The Huntingtin protein may play a role in apoptosis. It has recently been shown that the mutant Huntingtin protein up-regulates expression of the cell death gene caspase-1 [9]. All of these genes implicate immune dysfunction in the pathophysiology of CFS.
Simultaneous evaluation of thousands of expressed genes in the peripheral blood represents a powerful approach for characterizing illnesses without a known or accessible lesion. This study is a successful "proof-of-concept" for this approach. However, the small numbers of samples that were available for analysis and the fact that we only assessed the expression of 1,764 genes must temper the strength of our conclusions about the significance of any of the differentially expressed genes detected here and their role in the pathophysiology of CFS. In addition, the differential expression of the genes identified could not be independently validated in these samples because there was no residual RNA available [19]. Further studies using peripheral blood samples from larger numbers of well-characterized subjects are necessary to minimize individual differences and experimental variability. In addition, glass microarrays provide the possibility of studying a broader spectrum of genes. These studies are in progress and will potentially yield a clearer profile of aberrant pathophysiologic pathways and identify diagnostic biomarkers of CFS.