An analysis of abnormalities in the B cell receptor repertoire in patients with systemic sclerosis using high-throughput sequencing

Systemic sclerosis is a chronic multisystem autoimmune disease that is associated with polyclonal B cell hyperreactivity. The CDR3 of BCRs is the major site of antigen recognition. Therefore, we analyzed the BCR repertoire of patients with SSc. The BCR repertoires in 12 subjects including eight SSc patients and four healthy controls were characterized by high-throughput sequencing, and bioinformatics analysis were studied. The average CDR3 length in the SSc group was significantly shorter. The SSc patient displayed more diverse BCR. Moreover, SSc patients with mild skin sclerosis, anti-Scl70, interstitial lung disease or female sex were more diversified. B cells from the SSc patients showed a differential V and J gene usage. SSc patients had distinct BCR repertoires.These findings reflected the differences of BCR repertoires between SSc patients and controls. The higher-usage genes for the BCR sequence might be potential biomarkers of B cell-targeted therapies or diagnosis for SSc.


81
In the present study, we investigate the BCR repertoire in patients with SSc. The CDRs of BCRs were detected using high-82 throughput sequencing (HTS) to explore the differences in the BCR repertoire between the SSc and healthy control groups. The 83 large-scale sequencing of this BCR repertoire may improve our understanding of the immune system in patients with SSc. In 84 addition, a deeper understanding of the BCR repertoire in patients SSc will help us understand the mechanisms of B cell 85 hyperactivity and tolerance imbalance in patients with SSc, which we hope will provide a theoretical basis for the development of 86 more effective targeted therapies in the future.  (1980). Four age-matched 96 healthy subjects comprised the control group. We declare that this study has been approved by the Ethics Committee of the First 97 Hospital of Jilin University (2015-119), and written informed consent was obtained from all participants. 98 The clinical manifestations and laboratory results were derived from clinical questionnaires. Lung fibrosis was observed 99 using high-resolution computed tomography (HRCT). The modified Rodnan total skin thickness score was used to assess the 100 degree of skin fibrosis (LeRoy et al. 1988). The 8 patients scored 12-30 points before treatment and were divided into two groups 101 (mild and severe) according to a cutoff for the modified Rodnan score of 18 points. Table 1 shows the clinical and laboratory 102 characteristics of each patient.
103 Samples collection 104 Ten milliliter blood samples were collected, and peripheral blood mononuclear cells (PBMCs) from patients and controls were 105 obtained immediately by density-gradient centrifugation over Ficoll (MD Pacific Biotechnology Co., Ltd, Tianjin, China). Total 106 RNA was extracted from PBMCs by TRIZOL reagent (Invitrogen, Carlsbad, CA,USA)according to the manufacturer's 107 instructions. Nanodrop 2000 was used to evaluate the quantity and purity of RNA, which was reversely transcribed into cDNA.
108 Multiplex PCR amplification 109 The human immunoglobulin heavy chain (IGH) sequences were downloaded from the international ImMunoGeneTics 110 information system (IMGT) (http://www.imgt.org/). A relatively conserved region 3 was selected as the predicted forward primer 111 region in the upstream sequence of CDR3. A set of primers corresponding to most V gene families were selected. Similarly, 6 112 reverse primers consistent with J gene families were designed. MFEprimer2.0 (Shanghai, China) and Oligo 7.0 (Colorado 113 Springs, CO, USA) were used to analyze the primers and reverse primers for dimers and loop structures. We cooperated with the 114 BGI-Shenzhen Company to regulate primer concentrations and minimize PCR amplification bias using a new bioinformatics 115 method (IMonitor) (Clements et al. 1995;Liu et al. 2016). The sequences of low-quality primers were slightly altered. The final 116 primer sequences are shown in supplementary Table S1. 117 The complete VDJ rearrangements of the IGH sequences were amplified from 4ug of DNA using the primers listed in the 119 DNA using a high-fidelity enzyme. The PCR conditions were 15 min at 95°C, 25 cycles at 94°C for 15 s and 60°C for 3 min, and 120 an incubation at 72°C for 10 min. PCR products were purified and primer sequences were removed using AMPure XP beads 121 (Brea, CA, USA). Then, the second round of PCR began, and we added a sequencing index to each sample. The PCR conditions 122 were 98°C for 1 min, 25 cycles at 98°C for 20 s, 65°C for 30 sand 72°C for 30 s, and a final incubation at 72°C for 5 min. Finally, 123 an agarose gel was used to isolate the target library, and QIAquick gel Extraction Kits (Qiagen) were used to isolate and purify 124 the target region. 125 High-throughput sequencing 126 The Illumina HiSeq sequence adapter was connected and libraries were sequenced using the Illumina HiSeq 4000 platform.
127 The library was quantified using an Agilent 2100 bioanalyzer (Agilent DNA 1000 reagent, Santa Clara, California, USA) and 128 real-time quantitative PCR (TaqMan probes, Shenzhen, China). 129 Data analysis 130 The quality of the Illumina HiSeq 4000 library was evaluated using a BGI formula. Namely, we filtered adapter reads and low-131 quality reads in the raw data, and further aligned clean data. The clean data were then compared with the human IGH database 132 and analyzed with MiXCR, which categorized the identical and homologous reads into clonotypes and corrected the PCR and 133 sequencing errors using heuristic multi-layer clustering. The obtained data included V, D, J assignments, clustering, CDR3 length 134 distributions and other results.

143
After filtering, including the removal of adapter sequences, contamination, and low-quality reads, we acquired an average of 144 8,363,668 sequencing reads each sample. Table 2 showed the IGH sequence summary. (Table 2) 145 Distribution of CDR3 lengths

146
The important determinant of the diversity of the B cell repertoire is the length of the BCR CDR3 loop. In our study, we 147 evaluated the length distribution of the BCR CDR3 sequence (aa) in the SSc and Control groups. The SSc group had a higher 148 percentage of BCR CDR3 sequences of 14 amino acids (aa) in length than the Control group (P=0.029), but a lower percentage 149 of BCR CDR3 sequences of 29 (P=0.039) or 37 (P=0.013) aa in length ( Figure S1). The average CDR3 length was significantly 150 shorter in the SSc group (17.85±0.37 aa) than in the Control group (18.47±0.038 aa; P=0.038) ( Figure S2). 151 152 Degree of expansion and frequency distribution of B cell clones 153 By aligning and identifying each sequence, we were able to calculate the expression level of each clone. The extent to which 154 each individual clone expands depends on the frequency of the unique CDR3 sequence in a sample. Here, we defined clones with 155 a frequency greater than 0.5% among the analyzed BCRs as highly expanded clones (HECs), clones with a frequency of 0.05-156 0.5% as high-frequency clones, clones with a frequency of 0.005-0.05% as medium-frequency clones, clones with a frequency of 157 0.0005-0.005% as low-frequency clones, and clones with a frequency of <0.0005% as rare clones. In terms of the frequency 158 distribution, most of the repertoires were composed of a small amount of HECs distributed in a left-skewed manner, and most 159 clones were present at a low frequency ( Figure S3). In the SSc group, the more highly expanded clones (9.37×10 -2 ±3.51×10 -2 %) 160 accounted for 10% of the B cell sequences present among the medium-frequency clones (0.005−0.05%) (P=0.005), and the 161 expanded clones (1.31×10 -2 ±1.42×10 -2 %) accounted for 2.1% of the B cell sequences present among the high-frequency clones 162 (0.05−0.5%) (P=0.013). However, lower expression levels were observed among the rare clones (0.874±3.32×10 -2 %) (P=0.019) 163 in the SSc group than in the Control group (TableS2). 164 165 Comparison of the BCR repertoire diversity between groups 166 We used the Shannon entropy index, which summarizes the frequency of every clonotype, to quantify the BCR repertoire 167 diversity of the SSc group, Control group and subgroups with different features. An analysis of the repertoires of patients with 168 SSc revealed a very high BCR diversity that was higher than the Control group (P=0.004). Moreover, the entire BCR repertoire 169 of the SSc group had a much more diversified clonotype composition than the Control group, particularly in the subgroups with a 170 mild degree of skin sclerosis, anti-Scl70 antibodies, or interstitial lung disease (ILD). Furthermore, the Shannon entropy of the 171 Control group and female and male patients in the SSc group was analyzed; the BCR diversity of female patients was higher than 172 the Control group. Although the P value was not statistically significant, a greater diversity was observed in male patients than in 173 female patients. The sample size of the male group should be increased in subsequent studies ( Figure 1).

175 Distribution of similar CDR3 sequences in the SSc and Control groups 176
The similarity map of the samples was obtained by calculating the distance between each pair of samples as the Jaccard 177 distance=1−Jaccard index and constructing the neighbor-joining tree ( Figure 2). According to the map, the subjects were divided 178 into three groups: female patients with SSc (P1−6), male patients with SSc (P7−P8) and healthy controls (H1−4). An obvious 179 difference was detected between the patients and healthy controls, which were well-separated; however, among the patients, the 180 similarities between P7 and P8 were also different from the other patients. (The sequences in male patients with SSc were also 181 different from those in female patients with SSc.) 182 183 Comparison of IGHV and IGHJ repertoires between the SSc and Control groups 184 The usage features of the V and J gene segments were analyzed for the different clone levels, as shown by the histograms and 185 heat maps, respectively (Figures 3 and 4), to analyze whether disease-specific differences in the IGHV and IGHJ repertoires 186 existed. The expression levels of the respective IGHV and IGHJ repertoires were compared among the groups. For the IGHV 187 gene segments, IGHV3-9 (P=0.04) were highly expressed in the SSc group. For the IGHJ gene segments, IGHJ4 (P=0.02) 188 showed significantly higher usage in the SSc group. Then, the relative usage frequencies of V-J pairs were compared between 189 the SSc and Control groups ( Figures 3C and 4, table 3). IGHV1-18-J3,GHV1-8-J2,GHV1-8-J4, IGHV3-53-J4,IGHV3-9-190 J3,IGHV3-9-J4,IGHV3-9-J5, IGHV4-61-J4, SSc group is higher than Control (Figures 3C and 4) BCR repertoires in patients with SSc showed significant changes in IGHV gene usage compared to healthy controls. Our 246 results showed an enrichment of IGHV3-9 and IGHJ4 gene family usage, specificallyIGHV1-18-J3, GHV1-8-J2, GHV1-8-J4, 247 IGHV3-53-J4, IGHV3-9-J3, IGHV3-9-J4, IGHV3-9-J5 and IGHV4-61-J4 usage in patients with SSc was higher than Control, 248 but in IGHV2-5-J6, IGHV2-70-J5, IGHV3-20-J6, IGHV3-23-J6, IGHV3-7-J5, IGHV4-4-J6, the usage was lower than Control. 249 The usage of these pairs was significantly different between the SSc and Control groups (Figures 3, Figure 4 and  . Furthermore, our study has shown that the usage of IGHV2-5-J6 in SSc was significantly lower than control, which is 253 consistent with de Bourcy's study, IGHV2-5 was used at a lower level in SSc-PAH participants (de Bourcy et al. 2017). It has 254 been proved that B cells changes of systemic sclerosis is related to the balance in naive and memory B cell, and that has been 255 shown, an imbalance in naive/switched and unswitched memory B cells that may explain the B cell repertoire abnormality 256 observed in our study (Simon et al. 2016). In the study, the higher-usage genes also provide additional information for the future 257 study of effective B cell-targeted therapy or prognostic and/or diagnostic biomarkers for SSc. 258 In conclusion, the analysis of the BCR immune repertoires of patients with SSc using this meaningful method has an 259 important application value for studying the prognosis and evaluating the clinical responses to treatment in the future. Although 260 this study has some limitations due to the small sample size, future investigations aimed at improving our understanding of the 261 role of the BCR repertoire in immune responses, autoimmunity and autoreactivity are anticipated as the cost of HTS decreases: 262 first to analyze the repertoires of different B cell subsets in patients with SSc, and second to further identify the features of the 263 BCR repertoire and the functions of specific BCR genes in different patients with SSc.           Manuscript to be reviewed