Integration profile of retroviral vector in gene therapy treated patients is cell-specific according to gene expression and chromatin conformation of target cell

The analysis of the genomic distribution of retroviral vectors is a powerful tool to monitor ‘vector-on-host’ effects in gene therapy (GT) trials, but it also provides crucial information about ‘host-on-vector’ influences based on the genetic and epigenetic state of the target cell. We had the unique occasion to compare the insertional profile of the same therapeutic Moloney murine leukemia virus (MLV) vector in the context of the adenosine deaminase-severe combined immunodeficiency (ADA-SCID) genetic background in two GT trials based on infusions of transduced mature lymphocytes (peripheral blood lymphocytes, PBL) or a single infusion of haematopoietic stem/progenitor cells (HSC). We found that vector insertions are cell-specific according to the differential expression profile of target cells, favouring, in PBL-GT but not in HSC-GT, genes involved in immune system and T-cell functions/pathways as well as T-cell DNase hypersensitive sites. Chromatin conformations and histone modifications influenced integration preferences, but we discovered that only H3K27me3 was cell-specifically disfavoured, thus representing a key epigenetic determinant of cell-type dependent insertion distribution. Our study shows that the MLV vector insertional profile is cell-specific according to the genetic/chromatin state of the target cell both in vitro and in vivo in patients several years after GT.


INTRODUCTION
Retroviral vectors have been extensively used in gene therapy (GT) applications as an effective tool to permanently integrate a therapeutic gene of interest into a target cell, conferring in most cases stable and efficient transgene expression over time (Kohn & Candotti, 2009; Naldini, 2009). Both wild-type retroviruses and retroviral vectors were initially thought to insert randomly into the host genome, but it is now well established that the majority of them display a semi-random or non-random integration profile, possibly influencing the fate of transduced cells (Brady et al, 2009; Bushman, 2007; Cattoglio et al, 2007; Ciuffi, 2008; Mitchell et al, 2004; Wu et al, 2003). Initial reports of leukaemia cases from the X-linked severe combined immunodeficiency (SCID-X1) GT clinical trial suggested a potential link between leukaemogenesis and insertional activation of the LMO2 proto-oncogene (Hacein-Bey-Abina et al, 2003a,b). Indeed, it is now recognized that vectors bearing enhancer sequences can alter the expression of neighbouring genes, and several studies associated events of clonal dominance in vivo with vector insertion sites (Kustikova et al, 2007; Montini et al, 2006, 2009). On the other hand, in vitro analyses of transduced clones purified from patients several years after GT failed to show clear signs of perturbation of neighbouring genes (Recchia et al, 2006). Additionally, although the analyses of integration sites from adenosine deaminase deficient SCID (ADA-SCID), SCID-X1, and chronic granulomatous disease (CGD) trials (Aiuti et al, 2007; Deichmann et al, 2007; Ott et al, 2006) show the presence of specific regions with recurrent integrations (common integration sites, CIS), it remains undefined to what extent the presence of CIS is the result of positive clonal selection after cell infusion or instead derives from preferential targeting for integration at the time of transduction (Cattoglio et al, 2007).
Indeed, insertion site selection during in vitro transduction could be driven by tethering of transcription factors (TF) to specific regions according to TF binding sites location (Felice et al, 2009) and seems dependent on cellular determinants as well as on vector design (Lewinski et al, 2006).
Along this line, we studied the impact of vector integrations on clonal expansion and the frequency of CIS after GT in five patients from a clinical trial of haematopoietic stem cell gene therapy (HSC-GT) for ADA-SCID that has been shown to achieve immune and metabolic correction in the absence of adverse events related to gene transfer (Aiuti et al, 2002a, 2009). Our data did not reveal any sign of clonal dominance or aberrant expansions even in the presence of CIS in the LMO2 or CCND2 proto-oncogene loci (Aiuti et al, 2007). It is now believed that other factors including the disease background, the nature of the transgene, and the acquisition of other genetic abnormalities unrelated to vector insertions are also needed for aberrant expansion of transduced clones (Hacein-Bey-Abina et al, 2008; Howe et al, 2008). While most of these studies have focused on vector-on-host effects, there is still limited information on the role of the target cell status at the time of transduction and host-on-vector influences upon in vivo engraftment. Toward this aim, two recent publications addressed the possible influence of target cell type on the integration profile of retroviral vectors in murine LSK subpopulations (Kustikova et al, 2009) and T-cells (Newrzela et al, 2008), the latter suggesting that the cell-dependent insertional pattern of transduced, mature murine lymphocytes plays only a secondary role in inducing resistance to transformation. However, these findings remain to be defined in GT clinical trials on patient samples, and a characterization of genomic features influencing integration preferences in different human target cells is currently missing.
New data now provide a detailed genome-wide map of the epigenetic and chromatin state of different human cell types, such as HSC and peripheral blood T-cells, making it possible to compare retroviral vector distribution with several high-throughput mapped genomic features such as DNase I hypersensitive sites (HSS; Boyle et al, 2008; Xi et al, 2007) and histone methylation distribution (Barski et al, 2007; Cui et al, 2009). In this regard, recent reports have shown an interesting correlation between retroviral insertions and histone methylations in human cells (Brady et al, 2009; Wang et al, 2007, 2010). In order to study the influences of host cell condition on vector insertion sites in vitro and after clonal selection in patients, we identified integration sites from patients affected by ADA-SCID and treated with infusions of Moloney murine leukemia virus (MLV) transduced HSC (HSC-GT) or peripheral blood lymphocytes (PBL-GT; Aiuti et al, 2002b). This represents a unique model that allows the comparison of insertions derived from the same ADA-encoding vector in the context of the same genetic background but in two different target cells, before and several years after infusion in patients. We found that the MLV-derived vector displayed a different integration distribution in the two groups of patients under PBL-GT or HSC-GT. These cell-specific insertion preferences were directly related to the epigenetic state and expression profile of the cell type at the time of transduction and overall did not show any particular vector-driven bias in patients even long-term after GT.

Distribution of vector integrations from PBL-GT and HSC-GT
To compare the insertional profile of the MLV vector in PBL-GT and HSC-GT, we identified a total of 4157 unique insertions from ADA-SCID patients (2198 from PBL-GT and 1959 from HSC-GT) by combining shotgun cloning-generated insertion libraries of ligation mediated PCR (LM-PCR) products and pyrosequencing of linear amplification mediated PCR (LAM-PCR) products (Table 1). The integrations were divided into two categories according to the samples from which they were collected for each group of patients, in vitro (PRE-GT) and in vivo (POST-GT). From HSC-GT patients, we retrieved retroviral integration sites (RIS) from both CD3+ T-cells and CD15+ granulocytes in vivo, whereas only T-cells were analysed in the PBL-GT group since in these patients the vector is undetectable in other lineages. The overall distribution of the vector integrants confirmed the classical preference of the MLV vector to integrate inside or in the proximity of coding regions (Table 2).

Research Article
Retroviral vector integrations are cell-specific
In PBL-GT, 54.1% of RIS were inside genes and 26.1% in a window of 10 kb surrounding the transcription start site (TSS) of genes, while in HSC-GT their frequency was 46.1 and 21.8%, respectively. Comparing the data obtained in vitro before infusion and in vivo, we found that the percentage of insertions inside genes was reduced after in vivo engraftment. PRE-GT in vitro samples showed an enriched integration frequency inside genes both in PBL-GT and HSC-GT (55.9 and 52.6%, respectively), while POST-GT in vivo purified cell populations showed a lower frequency of RIS inside genes as compared to in vitro data in PBL-GT T cells (51.3%) and in CD15+ cells (40.6%) or CD3+ cells (40.7%, p < 0.001; test on proportions with Holm-Bonferroni correction) from HSC-GT. In vivo T-cells and granulocytes from HSC-GT displayed a slightly higher percentage of insertions surrounding the TSS compared to the in vitro dataset, while there was no indication of in vivo skewing in the PBL-GT samples.
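The comparisons above rely on a test on proportions followed by Holm-Bonferroni correction. The sketch below illustrates the mechanics only: it assumes a pooled two-proportion z-test (the exact test variant used in the study is not stated here), and the counts are approximations reconstructed from the reported percentages and totals (2198 PBL-GT and 1959 HSC-GT insertions), not the study's raw data. The function names are hypothetical.

```python
import math

def two_proportion_p(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (k1 / n1 - k2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def holm(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted

# ASSUMPTION: counts rebuilt from rounded percentages in the text.
n_pbl, n_hsc = 2198, 1959
inside_genes = (round(0.541 * n_pbl), round(0.461 * n_hsc))
near_tss = (round(0.261 * n_pbl), round(0.218 * n_hsc))

raw = [
    two_proportion_p(inside_genes[0], n_pbl, inside_genes[1], n_hsc),
    two_proportion_p(near_tss[0], n_pbl, near_tss[1], n_hsc),
]
adjusted = holm(raw)
print(adjusted)
```

Holm's step-down procedure is never less powerful than a plain Bonferroni correction while still controlling the family-wise error rate, which is one reason it is commonly chosen when several subset comparisons are reported together.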
Specific CIS were identified from PBL-GT and HSC-GT
We then analysed the presence of hotspots of integrations and CIS in all our datasets in vitro and in vivo, normalizing the size of the clustering window for the number of integrations retrieved in each group. CIS, defined as clusters of two or more insertions, were present already in vitro at the time of transduction both in the PBL-GT (43.4%) and HSC-GT (36.1%) groups (Table 2). The proportion of insertions involved in clusters increased upon in vivo selection only in CD3+ cells differentiated from transduced HSC (44.8%, p < 0.05; test on proportions with Holm-Bonferroni correction), while it decreased in those derived from gene-corrected mature PBLs (34.0%, p < 0.001). Integrations from granulocytes did not cluster around specific regions to the level observed in vitro in CD34+ cells (15.8%, p < 0.05).
By comparing the two clinical trials, only a small number of CIS genes (12 out of 474 total CIS) were in common between in vitro transduced PBL and HSC, while the majority of hotspots were specific for one of the two target cells (Fig S1 of Supporting Information). This low similarity was only slightly increased in vivo, where 14 out of 402 total hotspots were shared between T cells from HSC-GT and T cells from PBL-GT patients. Among CIS genes retrieved in vivo only from HSC-GT, we found LMO2 and MDS1-EVI1. These genes were targeted both in CD3+ and CD15+ cells from HSC-GT treated patients, while in vitro we found only one insertion near LMO2 from transduced CD34+ cells. The TCRA locus was instead a clear CIS only in PBL-GT, while no integration was retrieved in this region from HSC-GT samples.
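As a rough illustration of the CIS definition used above (clusters of two or more insertions within a clustering window), the following sketch computes the fraction of insertions that fall in such clusters. The window size, the toy coordinates, and the function name are assumptions made for the example; the study scales the window to the number of integrations in each group, and that normalization is not reproduced here.

```python
from collections import defaultdict

def cis_fraction(insertions, window_bp):
    """Fraction of insertions with at least one neighbour on the same
    chromosome within window_bp, i.e. belonging to a cluster of >= 2."""
    by_chrom = defaultdict(list)
    for chrom, pos in insertions:
        by_chrom[chrom].append(pos)

    clustered = 0
    for positions in by_chrom.values():
        positions.sort()
        for i, pos in enumerate(positions):
            near_left = i > 0 and pos - positions[i - 1] <= window_bp
            near_right = (i + 1 < len(positions)
                          and positions[i + 1] - pos <= window_bp)
            if near_left or near_right:
                clustered += 1
    return clustered / len(insertions) if insertions else 0.0

# Toy data (hypothetical coordinates, not taken from the study).
toy = [
    ("chr1", 1_000_000), ("chr1", 1_015_000), ("chr1", 9_000_000),
    ("chr2", 2_000_000), ("chr2", 2_020_000), ("chr3", 5_000_000),
]
fraction = cis_fraction(toy, window_bp=30_000)
print(f"{fraction:.1%} of toy insertions fall in clusters")
```

Because the chance of two random insertions landing close together grows with dataset size, a fixed window would inflate CIS frequencies in larger datasets; this is why a size-normalized window, as described in the text, is needed before comparing groups.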

T-cell specific genes are favoured insertion sites in PBL-GT
To analyse the functions of genomic loci targeted by RIS, we collected data regarding the gene directly hit or the single gene closest to each insertion site in order to assign one gene to each integration. Hit gene lists for all insertion sites from PBL-GT, HSC-GT and a random in silico dataset (n = 100,000) were uploaded to Ingenuity Pathways Analysis (IPA) to look for related biological functions.
We calculated the contribution of hit genes with respect to the functional categories listed in the IPA database. We analysed the frequency of hit genes listed in the two main categories 'Haematological functions' and 'Immune functions' with respect to the remaining functions (Others; Fig 1A). As compared to the random reference, both HSC-GT and PBL-GT displayed a higher frequency of hit genes belonging to haematological functions. PBL-GT displayed an even higher contribution of hit genes from this category, together with a significant over-representation of immune functions genes as compared to HSC-GT (37% vs. 28%, p < 0.001; test on proportions with Holm-Bonferroni correction) and random (27%, p < 0.001). The PBL-GT dataset also showed a lower frequency of genes involved in other functions, particularly in the cardiovascular and nervous systems. These general preferences were confirmed by restricting the analysis to CIS genes in vitro and in vivo from PBL-GT and HSC-GT (Fig S2 of Supporting Information).
Considering the nature of the cell type transduced in PBL-GT, we extrapolated T-cell specific functions from the IPA library and cross-compared these categories with our list of hit genes. We found that genes involved in proliferation (p < 0.001; test on proportions with Holm-Bonferroni correction), activation (p < 0.001), differentiation and development (p < 0.05) of T lymphocytes were clearly a preferential target for integrations in PBL-GT when compared to the HSC-GT insertions and random dataset (Fig 1B). We then looked closer at the canonical T-cell receptor (TCR) signalling pathway from the IPA library. As shown in Fig S3 of Supporting Information, the TCR pathway was strongly and preferentially hit by integrations derived from mature lymphocytes as compared to the ones from HSC and their differentiated progeny. We also analysed the contribution of hit genes to cytokine and interleukin signalling pathways (Fig 2). T-cell specific pathways like signalling of TCR, CD28 and IL-2 were significantly targeted only in PBL-GT as compared to the expected random frequency calculated by IPA (p < 0.05, Benjamini-Hochberg correction for multiple testing). On the contrary, IL-6 signalling genes were significantly over-represented in the HSC-GT group of patients and not in the PBL-GT one (p < 0.05). We also divided the list of hit genes according to their different representation in the two GT trials (Fig S4 of Supporting Information). Again, the preference for T-cell specific pathways was found only in the PBL-GT specific hit genes and PBL-GT/HSC-GT commonly hit genes.

Luca Biasco et al.
www.embomolmed.org EMBO Mol Med 3, 89-101 © 2011 EMBO Molecular Medicine

Table 2. Percentages of integrations in vitro and in vivo from PBL-GT (blue) and HSC-GT (red) landing inside genes, outside genes, in a 10 kb window centred on TSS of genes and involved in CIS in vitro and in vivo in PBL-GT or HSC-GT. Percentages relative to 100,000 random insertions and human genome reference are also shown for comparison (ND, not done; NA, not available).

No in vivo enrichment for 'Cancer' genes in PBL-GT and HSC-GT
We then studied the contribution of hit genes to the disease category 'Cancer' (from IPA software) for all the integration subsets (Fig S5 and Supporting Information 'List of Cancer Genes hit in PBL-GT and HSC-GT'). Proto-oncogenes were overrepresented in both trials when compared to randomly generated integrations (n = 12,323 genes, p < 0.05; test on proportions, Holm-Bonferroni correction). However, by comparing in vitro and in vivo data, we did not find any significant enrichment in terms of hit proto-oncogenes among the different subsets in both groups of patients. The in depth analysis of genes related to myeloid leukaemia, lymphocytic leukaemia and lymphomas in vitro in both the trials showed no significant in vivo skewing for any of these categories (Fig S5 and Supporting Information 'List of Leukaemia-Lymphoma Genes hit in PBL-GT and HSC-GT'). Although CD15+ cells from HSC-GT seem to display a slightly higher contribution of myeloid leukaemia and lymphocytic leukaemia genes, these differences were not significant by the comparison with the in vitro CD34+ cells integration dataset (test on proportions, Holm-Bonferroni correction).

Differential gene expression profile influences vector integrations in PBL and HSC
To get more insight into the differential distribution of the vector in the two transduced cell types, we analysed the expression profile of target cells at the time of transduction through HG-U133A Affymetrix microarray chips. We then restricted the analysis to the list of genes found to be hit in PBL-GT or HSC-GT and compared the expression levels of hit genes to the overall expression profile in T cells and CD34+ cells (Fig 3A). We found that the expression of hit genes was skewed towards the highest expression categories in both trials as compared to the random reference (p < 0.0001 for all subgroups except for HSC CD15+ with p = 0.052, Mann-Whitney test). In contrast, no significant skewing was found between the expression profile of hit genes retrieved from in vivo samples as compared to in vitro, in both PBL-GT and HSC-GT groups.
By comparing the two Affymetrix chip datasets, we were also able to extrapolate differentially expressed genes in T cells and CD34+ cells at the time of transduction from the distribution of differences (deltas) based on expression values (Fig 3B). We identified a small fraction of probesets (225 out of 22,284; 1.01%) that were differentially expressed beyond an arbitrary threshold of ±3 in delta robust multi-array averaging (RMA) values. A total of 105 probesets (corresponding to 89 genes) were found highly expressed in T cells at the time of transduction, while 120 probesets (corresponding to 113 genes) were specifically highly expressed in HSCs (Fig 3 heatmap and Supporting Information 'Differentially expressed genes in T cells vs. HSC').
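The delta-based selection described above can be sketched as follows. This is a minimal illustration, assuming one RMA value per probeset and per cell type and a delta defined as the T-cell value minus the CD34+ value; the probeset identifiers, the toy values, and the function name are hypothetical.

```python
def split_by_delta(rma_t_cells, rma_cd34, threshold=3.0):
    """Return (t_cell_high, cd34_high): probesets whose delta RMA
    (T cells minus CD34+ cells) lies beyond +threshold or -threshold."""
    t_cell_high, cd34_high = [], []
    # Only probesets measured on both chips can be compared.
    for probeset in rma_t_cells.keys() & rma_cd34.keys():
        delta = rma_t_cells[probeset] - rma_cd34[probeset]
        if delta > threshold:
            t_cell_high.append(probeset)
        elif delta < -threshold:
            cd34_high.append(probeset)
    return sorted(t_cell_high), sorted(cd34_high)

# Toy RMA values (log2 scale); identifiers are placeholders.
t_cells = {"ps_a": 11.2, "ps_b": 6.0, "ps_c": 8.1, "ps_d": 4.9}
cd34 = {"ps_a": 7.5, "ps_b": 6.4, "ps_c": 12.0, "ps_d": 7.0}

t_high, hsc_high = split_by_delta(t_cells, cd34)
print("T-cell specific:", t_high)
print("CD34+ specific:", hsc_high)
```

Since RMA values are on a log2 scale, a delta of 3 corresponds to roughly an eight-fold difference in signal, which is consistent with only about 1% of probesets passing the threshold.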
We then assessed how many integrations hit these two groups of genes in PBL-GT or in HSC-GT (Fig 3B column graph), considering that the calculated random frequency of insertions is 0.5% in each group. Integrations from PBL-GT involved 19 out of 89 T-cell specifically expressed genes (35 RIS, 1.6% of PBL-GT insertions) and only 3 out of 113 CD34+ cell specific genes (4 RIS, 0.2% of PBL-GT insertions). On the contrary, the HSC-GT integration dataset contained 11 out of 113 HSC specifically expressed genes (23 RIS, 1.2% of HSC-GT insertions), whereas T-cell expressed genes were significantly less represented (3 out of 89; 5 RIS, 0.3% of HSC-GT insertions; p < 0.005; test on proportions, Holm-Bonferroni correction).
We then analysed the distribution of hit genes in vitro and in vivo in both RIS datasets on the basis of their differential expression levels between the two chips (Fig 3C). We found that the insertions in HSC-GT versus PBL-GT were distributed among the profile of expression of hit genes in a significantly different fashion (p-value < 0.001/< 0.0001, Mann-Whitney test). Moreover, PBL-GT RIS were more skewed towards the T-cell specifically expressed genes already in vitro at the time of transduction, with no significant difference as compared to the in vivo insertions. HSC-GT insertions were instead more related to CD34+ expressed genes both in vitro and in vivo, with CD3+ cells displaying a distribution more similar to the random reference as compared to the in vitro dataset. These observations were confirmed and strengthened by limiting the analysis to CIS genes in all the subgroups in vitro and in vivo (Fig S6 of Supporting Information).

Cell-specific epigenetic marks correlate with integration preferences
We then assessed whether the differential vector bias for some loci was related to the overall chromatin accessibility of these regions in the two different target cell types. In a first analysis, we compared the distribution of our integrations to DNaseI HSSs mapped in human CD4+ T cells (Boyle et al, 2008). Overall, in both trials the MLV vector displayed a strong preference to integrate directly inside these regions as compared to the random reference (2% of randomly simulated RIS, n = 100,000; Fig 4A). Nonetheless, these features were hit in a significantly higher proportion in PBL-GT (34% of RIS from PBL-GT vs. 28% from HSC-GT; p < 0.001; test on proportions with Holm-Bonferroni correction) and integrations from PBL-GT were on average more than two times closer to HSS mapped on T cells as compared to HSC-GT RIS.
To further dissect the relationship between integration preferences and chromatin status, we took advantage of the data available on the histone methylation profile in CD4+ T cells and CD133+/CD34+ haematopoietic stem/progenitor cells (HSC/HPC; Barski et al, 2007; Cui et al, 2009). We analysed the distances of each of these features from every integration site considering a ±45 kb window centred on RISs both from PBL-GT and HSC-GT in vitro and in vivo. This analysis was performed on histone methylations mapped both on T cells and HSC/HPC (Fig 5A). Remarkably, PBL-GT and HSC-GT datasets showed a higher contribution of integrations close to histone modifications associated with open chromatin state and gene activation, such as H3K4me1 and H3K27me1, and with active TSS, such as H3K4me3. These preferences were present at the time of transduction and were maintained in POST-GT samples. On the other hand, histone methylations associated with heterochromatin regions, like H3K9me3, were disfavoured by the MLV vector equally in both cell types. Strikingly, the disfavouring for H3K27me3 was cell-specific, since it was evident only when the RIS of one trial were compared to this feature mapped on their relative target cell. On the contrary, when compared to the H3K27me3 mapped on the unrelated target cell, the distribution of RIS related to this histone modification was more similar to the random reference. A complete analysis on other epigenetic features is shown in Figs S7 and S8 of Supporting Information. We also subdivided the insertional datasets in vitro and in vivo in two subgroups containing CIS insertions and NOT CIS insertions, respectively, and we extended the analysis of histone methylation density distribution on each of these loci (Fig S9 of Supporting Information). Integration sites from HSC-GT in vivo CD15+ cells were excluded in this analysis due to the small number of CIS retrieved.
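The window-based distance analysis described above can be illustrated with a short sketch. It assumes that histone mark positions on a single chromosome are available as a sorted list of coordinates; the function name, the data layout, and the toy coordinates are hypothetical and do not reflect the study's actual pipeline.

```python
from bisect import bisect_left, bisect_right

def marks_in_window(ris_pos, sorted_marks, half_window=45_000):
    """Signed distances (bp) from one integration site to every mark
    within +/- half_window on the same chromosome."""
    lo = bisect_left(sorted_marks, ris_pos - half_window)
    hi = bisect_right(sorted_marks, ris_pos + half_window)
    return [mark - ris_pos for mark in sorted_marks[lo:hi]]

# Toy mark coordinates on one chromosome (must be sorted).
h3k27me3_marks = [1_000, 50_000, 96_000, 200_000]

distances = marks_in_window(55_000, h3k27me3_marks)
print(distances)
```

Collecting these signed distances over all RIS of a dataset, and over a random reference set, gives the distance distributions that can then be compared between a mark mapped on the related target cell and the same mark mapped on the unrelated one.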
The distribution of histone methylations around CIS was overall similar to the regions hosting single integration events (NOT CIS). We found additional confirmation of cell-specific disfavouring for the H3K27me3 modification, although CIS from HSC-GT CD3+ cells in vivo showed a different probability density distribution of this feature, more similar to the random reference, as compared to NOT CIS integrations and insertions from CD34+ cells in vitro.
Finally, we selected two cell-specific hotspots, LMO2 and TCRA, to study the local epigenetic landscape in relation to the specific insertion sites (Fig 5B). The LMO2 region, which was found to be hit only in HSC-GT, displayed a clear heterochromatic configuration (high H3K27me3 density) in T cells, while being open in HSC/HPC. On the other hand, the TCRA locus, which was a hotspot in PBL-GT but did not host any HSC-GT insertions, showed a strong signal of open chromatin conformation (high H3K4me3 density) at the insertion site cluster in T cells, while it was mostly in a closed configuration in HSC/HPC.

DISCUSSION
Integration site analysis has now become a critical tool to monitor the activity of retroviral vectors on in vivo selection of patient clones (Aiuti et al, 2007; Deichmann et al, 2007; Ott et al, 2006). In view of the adverse events that occurred in the SCID-X1 GT clinical trials, a number of studies have focused their attention on the influences of the vector on neighbouring genes (Hacein-Bey-Abina et al, 2003b, 2008; Howe et al, 2008; Maruggi et al, 2009; Ott et al, 2006) as well as how these could lead to clonal dominance and possibly to malignant transformation (Deichmann et al, 2007; Kustikova et al, 2007; Montini et al, 2006, 2009). In contrast, the impact of host cell genetic/epigenetic conditions on vector target site selection in vitro and in vivo is often overlooked in GT clinical studies.
Here, we had the unique opportunity to compare integration sites in two GT trials, in the absence of adverse events and major clonal expansions, sharing the same MLV vector gene transfer approach and the same ADA-SCID genetic background and only differing in the transduced cell type infused in patients: mature T lymphocytes or haematopoietic progenitors in PBL-GT and HSC-GT, respectively. Our high-throughput analyses revealed a cell-specific vector preference that is related to the host cell status in terms of chromatin state and transcriptional activity at the time of transduction. Long-term follow up of treated patients showed that this cell-specific pattern is overall maintained in vivo.
The number of unique insertions retrieved from the different samples analysed correlates well with the percentage of vector positive cells calculated by q-PCR. As expected, the contribution of insertions from granulocytes in HSC-GT patients was lower with respect to other lineages, having proportions of transduced cells ranging between 1.1 and 5.8%. Nevertheless, the integration dataset from granulocytes is more complex and abundant than in other studies on GT clinical trials (Ott et al, 2006; Wang et al, 2010). In addition, in the present work, we collected a relevant number of insertions from PRE-GT samples (1365 RIS and 887 RIS from PBL-GT and HSC-GT, respectively), allowing a coherent comparison of in vitro data with integration sites detected long-term in patients. We were thus able to better discriminate between intrinsic insertional preferences in target cells before infusion and selective pressures or vector-driven bias after in vivo engraftment and selection.
Integrations inside genes were less represented in POST-GT samples, suggesting that clones carrying insertions into coding regions affecting cell survival were negatively selected in vivo.
In addition, when focusing on CIS we found that clusters of insertions were present at a relevant frequency already at the time of transduction both in PBL-GT and HSC-GT, but the contribution of integrations involved in CIS increased in CD3+ cells differentiated from transduced progenitors of HSC-GT patients. This in vivo skewing could be explained by a positive role of vector position on growth-promoting genes, such as LMO2 and MDS1-EVI1. Nonetheless, the polyclonality of the RIS and the TCR repertoire as well as the lack of expansion of LMO2 insertions by clonal tracking over time (Aiuti et al, 2007) would suggest that the finding of a CIS in the proximity of a proto-oncogene is not necessarily linked to aberrant proliferation in vivo. Although we found only one RIS in the LMO2 locus in vitro in HSC-GT in the present dataset, it should be noted that this region has been previously shown to be a hotspot of integrations in CD34+ cells (Aiuti et al, 2007; Cattoglio et al, 2007). One could also argue that clones carrying insertions in HSC active loci would have a survival advantage due to a better detoxification from toxic metabolites, particularly in the early phases after GT during which purine metabolites are elevated due to enzyme replacement therapy discontinuation in combination with the effects of chemotherapy (Aiuti et al, 2009). Monitoring these clones in longer follow up will provide clearer information about a possible effect of the vector on clonal dynamics.
The highest overlap of hotspots (6.6% of CIS genes) was found comparing the integrations from in vitro and in vivo lymphocytes in PBL-GT. On the other hand, few CIS genes were in common between lymphocytes and their transduced progenitors in HSC-GT, likely as a result of the complex developmental path from CD34+ cells to T cells.
Having observed differential genomic distribution and clustering of insertions in the two GT trials, we hypothesized that the different status of the two target cell types (lymphocytes and HSC/HPC) could have influenced in a cell-specific fashion the integration profile of the MLV vector in vitro at the time of transduction and in vivo after selection. Our finding that MLV insertions from PBL-GT showed a strong preference for 'immune functions' genes and a lower contribution of categories unrelated to the haematological/immune system was a first confirmation of a cell-dependent integration profile. This observation was further strengthened when we looked deeper at T-cell specific functions and pathways, which were significantly hit by integrations only in PBL-GT and not in HSC-GT. Analysis of the 'Cancer' category and leukaemia/lymphoma related genes from IPA software showed a higher contribution of hit genes with respect to random, with no significant in vivo selection for these categories, suggesting that integrations in these loci have not provided any growth advantage to transduced cells. It should also be noted that the definition of 'Cancer-related' genes is somewhat arbitrary and a relevant number of genes belonging to this category are also involved in physiological functions of the haematopoietic system.
We then wondered whether cell-specific preferences were linked to the expression profile of the two different target cells at the time of transduction. In line with previous reports on integration studies in haematopoietic cells (Aiuti et al, 2007; Cattoglio et al, 2007; Mitchell et al, 2004; Recchia et al, 2006), the MLV vector favoured genes belonging to higher expression categories in both trials in vitro, without significant in vivo skewing. By comparing the expression profile in the two target cell types, we found that the MLV vector favoured genes specifically expressed in T cells in PBL-GT and genes specifically expressed in CD34+ cells in HSC-GT. Interestingly, the comparison between integrations in CD3+ cells in vivo from HSC-GT versus PBL-GT revealed that the same kind of lymphoid cells displayed in the two trials a completely different integration profile that was related to the type of transduced cells from which they derived in vivo.
However, gene expression is not the only mechanism driving integration preferences of the MLV vector. Indeed, we found a substantial number of genes hit by the vector but not significantly expressed in the two target cells and a relevant overlap in the expression pattern of hit genes from the different sample subsets, suggesting that the differential expression profile could not account per se for the clustering of insertions in specific regions of the genome. This observation extends the information from previous studies on MLV insertional hotspots (Cattoglio et al, 2007; Recchia et al, 2006) and is also supported by our analysis of expression profiles of CIS genes (Fig S6 of Supporting Information). We therefore hypothesized that the overall accessibility of the genome at the time of transduction, of which 'gene expression' is an indicator, could play a more general role in inducing the cell-specific insertional preferences of the MLV vector. For this reason, we correlated insertion sites with other epigenetic features accounting for different chromatin conformations. (Lewinski et al, 2006; Wang et al, 2010). Our results show that on average PBL-GT insertions were two times closer to HSS mapped on T cells (Boyle et al, 2008) as compared to HSC-GT RIS, thus accounting again for an additional mechanism linked to cell-specific preferences of the MLV vector.

Histone modifications are other important epigenetic markers of open/closed chromatin state and their distribution was recently mapped through the ChIP-Seq technique both in human T cells and HSC/HPC (Barski et al, 2007; Cui et al, 2009). Correlation of insertion sites and histone methylations was previously shown in recent studies (Brady et al, 2009; Wang et al, 2010). Our work provides novel information on how the general chromatin state of two different haematopoietic cells influences vector integrations by the cross-comparison of vector insertion profiles in PBL-GT and HSC-GT with several histone marks mapped in both target cell types. As compared to random distribution, insertion sites were preferentially located in proximity of histone modifications associated with an open chromatin state (Fig 5A). The strongest preference was noted for H3K4me3, a modification that is mainly present in correspondence of TSS of genes (Barski et al, 2007), similarly to recent observations in the X-SCID GT trial (Wang et al, 2010), thus indicating that this is a common feature of MLV vectors. Importantly, no cell-specific behaviour was found when comparing H3K4me3 mapped in the two target cells. One could envisage that, at a genome wide level, many of these H3K4me3 modifications share the same position in both T cells and CD34+ cells. Indeed, in a recent work on chromatin remodelling upon lymphoid differentiation, histone modifications involved in chromatin decondensation related to lymphoid-affiliated genes were already detected in HSCs (Maes et al, 2008). Histone methylations associated with heterochromatic conformations are, on the other hand, disfavoured by MLV vector integrations. Strikingly, among all histone markers, we discovered that H3K27me3 alone represents a key epigenetic determinant of the cell-dependent integration profile of the MLV vector, since it was the only modification that was disfavoured in a cell-specific fashion.
These results are in agreement with the findings of in vitro studies showing that H3K27me3 distribution was significantly changed upon differentiation of haematopoietic progenitor cells and from the analysis of epigenetic marks in distinct lymphoid lineages, where the genomic association with H3K27me3 signals was both gene- and cell-specific (Wei et al, 2009). Our results support the concept that the different distribution of these cell-specific 'closed windows' in the genome of our target cells is indirectly driving the vector to 'open windows' associated with general chromatin accessibility and gene expression. In this view, the bias of MLV for some specific active loci could also be the consequence of chromatin inaccessibility to integration events of other regions. It should also be noted that the majority of the histone marks analysed, even when mapped in correspondence of genes, are not necessarily associated with TSS, a preferential target for MLV vectors, thus suggesting a general active role of epigenetic signals also on integrations retrieved inside coding regions or collected from intergenic spaces. The different influence of some of these signals in T cells and HSC/HPC on integration events is further evident when focusing on LMO2 and TCRA regions that were specific insertional hotspots in HSC-GT and PBL-GT, respectively (Fig 5B). By the analysis of CIS and NOT CIS loci we found that the cell-specific disfavouring for H3K27me3 signal is a general rule of integration site selection both in hotspots and in regions hosting single integration events (Fig S9 of Supporting Information). Nevertheless, it is interesting to notice that CIS derived from HSC-GT in vivo CD3+ cells displayed a different distribution of this histone modification as compared to the NOT CIS subset and to the CD34+ cells in vitro from which they derived.
This is in agreement with the results regarding the transcriptional activity of CIS genes (significantly different between HSC in vitro and T cells in vivo from HSC-GT patients, Fig S6 of Supporting Information) and again points to a more complex mechanism of physiological selection of insertions upon in vivo differentiation.
In conclusion, our high-throughput analysis shows that MLV vector displays integration preferences that are cell-specific and closely related to the genomic and chromatin state of target cells at the time of transduction. Most of these insertional features are mirrored in in vivo patient samples without showing any particular deviation from in vitro vector distribution even several years after GT. These results better define the 'physiological' MLV vector behaviour in GT patients treated with different gene-corrected cells in the absence of adverse events, thus providing reference information for the follow up of ongoing clinical trials based on the use of gammaretroviral vectors and for the future design of novel gene transfer approaches for genetic correction of haematopoietic cells.

Patients and clinical trials
ADA-SCID patients lacking HLA-identical sibling donors were enrolled in the clinical trials. PBL-GT patients were enrolled between 1992 and 1998 in a phase I/II clinical trial with repeated infusions of transduced autologous PBL (#NCT00599781). PBL-GT Pt1, 2, and 3 have been previously described (Aiuti et al, 2002b;Bordignon et al, 1995) while Pt4 received a similar treatment (Aiuti et al, in preparation). HSC-GT patients were enrolled between 2002 and 2005 in a phase I/II clinical trial for ADA-SCID GT with transduced autologous CD34+ cells (#NCT00598481) and HSC-GT Pt4-7 were previously described (Aiuti et al, 2009). For both groups the GIADAl retroviral vector encoding ADA cDNA under the MLV long terminal repeat (LTR) promoter was used (Aiuti et al, 2002b). Vector production and transduction protocol have been previously described (Aiuti et al, 2002a,b, 2009). The clinical trials described in this study were approved by San Raffaele Scientific Institute Ethical Committee and Italian national regulatory authorities. All patients signed the informed consent to the experimental treatment and follow up analyses.

Cell purification and measurement of vector positive cells
In vitro samples were derived from a fraction of transduced T cells or CD34+ cells that were infused in patients, kept in culture for an additional 3 days after transduction. In vivo cell subsets were purified from peripheral blood of ADA-SCID patients as previously reported (Aiuti et al, 2007). To detect the presence of the vector in patient cells we extracted genomic DNA by QIAamp DNA Blood Mini kit or Micro kit (Qiagen) and we performed qPCR for vector positivity with primers specific for the NeoR reporter gene and glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as shown previously (Aiuti et al, 2002a;Cassani et al, 2008). The percentage of transduced cells was calculated on the basis of a standard curve and expressed as the proportion of cells carrying the NeoR gene.
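As an illustration of how a standard curve can be turned into a percentage of vector-positive cells, the sketch below fits a line of Ct against the log10 fraction of transduced cells and inverts it for a sample. This is a generic approach shown under stated assumptions, not the authors' exact procedure: the dilution series, the Ct values and the function names are hypothetical, and normalisation to the GAPDH reference is omitted for brevity.

```python
import math


def fit_standard_curve(fractions, ct_values):
    """Least-squares line Ct = slope * log10(fraction) + intercept."""
    xs = [math.log10(f) for f in fractions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_transduced(ct, slope, intercept):
    """Invert the standard curve and express the result as a percentage."""
    return 100 * 10 ** ((ct - intercept) / slope)


# Hypothetical standards: fractions of vector-positive cells and their Ct values.
standard_fractions = [1.0, 0.1, 0.01, 0.001]
standard_ct = [22.0, 25.3, 28.6, 31.9]

slope, intercept = fit_standard_curve(standard_fractions, standard_ct)
sample_percent = percent_transduced(26.95, slope, intercept)
```

In practice the NeoR signal would be related to the GAPDH signal of the same sample before reading the value off the curve, as in the cited protocols.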

Analysis of vector integration
To collect integration sites from patient samples we performed LM-PCR with the use of MseI enzyme as reported elsewhere (Aiuti et al, 2007;Wu et al, 2003). Fasta sequences from insertions previously retrieved with the same technique in CD3+ cells of HSC-GT Pt4 and Pt5 were also added to the analysis (Aiuti et al, 2007). We also set up another protocol on the basis of the LAM-PCR technique previously described, which consisted of two steps of linear amplification using two 5′-biotinylated primers designed in forward direction on MLV LTR (MLV1: 5′-GACTGAGTCGCCCGGGTACCCGTGT-3′ and MLV2: 5′-CCAATAAACCCTCTTGCAGTTGCA-3′) under the following conditions: 95°C for 5 min, 50 cycles at 95°C for 45 s, 60°C for 45 s and 72°C for 1.5 min, and a final step at 72°C for 10 min. After overnight ligation with streptavidin magnetic beads (Invitrogen Dynabeads), linear amplified products went through a Klenow-mediated second strand reconstitution, MseI digestion and linker ligation. Fragments were then detached from beads with a 15 min incubation with 0.1 M NaOH at 25°C, exponentially amplified with primers for LTR and linker as in the LM-PCR protocol and Sanger sequenced. In addition, we retrieved integration sites through the combination of LAM-PCR with the use of up to four enzymes (Tsp509I, HinP1I, HpyCH4IV and MseI) and 454 pyrosequencing (Roche) as described in a previous work (Bushman et al, 2008;Howe et al, 2008; see also Supporting Information 'Integration sequences from PBL-GT and HSC-GT', 'List of CIS from HSC-GT', 'List of CIS from PBL-GT'). In total, 6% of the integrations (137 out of 2198) from PBL-GT and 2% of the integrations (47 out of 1959) from HSC-GT were retrieved with MseI enzyme only, while the majority derived from the use of four restriction enzymes in both trials.
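The proportions of integrations retrieved with MseI only can be checked directly from the counts given above. The short sketch below is only an arithmetic illustration; the variable and function names are hypothetical.

```python
def percent(part, total):
    """Share of part in total, as a percentage."""
    return 100 * part / total


# Counts reported in the text: (MseI-only integrations, total integrations).
msei_only = {"PBL-GT": (137, 2198), "HSC-GT": (47, 1959)}

shares = {
    trial: round(percent(n, total), 1)
    for trial, (n, total) in msei_only.items()
}
```

Rounded to whole numbers, these shares match the 6% and 2% quoted for PBL-GT and HSC-GT, respectively.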

Bioinformatics and statistical analysis
Details of bioinformatics and statistical methods used for the mapping of vector integrants and the analysis of hit gene functions as well as correlations of vector insertions with expression profile and epigenetic statuses of target cells are available in Supporting Information 'Bioinformatics and Statistical analysis'. Expression data from microarray experiments are deposited in the ArrayExpress public database (http://www.ebi.ac.uk/arrayexpress/).

Author contributions
LB designed the project, performed experiments, analysed data and wrote the manuscript. AAm, DP and CDS performed bioinformatics and statistical analyses. IB performed patients' samples purifications. CB, CK and MS revised the manuscript and performed integration sites retrieval. MGR provided scientific support and revised the manuscript. AAi supervised the project, wrote and revised the manuscript.

PROBLEM:
Retroviral vectors have been used as effective tools to transfer therapeutic genes for the treatment of haematological inherited disorders. Studies on how the vectors integrate into the host genome of different cell types and how gene-corrected cells engraft and survive long-term in patients are crucial to provide information on the safety and efficacy of GT approaches.

RESULTS:
We studied the properties of genomic integration sites of a gammaretroviral vector in haematopoietic cells from GT-treated patients affected by adenosine deaminase deficient-severe combined immunodeficiency (ADA-SCID), treated either with mature lymphocytes or haematopoietic stem cells, in clinical contexts free from adverse events. We analysed the influence of target cell type on vector integrations both at the time of gene transfer and years after infusion in GT treated patients. Our study uncovered a cell-specific insertion profile of retroviral vector dependent on functional and transcriptional activity as well as on epigenetic and chromatin conformations of host genome.

IMPACT:
This work unveiled the genomic features influencing the 'physiological' integration site selection of a gammaretroviral vector in two different target cell types, providing crucial information for the follow up of current and future GT trials based on genetic modifications of haematopoietic cells.