Integrated Structure-Transcription analysis of small molecules reveals widespread noise in drug-induced transcriptional responses and a transcriptional signature for drug-induced phospholipidosis

We performed an integrated analysis of drug chemical structures and drug-induced transcriptional responses. We demonstrated that a network representing 3D structural similarities among 5,452 compounds can be used to automatically group together drugs with similar scaffolds and mode-of-action. We then compared the structural network to a network representing transcriptional similarities among a subset of 1,309 drugs for which transcriptional response were available in the Connectivity Map dataset. Analysis of structurally similar, but transcriptionally different, drugs sharing the same mode of action (MOA) enabled us to detect and remove weak and noisy transcriptional responses, greatly enhancing the reliability and usefulness of transcription-based approaches to drug discovery and drug repositioning. Analysis of transcriptionally similar, but structurally different drugs with unrelated MOA, led us to the identification of a “toxic” transcriptional signature indicative of lysosomal stress (lysosomotropism) and lipid accumulation (phospholipidosis) partially masking the target-specific transcriptional effects of these drugs. We further demonstrated by High Content Screening that this transcriptional signature is caused by the activation of the transcription factor TFEB, a master regulator of lysosomal biogenesis and autophagy. Our results show that chemical structures and transcriptional profiles provide complementary information and that combined analysis can lead to new insights on on- and off-target effects of small molecules.


5 4
We verified that PLD/CAD compounds tend to induce a stronger transcriptional response 2 5 5 (i.e. a lower TV) (Supplementary Fig. 12) and they tend to be transcriptionally similar among them 2 5 6 (but not structurally) despite having different mode of action and therapeutic applications 2 5 7 ( Supplementary Fig. 13).

5 8
We next asked which genes were transcriptionally modulated by the majority of PLD/CAD 2 5 9 compounds. We performed Drug Set Enrichment Analysis (DSEA), 51 a computational approach we 2 6 0 recently developed to identify gene-sets that are transcriptionally modulated by most drugs in a 2 6 1 given set. The most significant gene-set shared by the 35 PLD/CAD compounds, out of about 2 6 2 5,000 gene-sets within the Gene Ontology (GO) database, was the GO-Cell Component term 2 6 3 "lysosome" consisting mainly of genes coding for lysosomal enzymes and ion channels 2 6 4 (p=5.03x10 -8 -Supplementary Table 4), thus in agreement with the "lysosomotropic" effect of 2 6 5 these drugs.

6 6
Recently, the transcription factor E-box (TFEB) has been found to be a major player in the 2 6 7 transcriptional control of lysosomal genes in response to a variety of cellular and environmental 2 6 8 stresses. 52 In normal nutrient conditions TFEB is phosphorylated by the mTORC1 complex on the 2 6 9 lysosomal surface. This phosphorylation favours TFEB binding to 14-3-3 proteins and its retention 2 7 0 in the cytoplasm. [53][54][55] Upon stress signal, such as nutrient deprivation, mTOR is inhibited, the 2 7 1 calcium-dependent phosphatase Calcineurin is activated, and TFEB is de-phosphorylated shuttling 2 7 2 to the nucleus where it transcriptionally controls lysosomal biogenesis, exocytosis and 2 7 3 autophagy. 53-59 Moreover, TFEB was shown to translocate to the nucleus upon amiodarone 2 7 4 treatment, a well known lysosomotropic agent. 60 We thus decided to investigate whether TFEB 2 7 5 activation was responsible for the characteristic transcriptional response induced by PLD/CAD 2 7 6 compounds. 2 7 7 2 7 8 The transcriptional response of PLD-inducing compounds is mediated by TFEB 2 7 9 We performed a panel of High Content Screening assays including the TFEB nuclear 2 8 0 translocation assay (TFEB-NT) 59 at 3h and 24h following drug administration at different 2 8 1 concentrations (0.1µM, 1µM and 10µM) for 34 out of 35 PLD drugs (1 drug was not available to us 2 8 2 at the time). Additional HCS assays at 24h included LAMP-1 immunostaining and Lysotracker dye 2 8 3 to quantify lysosomal compartment (Methods), GM130 and PDI immunostaining to detect 2 8 4 morphological changes in the Golgi and ER (Endoplasmic Reticulum) compartments, both of which 2 8 5 have been recently suggested to be involved in PLD aetiology (Methods). We also performed the 2 8 6 LipidTox assay at 48h to check for the accumulation of phospholipids to confirm PLD at least in 2 8 7 vitro (Methods).  Table 5. Nuclear translocation of TFEB at 3h was observed for 18 2 9 0 out of 34 drugs (53%) increasing to 29 drugs at 24h (85%). Out of these 29 drugs, 27 induced an 2 9 1 increase in lysosome size and number as evidenced by LAMP1 and Lysotracker staining, and all 2 9 2 29 drugs induced accumulation of phospholipids according to the Lipidtox assay (100%). Only 5 2 9 3 drugs did not induce TFEB translocation at 24h, and just 1 out of these 5 drugs was positive in the 2 9 4 Lysotracker assay, while 4 of them were positive in the Lipidtox assay. None of the drugs tested 2 9 5 were positive for the Golgi marker and only 6 were positive for the ER marker, albeit marginally. Overall, HCS confirmed a concentration dependent nuclear translocation of TFEB for 29 out 2 9 7 of 34 drugs (85% at 24h) with a concomitant perturbation of the lysosomal compartment for 28 out 2 9 8 of 34 drugs (82%) occurring mostly at the highest dosage tested (10 µM). Furthermore, HCS 2 9 9 revealed an accumulation of lipid in vitro at 48h following treatment with the 34 drugs (100%) at the 3 0 0 highest dosage tested (10 µM), as previously reported in the literature. 45 3 0 1 These results support the role of TFEB in shaping the transcriptional response of cells 3 0 2 treated with PLD inducing drugs in a way completely unrelated to their MoA. We next asked 3 0 3 whether the activation of TFEB (or TFE3, another member of the MiT family of transcription factors 3 0 4 with similar functions) is a consequence of lysosomal stress upon compound treatment or if it is 3 0 5 directly related to the induction of the PLD phenotype. Thus, we set up a HCS Lipidtox assay using 3 0 6 TFEB wt versus TFEB/TFE3 KO in HeLa cell type, administering high dosage of chloroquine (50 3 0 7 µM) known to induce lipids accumulation in cells at 48h. Supplementary Fig. 15a, b show no 3 0 8 major differences in terms of spot intensity in the Lipidtox assay, thus confirming that TFEB 3 0 9 activation is a consequence of lysosomal stress and not an inducer of PLD. We combined the transcriptional responses elicited by the 35 PLD/CAD compounds into a 3 1 8 consensus transcriptional response ("PLD" signature) and computed its transcriptional distance 3 1 9 from all the other 1274 (i.e. 1309-35) CMAP compounds (Methods). We reasoned that drugs 3 2 0 inducing a transcriptional profile similar to the PLD signature should have a higher probability of 3 2 1 inducing lipid accumulation than the other drugs. Surprisingly, 258 compounds out of 1274 (20%) 3 2 2 cMAP compounds were found to be similar to the PLD signature (Supplementary Table 6). About 3 2 3 a third of these drugs are CADs (77 out of 258 (30%)). 3 2 4 Figure 5 reports a breakdown by ATC classes of drugs for which an ATC code was 3 2 5 available and that were found to induce a transcriptional response similar to the PLD signature.

2 6
By analysing a large set of chemical structures, we generated a network representing 3 8 1 structural similarities among compounds that can be used to automatically group together drugs 3 8 2 with similar scaffolds and mode-of-action. Other methods to cluster drugs based on structural 3 8 3 similarity have been proposed in the literature 16 but no hierarchical classification of drugs in 3 8 4 communities and rich-clubs based on the network structure has been previously performed. By 3 8 5 comparing the structural drug network with the transcriptional drug network, we observed broad 3 8 6 differences between the two: drugs can be very similar in terms of the transcriptional response they 3 8 7 induce, but with unrelated chemical structures, or vice-versa have very similar structures but 3 8 8 induce diverse transcriptional responses.

8 9
Here, we identified a set of confounding factors that can hinder the usefulness of 3 9 0 transcriptional based methods. We introduced a simple but powerful measure, "Transcriptional 3 9 1 Variability" (TV), to assess the strength and robustness of the transcriptional response of a cell to a 3 9 2 drug treatment.

9 3
In the original CMAP study, 2 the authors indeed recognised that although gene-expression 3 9 4 signatures can be highly sensitive, they may be uninformative if measured in cells that lack the 3 9 5 appropriate physiological or molecular context, but offered no solution to identify such cases. We 3 9 6 observed that glucocorticoids tend to have a high TV, hence uninformative transcriptional profiles. 3 9 7 Indeed, MCF7, 66, 67 HL60 and PC3 68, 69 cell lines used in CMAP may exhibit resistance to 3 9 8 glucocorticosteroids 2 . Hence, if not filtered out, computational analysis of their transcriptional 3 9 9 responses may be misleading and lead to wrong conclusions, e.g. such that betamethasone and 4 0 0 dexamethansone have a different mode of action (Figure 2). 4 0 1 We also uncovered a transcriptional signature common to a subset of transcriptionally 4 0 2 similar but structurally distinct drugs profiled in CMAP that is not related to their mode of action, but 4 0 3 rather to cellular toxicity caused by lysosomal stress and lipid accumulation. Based on the results here presented, we suggest guidelines to prevent inconsistencies and 4 2 2 erroneous conclusion when using transcriptional responses of small molecules for drug discovery 4 2 3 and drug repositioning: (i) the transcriptional response elicited by a drug can be uninformative.

2 4
Hence these responses must be detected and then excluded from further analyses. We 4 2 5 demonstrated that this can be achieved by assessing the Transcriptional Variability (TV) of the 4 2 6 drug-induced transcriptional response across multiple replicates; (ii) drug treatment can cause 4 2 7 cellular stress unrelated to the drug MoA and thus affect the drug-induced transcriptional response 4 2 8 by partially masking transcriptional changes directly related to the drug molecular targets. We 4 2 9 generated a PLD transcriptional signature which can be used to detect these compounds. This as described before (Supplementary Fig. 1). The total number of chemical structure used for 4 5 0 further analysis was thus equal to 5452.

5 1
The ChemAxon Standardizer tool (v. 14.9) was run to convert SMILES string annotations 4 5 2 into 2D multi-SDF structural files. 74 The "remove fragments" and "neutralize" options were used to 4 5 3 fix all the molecular structures, to remove counter-ions and other various kinds of molecular 4 5 4 fragments, which may be present in branded drug formulation but not useful in this work (e.g. force-field parameterised for small organic molecules such as drugs. Partial charges are based on 4 6 0 bond-charge increments. Conjugated nitrogens are considered as planar. Thus, a unique 3D multi-4 6 1 SDF file was obtained and used as input file for all the subsequent analyses. activity. 30,77,78 The evaluation of MIF volume superimpositions between the two structures is 4 8 7 reported as a similarity score ranging from 0 to 1 for each of the three probes. A global score 4 8 8 (GLOB-Sum) is then obtained as the sum of the three scores of the individual probes. Higher 4 8 9 GLOB-S values correspond to more similar structures. For this study, we transformed the GLOB-4 9 0 Sum similarity score matrix (S) of dimension 5452x5452 into a distance matrix defined as D=1-S/3. 4 9 1 Since the distance matrix is symmetric (i.e. the distance between A and B is the same as 4 9 2 the distance between B and A), the total number of drug-pairs to consider is 14,859,426 (5452 x 4 9 3 5451 /2). 4 9 4 Construction of the drug network 4 9 5 We ranked drug-pairs according to their structural distance in ascending order and 4 9 6 considered as significant only those drug-pairs in the top 5% of the ranked list, as previously 4 9 7 described by Iorio et. al. 4 to reduce the total amount of egdes in the MANTRA network (The 4 9 8 distance threshold is 0.51 when considering the 5452x5452 network or 0.65 when considering only 4 9 9 the CMAP 1309x1309 sub-network). We then represented drugs as nodes connected by edges. 5 0 0 The resulting Structural Drug Network has a giant connected component with 5312 nodes (i.e., 5 0 1 drugs) out of 5,452 and 35,527 edges, corresponding to 5% of a fully connected network with the 5 0 2 same number of nodes (14,859,426 edges) (Supplementary Fig. 4). In order to visualise and 5 0 3 extract useful information from the SDN, we identified communities via the Affinity Propagation 5 0 4 Clustering algorithm, as implemented in the R package apcluster (v. 1.3.5). 32, 79 A community is 5 0 5 defined as a group of nodes densely interconnected with each other and with fewer connections to 5 0 6 nodes outside the group. 80 Each community was coded with a numerical identifier, a colour, and 5 0 7 one of its nodes was identified as the "exemplar" of the community, i.e., the drug whose effect best 5 0 8 represents the effects of the other drugs in the community. 4 5 0 9 5 1 0 5 1 1

Validation of the Structural Drug Network 5 1 2
To validate the drug structural network, we assessed whether pairs of drugs connected by 5 1 3 an edge in the network (i.e. structurally similar according to our distance) shared a common clinical 5 1 4 PDI, Cell Signaling Technology) followed by the incubation with an AlexaFluor-conjugated 5 6 8 secondary antibodies (Life Technologies) diluted in blocking buffer. LysoTracker Red DND-99 (Life 5 6 9 Technologies) staining was performed by the incubating the dye for the last 30 minutes before 5 7 0 fixation. DAPI and CellMask Deep Red Plasma membrane Stain (Life Technologies) were used for 5 7 1 nuclei and plasma membrane staining, respectively. Images of of lysosomes (LAMP-1 and 5 7 2 LysoTracker Red DND-99), Golgi (GM130) and ER (PDI) were acquired using an automated 5 7 3 confocal microscopy (Opera High Content System, Perkin-Elmer). The fluorescent intensity and 5 7 4 area of the different stainings were analyzed by using dedicated scripts developed in the Columbus 5 7 5 Image Data Management and Analysis Software (Perkin-Elmer).

7 6
High Content Lipid accumulation assay: LipidTOX green phospholipidosis detection reagent (Life 5 7 7 Technologies) was added to the cells along with the different compounds at the indicated 5 7 8 concentrations for 48h before fixation with 4% paraformaldehyde. DAPI and CellMask Deep Red 5 7 9 Plasma membrane Stain (Life Technologies) were used for nuclei and plasma membrane staining, 5 8 0 respectively. Lysosomal phospholipid accumulation was analyzed by measuring fluorescent dye 5 8 1 intensity using an automated confocal microscopy (Opera High Content System, Perkin-Elmer) and 5 8 2 a Columbus Image Data Management and Analysis Software (Perkin-Elmer).