Elucidating the modes of action for bioactive compounds in a cell-specific manner by large-scale chemically-induced transcriptomics

Identifying the modes of action of bioactive compounds is a major challenge in chemical systems biology of diseases. Genome-wide expression profiling of transcriptional responses to compound treatment in human cell lines is a promising unbiased approach for mode-of-action analysis. Here we developed a novel approach to elucidate the modes of action of bioactive compounds in a cell-specific manner using large-scale chemically-induced transcriptome data acquired from the Library of Integrated Network-based Cellular Signatures (LINCS), and analyzed 16,268 compounds and 68 human cell lines. First, we performed pathway enrichment analyses of regulated genes to reveal active pathways among 163 biological pathways. Next, we explored potential target proteins (including primary targets and off-targets) with cell-specific transcriptional similarity using the chemical–protein interactome. Finally, we predicted new therapeutic indications for 461 diseases based on the target proteins. We showed the usefulness of the proposed approach in terms of prediction coverage, interpretation, and large-scale applicability, and validated the new prediction results experimentally by an in vitro cellular assay. The approach has a high potential for advancing drug discovery and repositioning.


Supplementary Figure Legends
Supplementary Figure S1 Venn diagram of cell lines (left) and chemical compounds (right) present in TG-GATEs, CMap, and LINCS, the three databases containing chemically-induced gene expression profiles.
Supplementary Figure S2 Landmark genes of the cell cycle that are partially downregulated by drug perturbations. Information concerning drugs and cell lines is shown in the first three columns. Drugs are categorized according to the ATC code. Names of drugs and genes are given in alphabetical order. Downregulated genes are colored green. Code J: anti-infectives for systemic use; code L: anti-neoplastic and immunomodulating agents; code N: nervous system; code R: respiratory system; and code S: sensory organs.
Supplementary Figure S3 Distribution of drug classifications according to the biological pathways that they activate (top) and inactivate (bottom). The fraction of drugs in a particular classification that affect each pathway is represented by the intensity of color in the appropriate box. The intensity of color indicates the relative frequency (the compound frequency was divided by the number of compounds in each Anatomical Therapeutic Chemical classification system (ATC code)). The boxes are arranged according to the first level of the ATC code. Drugs are assigned the following ATC codes: code A: alimentary tract and metabolism; code B: blood and blood-forming organs; code C: cardiovascular system; code D: dermatologicals; code G: genitourinary system and sex hormones; code H: systemic hormonal preparations, excluding sex hormones and insulins; code J: anti-infectives for systemic use; code L: anti-neoplastic and immunomodulating agents; code M: musculo-skeletal system; code N: nervous system; code P: anti-parasitic products, insecticides and repellents; code R: respiratory system; code S: sensory organs; and code V: various.
Supplementary Figure S4 Distribution of the identified pathways in REACTOME. (a) Histogram of detected pathways from the analysis of all compounds, where the horizontal axis indicates the list of biological pathways and the vertical axis indicates the frequency of detected pathways. (b) Histogram of the numbers of detected pathways for each compound, where the horizontal axis indicates the number of detected pathways for each compound and the vertical axis indicates the frequency of compounds. Red bars indicate the numbers of activated pathways, identified using upregulated genes, and green bars indicate the numbers of inactivated pathways, identified using downregulated genes.
Supplementary Figure S5 Distribution of drug classifications according to the biological pathways that they activate (top) and inactivate (bottom) based on REACTOME. The dendrogram shows the result of clustering pathways according to the similarity of their drug classifications. The fraction of drugs in a particular classification that affect each pathway is represented by the intensity of color in the appropriate box. The intensity of color indicates the relative frequency (the compound frequency was divided by the number of compounds in each pathway). The boxes are arranged according to the first level of the Anatomical Therapeutic Chemical classification system (ATC code). Drugs are assigned the following ATC codes: code A: alimentary tract and metabolism; code B: blood and blood-forming organs; code C: cardiovascular system; code D: dermatologicals; code G: genitourinary system and sex hormones; code H: systemic hormonal preparations, excluding sex hormones and insulins; code J: anti-infectives for systemic use; code L: anti-neoplastic and immunomodulating agents; code M: musculo-skeletal system; code N: nervous system; code P: anti-parasitic products, insecticides and repellents; code R: respiratory system; code S: sensory organs; and code V: various.
Supplementary Figure S6 Distribution of drug classifications according to the biological pathways that they activate (top) and inactivate (bottom) based on REACTOME. The fraction of drugs in a particular classification that affect each pathway is represented by the intensity of color in the appropriate box. The intensity of color indicates the relative frequency (the compound frequency was divided by the number of compounds in each Anatomical Therapeutic Chemical classification system (ATC code)). The boxes are arranged according to the first level of the ATC code. Drugs are assigned the following ATC codes: code A: alimentary tract and metabolism; code B: blood and blood-forming organs; code C: cardiovascular system; code D: dermatologicals; code G: genitourinary system and sex hormones; code H: systemic hormonal preparations, excluding sex hormones and insulins; code J: anti-infectives for systemic use; code L: anti-neoplastic and immunomodulating agents; code M: musculo-skeletal system; code N: nervous system; code P: anti-parasitic products, insecticides and repellents; code R: respiratory system; code S: sensory organs; and code V: various.
Supplementary Figure S7 Venn diagram of new compound-protein interactions predicted using CMap and LINCS. The left panel shows the result for the same cell line-matching strategy, and the right panel shows that for the all cell line-matching strategy.
Supplementary Figure S8 Venn diagram of new compound-disease associations predicted using CMap and LINCS. The left panel shows the result for the same cell line-matching strategy, and the right panel shows that for the all cell line-matching strategy.
Supplementary Figure S9 Examples of newly predicted target proteins of drugs (a-c) and pathway enrichment analysis (d). Blue circles denote drugs, red rectangles denote proteins, gray diamonds indicate ATC codes, and gray edges and red dotted lines denote known interactions and newly predicted interactions, respectively.
Supplementary Figure S10 Dose-response curves of four tested drugs in the AR-binding assay run in (a) agonist and (b) antagonist modes. The horizontal axis represents the drug concentration on a logarithmic scale, and the vertical axis represents the percentage of drug activity. The open circles represent the data points from triplicate experiments.
Supplementary Figure S11 Histograms for evaluating the similarity between gene expression signatures in CMap and LINCS. The left and right panels show the distributions of similarity scores between signatures obtained using cell lines MCF7 and PC3, respectively. The signatures were obtained using the "Biological control" from profiles measured at 6 (6.4) h. The numbers of compounds used to treat both MCF7 and PC3 cells in CMap and LINCS were 331 and 283, respectively. The similarity scores were calculated using the Pearson correlation coefficient.
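As an illustration of the similarity score named in the legend of Supplementary Figure S11, the following minimal Python sketch computes the Pearson correlation coefficient between two gene expression signatures. The function name `pearson` and the example values in `cmap_sig` and `lincs_sig` are hypothetical and are not taken from CMap or LINCS; the sketch assumes that both signatures list the same genes in the same order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signatures."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("signatures must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signature")
    return cov / sqrt(var_x * var_y)


# Hypothetical signatures for one compound over the same four genes.
cmap_sig = [1.2, -0.5, 0.3, 2.0]
lincs_sig = [1.0, -0.4, 0.1, 1.8]
score = pearson(cmap_sig, lincs_sig)
```

A histogram of such scores over all compounds shared between the two databases would correspond to one panel of the figure, under the assumptions stated above.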
Supplementary Table S1 Detailed evaluation of target protein prediction using common data. Each element represents the number of drugs repositioned from the original disease class to new disease classes using the CMap-based method. The rows indicate the original ICD disease chapters, and the columns indicate the newly predicted ICD disease chapters. Chapter I: certain infectious and parasitic diseases (A00-B99); chapter II: neoplasms (C00-D48); chapter III: diseases of the blood, blood-forming organs, and certain disorders involving the immune mechanism (D50-D89); chapter IV: endocrine, nutritional, and metabolic diseases (E00-E90); chapter V: mental and behavioral disorders (F00-F99); chapter VI: diseases of the nervous system (G00-G99); chapter VII: diseases of the eye and adnexa (H00-H59); chapter VIII: diseases of the ear and mastoid process (H60-H95); chapter IX: diseases of the circulatory system (I00-I99); chapter X: diseases of the respiratory system (J00-J99); chapter XI: diseases of the digestive system (K00-K93); chapter XII: diseases of the skin and subcutaneous tissue (L00-L99); chapter XIII: diseases of the musculoskeletal system and connective tissue (M00-M99); chapter XIV: diseases of the genitourinary system (N00-N99); chapter XV: pregnancy, childbirth, and the puerperium (O00-O99); chapter XVI: certain conditions originating in the perinatal period (P00-P96); chapter XVII: congenital malformations, deformations, and chromosomal abnormalities (Q00-Q99); chapter XVIII: symptoms, signs, and abnormal clinical and laboratory findings not elsewhere classified (R00-R99); chapter XIX: injury, poisoning, and certain other consequences of external causes (S00-T98); chapter XX: external causes of morbidity and mortality (V01-Y98); chapter XXI: factors influencing health status and contact with health services (Z00-Z99); and chapter XXII: codes for special purposes (U00-U99).

Supplementary Table S5
The distribution of drugs repositioned from the original disease class to other disease classes using the LINCS-based method. Each element represents the number of drugs repositioned from the original disease class to new disease classes using the LINCS-based method. The rows indicate the original ICD disease chapters, and the columns indicate the newly predicted ICD disease chapters. Chapter I: certain infectious and parasitic diseases (A00-B99); chapter II: neoplasms (C00-D48); chapter III: diseases of the blood, blood-forming organs, and certain disorders involving the immune mechanism (D50-D89); chapter IV: endocrine, nutritional, and metabolic diseases (E00-E90); chapter V: mental and behavioral disorders (F00-F99); chapter VI: diseases of the nervous system (G00-G99); chapter VII: diseases of the eye and adnexa (H00-H59); chapter VIII: diseases of the ear and mastoid process (H60-H95); chapter IX: diseases of the circulatory system (I00-I99); chapter X: diseases of the respiratory system (J00-J99); chapter XI: diseases of the digestive system (K00-K93); chapter XII: diseases of the skin and subcutaneous tissue (L00-L99); chapter XIII: diseases of the musculoskeletal system and connective tissue (M00-M99); chapter XIV: diseases of the genitourinary system (N00-N99); chapter XV: pregnancy, childbirth, and the puerperium (O00-O99); chapter XVI: certain conditions originating in the perinatal period (P00-P96); chapter XVII: congenital malformations, deformations, and chromosomal abnormalities (Q00-Q99); chapter XVIII: symptoms, signs, and abnormal clinical and laboratory findings not elsewhere classified (R00-R99); chapter XIX: injury, poisoning, and certain other consequences of external causes (S00-T98); chapter XX: external causes of morbidity and mortality (V01-Y98); chapter XXI: factors influencing health status and contact with health services (Z00-Z99); and chapter XXII: codes for special purposes (U00-U99).