Data on master regulators and transcription factor binding sites found by upstream analysis of multi-omics data on methotrexate resistance of colon cancer

Computational analysis of master regulators through the search for transcription factor binding sites, followed by analysis of the signal transduction networks of a cell, is a new approach to causal analysis of multi-omics data. This paper presents the results of an analysis of multi-omics data, including transcriptomics, proteomics and epigenomics data, from a methotrexate (MTX) resistant colon cancer cell line. The data were used to analyse the mechanisms of resistance and to predict potential drug targets and promising compounds for reverting the MTX resistance of these cancer cells. We present all results of the analysis, including the lists of identified transcription factors and their binding sites in the genome and the list of predicted master regulators, i.e., potential drug targets. These data were generated in the study recently published in the article “Multi-omics “Upstream Analysis” of regulatory genomic regions helps identifying targets against methotrexate resistance of colon cancer” (Kel et al., 2016) [4]. These data are of interest to researchers in the field of multi-omics data analysis and to biologists interested in identifying novel drug targets against MTX resistance.

© 2017 Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area: Biology
More specific subject area: Analysis of molecular mechanisms of diseases using NGS, microarrays and novel proteomics technologies
Type of data: Table, text file, graph, figure
How data was acquired: The data were generated with the help of the geneXplain platform, version 3.

Data
Here we present the results of the analysis of data from three different omics experiments, namely transcriptomics, proteomics and epigenomics, which were performed independently on the same type of cell line. After the necessary preprocessing of the raw data, we performed a special type of computational analysis, which we call "upstream analysis", that integrates these three omics data types and identifies master regulators of the methotrexate resistance of colon cancer. We identified master regulators through the search for transcription factor binding sites followed by analysis of the signal transduction networks of the cancer cells under study. The identified master regulators were then used to find chemical compounds and existing drugs that inhibit them and are therefore potentially helpful for reverting the acquired MTX resistance.

Experimental design, materials and methods
1) In the first step, we analysed the transcriptomics data and compared the MTX resistant and MTX sensitive cells. We identified differentially expressed genes (DEGs) using Limma analysis [1] with a p-value cut-off of 0.05 (corrected for multiple testing). Among them, we found 1951 up-regulated genes (… [3] and identified pairs of transcription factor binding sites overrepresented in the promoters of up-regulated genes in MTX resistant cells. We identified 6 pairs of TRANSFAC PWMs whose matches are clustered in these promoters (see Fig. S2 in [4]). Link (also showing the positions of the identified site pairs in the promoters of the up-regulated genes under study): http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/GSE11440_RAW/Normalized%20(RMA)%20DEGs%20with%20limma/Condition_1%20vs.%20Condition_2/Up-regulated%20genes%20Ensembl%20FC1.5%20sites%20-1000..100%20non-redundant_minSUM/CMA%206modules%202sites%20(Up-regulated%20genes%20Ensembl%20FC1.5%20sites%20-1000..100%20non-redundant_minSUM)/Model%20visualization%20on%20Yes%20set&anonymous=true. Table 3 (CMA sites in promoters UpFC1.5 track.interval, Supplementary material) gives the genomic coordinates (build GRCh37) of the identified transcription factor binding site pairs in the promoters of the up-regulated genes under study.

4) In the next step, we identified peaks in the CDK8 antibody ChIP-seq data from the HT29 cell line using the peak calling program MACS [26], run without a control and with default parameters except "Enrichment ratio", which was set to 5 in order to obtain a higher number of peaks. We identified 29,400 peaks of CDK8 complex binding in the whole human genome. These peaks were mapped to the vicinity of 17,115 genes in the human genome (from −2000 to +2000 bp around the 5′ and 3′ borders of the genes).
The information about all these genes, with the positions of the peaks and schemas of the peak locations in the gene structure, is available here. Link: http://platform.genexplain.com/bioumlweb/#de=data/Projects/MTX%20resistance/Data/HT29_ChIP-seq/Track%20genes&anonymous=true.
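The gene-vicinity mapping in step 4 can be sketched in Python. This is a minimal illustration, not the geneXplain implementation: the `Gene` and `Peak` records and the `genes_near_peaks` helper are hypothetical, coordinates are assumed to be half-open and on a single genome build, and a peak is assigned to every gene whose body, extended by 2000 bp past both the 5′ and 3′ borders, it overlaps.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

FLANK_BP = 2000  # window beyond the 5' and 3' gene borders used in step 4


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # lower genomic coordinate (half-open interval, hypothetical convention)
    end: int    # upper genomic coordinate (exclusive)


@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int
    summit: int  # position of maximal pileup reported by the peak caller


def genes_near_peaks(peaks: Iterable[Peak], genes: Iterable[Gene],
                     flank: int = FLANK_BP) -> Dict[str, List[Peak]]:
    """Map each gene to the peaks overlapping its body extended by `flank` bp on both sides."""
    peaks = list(peaks)
    hits: Dict[str, List[Peak]] = {}
    for gene in genes:
        lo, hi = gene.start - flank, gene.end + flank
        for peak in peaks:
            # Two half-open intervals overlap when each starts before the other ends.
            if peak.chrom == gene.chrom and peak.start < hi and lo < peak.end:
                hits.setdefault(gene.name, []).append(peak)
    return hits
```

Because the flank is the same on both sides, gene strand does not matter in this sketch. The nested loop is quadratic; for tens of thousands of peaks genome-wide, a per-chromosome sorted sweep or an interval tree would be the practical choice.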
We intersected the list of 17,115 peak-associated genes with the list of up-regulated genes in MTX resistant cells and identified 1347 genes that contain such peaks in their potential regulatory regions (5′ regions, introns, and 3′ regions of the genes). The result of this overlap is shown in Fig. 1 below.
As a result, we extracted 710 genomic intervals, each 400 bp long, around the summits of CDK8 peaks in the up-regulated genes. We consider these intervals potential MTX resistance enhancers.
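The overlap with the up-regulated genes and the extraction of 400 bp summit intervals can be sketched as follows. This is an illustration under stated assumptions: `summit_windows` is a hypothetical helper, each interval is assumed to be centred on the peak summit (±200 bp, clipped at position 0), and an interval shared by several genes is kept once. It is not meant to reproduce the reported counts of 1347 genes and 710 intervals, which may depend on filtering not described here.

```python
from typing import Dict, List, Set, Tuple

WINDOW_BP = 400  # interval length stated in the text

Summit = Tuple[str, int]          # (chromosome, summit position); hypothetical record
Interval = Tuple[str, int, int]   # (chromosome, start, end), half-open


def summit_windows(gene_summits: Dict[str, List[Summit]], upregulated: Set[str],
                   window: int = WINDOW_BP) -> Tuple[List[str], List[Interval]]:
    """Return the genes shared by both lists and the de-duplicated summit-centred intervals."""
    shared = sorted(set(gene_summits) & upregulated)
    half = window // 2
    intervals: Set[Interval] = set()
    for gene in shared:
        for chrom, summit in gene_summits[gene]:
            # Centre the window on the summit; clip at the chromosome start.
            intervals.add((chrom, max(0, summit - half), summit + half))
    return shared, sorted(intervals)
```

Using a set for the intervals means that a peak lying near two up-regulated genes yields a single candidate enhancer rather than two identical ones.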

5) We performed a site frequency analysis (F-Match) and a composite site analysis (CMA) in these MTX resistance enhancers in the same way as we did in the promoters of the up-regulated genes. The results of this analysis are presented in Fig. 2 below (see also the data in Table 5 Site optimization summary_CDK8_400_summit_DnFC1.0.txt, Supplementary material).
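The idea behind a site frequency comparison such as F-Match can be illustrated with a simplified Python sketch. It differs from the real tool in several assumed ways: it scores sequences with a plain log-odds PWM against a uniform background instead of TRANSFAC matrices with MATCH scores, scans only the forward strand, and uses one fixed, hypothetical threshold rather than any threshold optimization the actual tool may perform. All function names are hypothetical. The result is the ratio of site frequencies (sites per kb) in the "yes" set, e.g. the MTX resistance enhancers, to a "no" background set; a ratio above 1 indicates enrichment.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"


def log_odds_pwm(counts: Sequence[Dict[str, float]], pseudocount: float = 0.5,
                 background: float = 0.25) -> List[Dict[str, float]]:
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0.0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column.get(b, 0.0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def count_sites(seq: str, pwm: List[Dict[str, float]], threshold: float) -> int:
    """Count forward-strand windows whose summed score reaches the threshold."""
    width = len(pwm)
    hits = 0
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with N or other ambiguity codes
        if sum(pwm[j][b] for j, b in enumerate(window)) >= threshold:
            hits += 1
    return hits


def site_frequency_ratio(yes: List[str], no: List[str],
                         pwm: List[Dict[str, float]], threshold: float) -> float:
    """Sites per kb in the 'yes' sequences divided by sites per kb in the 'no' sequences."""
    def sites_per_kb(seqs: List[str]) -> float:
        sites = sum(count_sites(s.upper(), pwm, threshold) for s in seqs)
        length = sum(len(s) for s in seqs)
        if length == 0:
            raise ValueError("empty sequence set")
        return 1000.0 * sites / length

    background_freq = sites_per_kb(no)
    return math.inf if background_freq == 0 else sites_per_kb(yes) / background_freq
```

Normalizing by total sequence length (sites per kb) keeps the comparison fair when the "yes" and "no" sets differ in size.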
6) Next, we performed the master regulator search as described in [2], with the modified algorithm described in [4], using the proteomics data as "context proteins". The proteomics data were matched to the proteins in the TRANSPATH database [5]. The list of TRANSPATH-matched proteins found in the HT29 cell line is given in Table 6 HT29_colon_cancer_cell_line Ensembl proteins Proteins Transpath peptides a annotated.txt, Supplementary material.
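The graph search at the core of a master regulator search can be illustrated with a much simplified sketch: starting from each candidate protein in a directed signalling network, walk downstream for at most a fixed number of steps and count how many of the transcription factors found in the site analysis are reached. Everything here is an assumption for illustration: the network, the `radius` parameter and the ranking rule are hypothetical, the context proteins are used only as a filter (a candidate must be detected in the proteomics data), which simplifies how the algorithms in [2] and [4] use them, and any additional scoring in the published method is omitted.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # upstream protein -> downstream interaction partners


def reachable_within(graph: Graph, start: str, radius: int) -> Set[str]:
    """Nodes reachable downstream of `start` in at most `radius` steps (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the search radius
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def rank_master_regulators(graph: Graph, target_tfs: Set[str], context: Set[str],
                           radius: int = 3) -> List[Tuple[str, int]]:
    """Rank context proteins by the number of target TFs they reach within `radius` steps."""
    nodes = set(graph) | {n for partners in graph.values() for n in partners}
    scored = []
    for node in nodes & context:
        hits = len(reachable_within(graph, node, radius) & target_tfs)
        if hits:
            scored.append((node, hits))
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Breadth-first search visits each node at its shortest distance from the candidate, so the radius limit is applied to shortest paths; increasing the radius lets more distant upstream proteins accumulate targets, which is why such searches usually cap it.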
supported (VP) in the framework of the Russian State Academies of Sciences Fundamental Research Program for 2013-2020. This work was also supported by the following grants of the EU FP7 program: "SysMedIBD" no. 305564, "RESOLVE" no. 305707 and "MIMOMICS" no. 305280.

Transparency document. Supplementary material
Transparency data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.11.096.