Comprehensive analysis of Downstream of Kinase (DOK) Genes in Pan-cancer

Background: Under the action of tumorigenic factors, human cells often acquire mutations in proto-oncogenes and tumor suppressor genes. As these mutations accumulate, cells lose normal growth regulation and proliferate abnormally, and tumors develop. Understanding the role of these mutated genes in tumors may therefore guide future cancer therapy. Downstream of Kinase (DOK) proteins are a multigene family of adapters, some of which are key negative regulators of immune cell signal transduction. Their expression levels vary significantly across tumor types and are closely related to tumor formation, the tumor microenvironment, microsatellite instability, and tumor mutation load.
Methods: We mainly used R software to process and analyze the data. Differential gene expression was analyzed with the "limma" package and the Wilcoxon test, and subsequent analyses used Cox proportional hazards regression, the Kruskal-Wallis test, the One-class Logistic Regression (OCLR) algorithm, and the "corrplot" package, among other methods.
Results: The expression of DOK family genes varied across multiple tumors and was associated with patient survival. Further analysis showed that DOK genes were significantly associated with tumor immunity, the tumor microenvironment, and tumor mutation load, among other features. DOK2 expression was associated with sensitivity to a variety of drugs.
Conclusion: The DOK gene family comprises seven genes, DOK1-DOK7, which are significantly differentially expressed in a variety of tumors and significantly correlated with tumor immunity, tumor mutation load, the tumor microenvironment, and stemness indices, among other features. They may provide potential therapeutic targets for future clinical treatment.


Introduction
In 2021, the United States was projected to have 1,898,160 new cancer cases and 608,570 cancer deaths (1), and the treatment of cancer remains a major global health problem. The downstream of tyrosine kinase (DOK) gene family consists of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7 and is involved in regulating cell growth, transformation, differentiation, movement, and death (2,3). DOK proteins play a central role in the assembly of binding partners in different cell types, especially when triggered by receptor tyrosine kinases and immune receptors (3,4).
They have very similar structural characteristics. At the amino terminus, these proteins contain both a phosphotyrosine-binding region and a pleckstrin homology domain. These structures provide the necessary conditions for their recruitment to the plasma membrane. DOK4 and DOK5 have been reported to be expressed in human T cells (5), and DOK4 was subsequently confirmed to be a negative regulator of T cell activation (6).
In addition, mouse models show an important role of these DOK proteins in the immune response (2).
They are key negative regulators of immune cell signaling pathways. DOK1/2 is associated with a variety of hematopoietic malignancies, such as chronic myeloid leukemia, chronic lymphocytic leukemia, histiocytic sarcoma, and Burkitt's lymphoma (7). Another study reported that DOK7 expression was associated with survival and tumor recurrence in breast cancer patients (8).
Materials And Methods
2. Data downloading and preprocessing. We downloaded the latest HTSeq-FPKM, phenotype, survival, and mutation data for 33 GDC TCGA (The Cancer Genome Atlas) tumor types from the UCSC Xena website (http://xena.ucsc.edu/). Perl (version 32, v5.32.0), R (version 3.6.3), and other tools were used to extract demographic information, tumor information, and follow-up data for all patients from the TCGA database, covering 11,057 patients in total. Data for the subsequent Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were obtained from Gene Set Enrichment Analysis (GSEA).
3. Clinical correlation analysis. We analyzed the expression differences and co-expression relationships of the 7 DOK family genes in normal and tumor samples. The "ggpubr" R package and the Wilcoxon test were used to test for differences in DOK family gene expression between tumor and normal samples. Tumor types with fewer than 3 normal samples were excluded. In addition, we used the "corrplot" R package to analyze co-expression among the 7 DOK family genes and to explore potential associations within the family.
4. Clinical correlation and survival analysis. From the downloaded data, we first extracted the expression of the DOK family genes and analyzed its clinical relevance by combining it with the survival data of the 33 tumor types. For each gene, patients were divided into a high-expression group and a low-expression group at the median, and R packages were used to draw pan-cancer Kaplan-Meier curves. We also used Cox proportional hazards regression to evaluate the hazard ratios of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7 in each tumor type.
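The median split and Kaplan-Meier estimate described above can be illustrated with a minimal Python sketch. It is not the R code used in this study: the function names, the toy cohort, and the choice to assign samples exactly at the median to the low group are all assumptions made for illustration.

```python
from statistics import median


def split_by_median(expression):
    """Label a sample 'high' if its expression exceeds the cohort median, else 'low'.

    Assumption: samples exactly at the median go to the 'low' group.
    """
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each time where a death occurred.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if the patient was censored
    """
    curve = []
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy cohort: expression of one gene, follow-up in months, death flag
expression = [2.1, 5.4, 3.3, 8.0, 1.2, 6.7]
times = [30, 12, 25, 8, 40, 15]
events = [1, 1, 0, 1, 0, 1]

groups = split_by_median(expression)
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curve = kaplan_meier([times[i] for i in idx], [events[i] for i in idx])
    print(label, curve)
```

Comparing the two curves formally would additionally require a log-rank test, which is omitted here for brevity.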
Next, we specifically analyzed whether DOK family expression levels differed among pathological stages of gastric cancer. P < 0.05 was defined as a significant difference.
5. Immune subtype analysis. Tumor cells can promote their own growth by changing and maintaining the conditions for their survival through autocrine and paracrine signaling. In recent years, immunotherapy has achieved major breakthroughs and shows better efficacy than conventional therapy in patients with advanced disease. However, immunotherapy usually relies on the interaction between the tumor microenvironment and internal immune regulation. Analyzing gene differences in the tumor microenvironment and immunity can therefore help guide subsequent treatment. Thorsson  .
6. Stemness indices and TME in pan-cancer. Tumor stem cells are a subset of cells that drive tumor growth. Stemness indices measure how closely cancer cells resemble stem cells. Two independent stemness indices were used, one based on gene expression (RNAss) and one based on DNA methylation signatures (DNAss). Both range from 0 to 1: values close to 0 indicate low similarity to stem cells, and values close to 1 indicate high similarity. We obtained the stemness indices of TCGA tumor samples using the One-class Logistic Regression (OCLR) algorithm, then extracted DOK gene expression data and stemness scores for Spearman correlation analysis.
Solid tumor tissue contains not only tumor cells but also stromal cells, immune cells, and vascular cells, which together constitute the tumor microenvironment (TME). To better reflect tumor purity, we analyzed the proportions of tumor cells, stromal cells, and immune cells in solid tumors. We used the "corrplot" R package with DOK gene expression data to analyze the relationship between DOK gene expression and the tumor microenvironment, with the aim of guiding treatment more effectively.
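As an illustration of the Spearman correlation between gene expression and a stemness score, the following Python sketch ranks both variables (averaging tied ranks) and then computes the Pearson correlation of the ranks. The function names and the example values are hypothetical and are not taken from the study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: expression of one gene vs. a stemness score in [0, 1]
gene_expression = [4.2, 1.1, 3.5, 2.0, 5.8]
stemness_score = [0.20, 0.75, 0.35, 0.60, 0.10]
print(spearman(gene_expression, stemness_score))
```

In this toy example higher expression always pairs with a lower stemness score, so the coefficient is at the negative extreme of the scale.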
7. Drug sensitivity analysis in pan-cancer. CellMiner is a web-based tool compiled by the U.S. National Cancer Institute that provides genomic and pharmacological information for studying transcriptional and drug responses in the NCI-60 cell lines (10). We downloaded DOK gene expression levels and drug sensitivity data from CellMiner and used the "impute" package to preprocess the raw data. Pearson correlation analysis was used to assess the relationship between DOK gene expression and compound sensitivity, following the method of Dong et al. (11).
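The expression-versus-drug-activity screen can be sketched in Python as follows. This is an illustrative outline only: missing activity values are filled with the per-drug mean as a simple stand-in for the imputation performed by the "impute" package, and the drug names, values, and function names are hypothetical.

```python
from math import sqrt


def impute_mean(values):
    """Replace None with the mean of the observed values.

    Assumption: mean imputation is used here only as a simple stand-in
    for the imputation step described in the text.
    """
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_drugs(expression, drug_activity):
    """Return [(drug, r)] sorted by descending absolute correlation."""
    results = []
    for drug, activity in drug_activity.items():
        results.append((drug, pearson(expression, impute_mean(activity))))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical expression of one gene across five cell lines
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical drug activity scores for the same cell lines (None = missing)
drug_activity = {
    "drug_A": [0.5, 1.1, 1.4, 2.2, 2.4],
    "drug_B": [2.0, None, 1.0, 0.9, 0.2],
    "drug_C": [1.0, 0.8, 1.2, 0.9, 1.1],
}

for drug, r in rank_drugs(expression, drug_activity):
    print(drug, round(r, 3))
```

A positive coefficient means that cell lines with higher expression tend to show higher drug activity, and a negative coefficient means the opposite.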

Results
1. Differential expression analysis and co-expression analysis of DOKs between tumor and normal samples. Fig. 1A shows the expression levels of the DOK gene family in tumors. Overall, DOK1, DOK2, DOK3 and DOK4 are expressed at higher levels than DOK5, DOK6 and DOK7.
We used the Wilcoxon test to analyze the differential expression of the seven DOK family genes between tumor samples and paracancerous samples. The DOK gene family is highly expressed in most tumors. However, DOK gene expression was generally low in LUAD, LUSC and KICH, except that DOK5 was highly expressed in LUAD. Unlike the other DOK genes, DOK6 showed significantly low expression in GBM.
In addition, DOK gene family was highly expressed in all gastric cancers (Fig. 1B).
Co-expression analysis showed the expression associations among the DOK family genes. DOK3 and DOK2 showed the strongest positive co-expression (correlation coefficient = 0.66, P < 0.001). There were also significant positive correlations between DOK6 and DOK5, DOK1 and DOK2, and DOK1 and DOK3, with correlation coefficients of 0.3, 0.37 and 0.45, respectively. In contrast, DOK2 was negatively correlated with DOK5 and DOK6 (correlation coefficients of -0.13 and -0.18), and DOK6 was negatively correlated with DOK7 (correlation coefficient = -0.19, P < 0.001) (Fig. 1C).
Almost all DOK family genes were significantly under-expressed in LUSC and KICH tumors. In CHOL tumors, all DOK genes were highly expressed, but only DOK1, DOK3, DOK4, and DOK7 showed statistically significant differences. Most DOK genes, with the exception of DOK5 and DOK7, were significantly overexpressed in KIRC tumors. DOK6 stands out within the family: it has the lowest expression of all DOK genes and is the only one with significantly low expression in GBM (figure S1-S2).
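The tumor-versus-normal comparison can be illustrated with a small Python sketch of the Wilcoxon rank-sum (Mann-Whitney U) test, including the rule of skipping tumor types with fewer than 3 normal samples. The p-value uses a normal approximation without tie or continuity correction, which is a simplifying assumption and may differ slightly from the values produced by R; the function name and example values are hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(tumor, normal, min_normal=3):
    """Two-sided Wilcoxon rank-sum test via the Mann-Whitney U statistic.

    Returns (U, p_value), or None when there are fewer than min_normal
    normal samples. The p-value is a normal approximation (assumption:
    no tie correction and no continuity correction).
    """
    if len(normal) < min_normal:
        return None
    n1, n2 = len(tumor), len(normal)
    # U counts tumor/normal pairs where the tumor value is larger (ties count 0.5)
    u = 0.0
    for t in tumor:
        for n in normal:
            if t > n:
                u += 1.0
            elif t == n:
                u += 0.5
    mean_u = n1 * n2 / 2.0
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return u, p_value


# Hypothetical expression values for one gene in one tumor type
tumor_expr = [5.0, 6.0, 7.0, 8.0]
normal_expr = [1.0, 2.0, 3.0, 4.0]
print(rank_sum_test(tumor_expr, normal_expr))

# A tumor type with only two normal samples is skipped
print(rank_sum_test([5.0, 6.0, 7.0], [1.0, 2.0]))
```

For samples this small an exact test would normally be preferred; the approximation is used here only to keep the sketch short.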
2. We performed Kaplan-Meier analysis of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7 expression and overall survival in the 33 TCGA tumor types (Fig. 2). We first divided patients into a high-expression group and a low-expression group at the median expression value and compared survival time between the groups. High DOK1 expression was significantly associated with poor prognosis in KIRC patients (P = 0.001). In LUAD patients, DOK1, DOK2, and DOK6 were all expressed at low levels relative to adjacent or normal tissues, which was consistent with the survival curves (P = 0.032, P = 0.012, and P = 0.049). High expression of DOK2 and DOK3 also indicated poorer prognosis in GBM patients (P = 0.014, P = 0.035). DOK3 is highly expressed in KIRC patients, and higher expression was associated with shorter survival time (P = 0.01), as was DOK5 in STAD patients (P = 0.044).
3. We drew a forest plot to show the association between DOK family gene expression and prognosis in the 33 TCGA tumor types (Fig. 3). Cox proportional hazards regression was used to assess the prognostic role of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7, and a hazard ratio (HR) > 1 was defined as a significant prognostic factor. Fig. 3 shows that DOK1, DOK2 and DOK3 were significant in most cancers. In STAD patients, DOK5, DOK6 and DOK7 were all correlated with prognosis (P = 0.04, P = 0.01, P = 0.01).
4. We then focused on the DOK gene family in gastric cancer and explored whether its expression is associated with pathological stage. DOK2, DOK3 and DOK5 expression differed significantly across pathological stages of gastric cancer. Interestingly, expression was lowest in stages I and IV and relatively high in stages II and III. These expression differences may help predict the clinical development of tumors (Fig. 4).
5. Immune subtype analysis. More than 10,000 tumor samples from 33 TCGA tumor types were divided into six immune subtypes, C1-C6: wound healing, IFN-γ dominant, inflammatory, lymphocyte depleted, immunologically quiet, and TGF-β dominant. We downloaded immune-related data from TCGA and used the Kruskal-Wallis test to analyze the mRNA expression of DOK family genes across the immune subtypes. As shown in Fig. 5A, all DOK genes were significantly associated with the C1-C6 immune subtypes (P < 0.001). DOK4 had the highest overall expression across C1-C6, while DOK6 had the lowest. Interestingly, DOK6 expression was unusually high in C5, as was DOK5 expression, whereas DOK2 and DOK7 had their lowest expression in C5. We then analyzed the association between DOK family genes and immune subtypes in STAD patients. As shown in Fig. 5B, DOK2-DOK6 were significantly associated with immune subtype in gastric cancer (P < 0.001), but DOK7 was not. The average expression of DOK4 was the highest among the six immune subtypes.
In C6, expression was roughly the same as in the pan-cancer immune analysis and was relatively high for all DOK genes.
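To make the Kruskal-Wallis comparison across immune subtypes concrete, the sketch below computes the H statistic in plain Python. It omits the tie correction and the chi-square p-value, and the subtype labels and expression values are hypothetical, so it should be read as an outline of the test rather than a reproduction of the analysis.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups of observations.

    Assumption: no tie correction is applied, so H is slightly
    conservative when many values are tied.
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)

    def rank(value):
        # Mid-rank in the pooled sample: tied values share the average rank
        less = sum(1 for v in pooled if v < value)
        equal = sum(1 for v in pooled if v == value)
        return less + (equal + 1) / 2.0

    # Sum over groups of (rank sum squared) / (group size)
    weighted = sum(sum(rank(v) for v in group) ** 2 / len(group) for group in groups)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)


# Hypothetical expression of one gene in three immune subtypes
expression_by_subtype = {
    "C1": [1.0, 2.0, 3.0],
    "C2": [4.0, 5.0, 6.0],
    "C3": [7.0, 8.0, 9.0],
}
print(kruskal_wallis_h(list(expression_by_subtype.values())))
```

Under the null hypothesis, H approximately follows a chi-square distribution with one fewer degree of freedom than the number of groups, which is how a p-value would be obtained.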
6. The DNA methylation-based stemness index (DNAss) and the mRNA expression-based stemness index (RNAss) were used to analyze the correlation between DOK genes and tumor stemness. To investigate the association between pan-cancer stemness features and DOK gene expression, we calculated the stemness indices of TCGA tumor samples with the one-class logistic regression (OCLR) algorithm and performed Spearman correlation analysis between gene expression and stemness scores.
The correlation between DOK family genes and DNAss was generally weak. DOK6 showed strong negative correlations in OV and TGCT, with correlation coefficients of -0.68 and -0.78. DOK4 also showed a significant negative correlation in TGCT (correlation coefficient = -0.73), suggesting that tumors with high expression of these genes contain fewer tumor stem cells (Fig. 6A). For RNAss, DOK family genes were negatively correlated with stemness in most tumors, especially DOK5 and DOK6. LGG and THYM were exceptions: DOK4 and DOK6 were positively correlated with RNAss in LGG, with correlation coefficients of 0.59, and DOK1 and DOK2 were positively correlated with RNAss in THYM, with correlation coefficients of 0.5 (Fig. 6B).
The occurrence, growth and metastasis of tumors are closely related to the surrounding environment. In addition to tumor cells, solid tumor tissues include other normal cells, such as stromal cells and immune cells. Tumor cells can change this environment through autocrine or paracrine signaling, and the body can limit the occurrence and development of tumors by changing metabolism, secretion, immunity, structure, and other functions. Together these constitute the tumor microenvironment (TME). The proportion of stromal cells and immune cells in solid tumors reflects tumor purity, which can guide subsequent treatment.
We downloaded the relevant data from the TCGA database and used ESTIMATE (Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data) to calculate stromal scores (figure S3) and immune scores (figure S4) for the tumor samples. Spearman correlation analysis was used to describe the correlation between DOK family gene expression and tumor purity. As shown in Fig. 6C, DOK2 and DOK3 were strongly correlated with stromal and immune cells in almost all tumors, which means that tumor purity is lower when DOK2 and DOK3 are highly expressed. In LGG, DOK4 and DOK6 expression was positively correlated with tumor purity: the higher the gene expression, the lower the proportion of stromal and immune cells in the tumor, with correlation coefficients of -0.5 and -0.6, respectively.
7. We analyzed the correlation of DOK genes with DNAss, RNAss and TME in STAD tumors using scatter plots. Except for DOK4 and DOK7, the DOK family genes were negatively correlated with DNAss and RNAss. DOK2 and DOK3 were highly correlated with immune scores, with correlation coefficients of 0.91 and 0.82, respectively (P < 0.001). DOK5 and DOK6 were more strongly correlated with stromal scores, with correlation coefficients of 0.79 and 0.76 (P < 0.001) (Fig. 7).
8. Drug sensitivity analysis in pan-cancer. We downloaded and processed the transcriptional expression of DOK family genes in the NCI-60 cancer cell lines and the activity of 263 antitumor drugs from the CellMiner database, and used Pearson correlation analysis to assess the potential influence of DOK family genes on drug response. DOK2 expression was associated with sensitivity to a variety of cancer drugs. DOK2 showed significant positive correlations with sensitivity to nelarabine and chelerythrine, with correlation coefficients of 0.725 and 0.706 (P < 0.001). DOK4 was negatively correlated with okadaic acid (correlation coefficient = -0.488, P < 0.001), and DOK6 was positively correlated with estramustine (correlation coefficient = 0.547, P < 0.001) (Fig. 8). This analysis of gene expression and drug sensitivity may provide new ideas for clinical treatment and subsequent basic research.
9. Tumor mutation load (TMB). With the rapid development of immunotherapy, detecting tumor mutation load is becoming increasingly important. TMB refers to the number of somatic mutations in the tumor genome after germline mutations are excluded, that is, only mutations specific to tumor cells are counted. The higher the TMB, the more neoantigens the tumor produces, the more easily tumor cells can be recognized by the body's immune cells, and the more effective immunotherapy is likely to be. In the radar chart analysis, DOK4 showed its highest TMB correlation in STAD (correlation coefficient 0.28, P < 0.001, Fig. 9A). DOK6 showed the opposite pattern, with a correlation coefficient of -0.42, P < 0.001 (Fig. 9B). The correlation analysis of DOK family genes with TMB may provide a reference for tumor immunotherapy.
Detection of microsatellite instability (MSI). Microsatellites are short tandem repeats found throughout the human genome. Compared with normal cells, microsatellites in tumor cells change in length through the insertion or deletion of repeat units, which contributes to the occurrence and development of tumors; this is called microsatellite instability. Clinically, microsatellite instability is closely associated with colorectal cancer and is present in about 15% of cases, so we analyzed the correlation between DOK family genes and MSI and present the radar maps most relevant to COAD. DOK2 was positively correlated with MSI (correlation coefficient = 0.22, P < 0.001), while DOK4 was the gene most significantly negatively correlated with MSI (P < 0.001) (Fig. 9C-D). Further experiments are needed to determine whether patients with COAD can benefit from the expression of these two genes.
Additional radar maps are shown in the supplementary figures (figure S5-S6).
10. Based on KEGG pathway analysis, we found that DOK family genes were enriched in multiple pathways related to STAD. In STAD, DOK3 and DOK6 were enriched in the autophagy pathway (Fig. 10).

Discussion
Downstream of kinase (DOK) proteins, which include negative regulators of immune cell signaling, represent a multigene family of adapters (12). In this study, we investigated in detail the association between DOK gene transcriptional expression and the characteristics of 33 TCGA tumor types. We first analyzed the differential expression of DOK genes in 10,327 tumor patients and 730 paracancerous patients, and then analyzed the association between DOK differential expression and survival in tumor patients. We then used several databases to analyze the association of DOK family genes with TME, MSI, TMB, immune subtypes, stemness and drug response, in order to characterize the function of DOK genes more comprehensively and support subsequent tumor studies.
Consistent with previous studies, DOK2 and DOK4 are expressed at low levels in breast cancer, and lower expression is associated with larger tumor size, later clinical stage, and more lymph node metastasis, which agrees with the results of our statistical analysis (13,14). DOK7 down-regulation has also been reported to inhibit the proliferation and invasion of breast cancer through the PI3K/PTEN/AKT pathway (15). Similarly, DOK7 hypermethylation may serve as a biomarker for the diagnosis of breast cancer (16). Notably, DOK2, DOK3 and DOK5 showed statistically significant differences among gastric cancer patients at different stages. Among them, high expression of DOK5 predicts shorter survival time, which may be helpful for genetic research on gastric cancer. DOK6, a connector that interacts with a variety of molecules in signal transduction pathways, has also been reported to be an integrated biomarker for a variety of carcinogenic signals in gastric cancer (17). DOK2 is lost in about half of human lung adenocarcinomas, and co-deletion of DOK2 and DUSP4 causes lung adenocarcinoma in mice (18). Another study found that DOK2 deletion is associated with EGFR mutations in human lung adenocarcinoma and that the two jointly promote tumor development (19). In addition, DOK2 can be used as a marker of poor prognosis in human glioblastoma multiforme and can inhibit tumor migration and invasion via the JAK2/STAT3 pathway (20,21). In our study, in addition to the significant survival correlation between DOK2 and lung adenocarcinoma, DOK1 and DOK6 were also significantly associated with lung adenocarcinoma.
Tumor samples were then classified according to the C1-C6 immune subtypes (9), and the RNA-seq expression levels of DOK1-DOK7 were statistically analyzed. The results showed that these genes were significantly correlated with immune subtype. As previously reported, DOK3 is an important modulator of innate immune responses in macrophages and B cells and modulates the downstream signaling pathways of various immune receptors (22,23). DOK1 is a tumor suppressor that is often lost in malignant cells, but it still regulates the immunoreceptor activity of stromal cells in the tumor microenvironment and promotes the invasion of cancer cells (24,25). Immune cells and stromal cells in solid tumors are part of the tumor microenvironment (TME). By calculating immune scores and stromal scores, we can infer tumor purity. DOK2 and DOK3 in particular are related to immunity in most tumors, which affects tumor treatment to varying degrees (26).
It has been reported that tumors acquire stem cell-like properties (27) during their development, including self-renewal and dedifferentiation (28). In this study, we used the OCLR method to calculate DNAss and RNAss scores in tumor samples and correlated them with DOK gene expression to explore the role of DOK genes in tumor stemness. DOK genes, especially DOK5 and DOK6, were found to be negatively correlated with stemness in most tumor types.
Studies have suggested that DOK1 may be related to cisplatin resistance (29) and that DOK2 deficiency induces chemotherapy resistance by reducing the level of apoptosis in the treatment response (30). We therefore analyzed the correlation between DOK gene transcriptional expression and various drug responses. DOK2 was significantly associated with many drugs, among which nelarabine showed the strongest correlation (correlation coefficient = 0.725, P < 0.001). Tumor mutation load and microsatellite instability have recently become a hot topic in immunotherapy. We used radar maps to analyze microsatellite instability in colon cancer, which may provide a reference for subsequent immunotherapy.
This study is a multi-dimensional analysis of DOK family genes in pan-cancer, but it has several clear limitations. First, all samples in our study come from public databases in the United States, so the sample size is insufficient. The model we built cannot represent the actual situation in other regions, such as Europe and Asia, and no other public external data are available to verify it. Second, our research is based on bioinformatic data in a database and has not been validated at the molecular or animal level. Third, our research on DOK genes is based on data correlation, and we have not studied their specific molecular mechanisms and pathways in depth. In future studies, we will therefore build on this work to further explore the specific mechanisms of DOK genes in tumor genesis and development.

Conclusion
In this study, we conducted a multi-dimensional analysis of DOK1, DOK2, DOK3, DOK4, DOK5, DOK6 and DOK7. This included pan-cancer differential expression analysis, immune subtype analysis, clinical analysis, tumor purity analysis, stemness correlation analysis, drug response, tumor mutation load and microsatellite instability. DOK genes were expressed differently across tumor types and immune subtypes. This analysis reveals multiple expression patterns of the DOK family at the pan-cancer level and provides new clues for cancer treatment strategies.

Declarations

Acknowledgement
Thanks to Youliang Wu for guidance on formatting the manuscript and submitting it to the journal.

Statement of Ethics
All analyses were based on public databases; thus, no ethical approval or patient consent was required.

Conflict of Interest Statement
The authors declare no conflict of interest.

Funding Sources
This work was supported by a grant from the National Natural Science Foundation of China (81874063).

Authors' Contributions
Xiaodong Wang collected all the article data and was responsible for writing the full text. Lifeng Xu and Yaxian Li participated in writing the article and modifying its format. Xin Xu was responsible for editing the figures and participated in writing the full text. Yongxiang Li provided the ideas for the research and all the funding. All authors read and approved the final manuscript.

Availability of data and materials
The data used to support the findings of this study are included within the article.