PROTEOMIC ANALYSIS OF ERBB2 - A POTENTIAL BREAST CANCER MARKER: AN INTEGRATED BIOINFORMATICS STRATEGY

1. Department of Medical Laboratory, College of Applied Medical Sciences, Qassim University-KSA.
2. Department of Biochemistry, Jinnah University for Women, Karachi-Pakistan.
3. Department of Biochemistry, College of Medicine, Qassim University-KSA.
4. Department of Biochemistry, University of Karachi, Karachi-Pakistan.

Manuscript Info
Manuscript History: Received: 20 April 2020; Final Accepted: 25 May 2020; Published: June 2020

A high prevalence of breast cancer mortality has recently been reported among Saudi Arabian women. The incidence has increased 10-fold in the past few years, with a reported prevalence of 21.8%, and >9.48% in the Qassim region [1][2]. Breast cancer is the second most prevalent cancer and a leading cause of death in women, and the fifth most common cause of cancer death worldwide, owing to late diagnosis and poor treatment [3][4][5].
Advances in genomic technologies have strongly boosted research. An in silico approach is used in the present study to uncover the molecular basis of the breast cancer-associated gene ERBB2. This gene encodes a membrane tyrosine kinase and is overexpressed or amplified in about 20% of breast cancers, which are characterized by rapid tumor growth, increased disease progression, and a decreased survival rate. It has recently become a useful treatment target for breast cancer, but it is predictive of poor clinical outcome [6].
ERBB-2 is a receptor tyrosine-protein kinase also known as TKR1, HER-2, CD340, NEU, HER-2/neu, MLN 19, and NGL (Gene ID: 2064). The gene belongs to the epidermal growth factor (EGF) receptor family. It plays a significant role in cell growth, proliferation, survival, and motility; mutations of ERBB2 are associated with cancer, and its overexpression activates downstream signaling. The mechanism underlying ERBB2 oncogenic activity is a complex signaling network, as both homodimers and heterodimers formed upon ligand binding tightly regulate migration, metastasis, and invasion of the malignant cell [7][8]. Humanized monoclonal antibodies and receptor tyrosine kinase inhibitors are the recently proposed anti-ERBB2 therapies for breast cancers in which ERBB2 overexpression alters cell proliferation and survival [9][10].
Protein-protein interactions are effective in determining gene-disease associations. Computational approaches usually draw on two sources of information to predict disease-associated genes: linkage intervals and physical interaction networks. Different techniques uncover the associations between gene and disease by integrating gene expression, biological pathways, protein sequences, and numerous phenotypic traits of diseases. Post-translational modifications, which modulate a protein's cellular function and are critically involved in the regulation of cell signaling at the molecular level, are an additional target of the current study [11][12][13][14].
The present study is designed to explore the association of the ERBB2 gene with breast cancer by using advanced in silico analysis. Bioinformatics tools and databases such as SMART, KEGG pathway, STRING, Cytoscape, LENS, NEXT/UniProt, and PTMcode 2 were used to find the protein-protein interactions, metabolic pathways, and related post-translational modifications of the ERBB2 gene product.

Methods:-
The present study uses an in silico approach for an in-depth analysis of ERBB2, a potential marker for breast cancer. The amino acid sequence of ERBB2 was retrieved from the genome database at NCBI. The complete protein data and reference proteome sets were determined using the UniProt/Swiss-Prot/Pfam/SMART databases. The biological associations of the differentially expressed protein and its functional interactions (knowledge base) were identified with the STRING 8.3 tool (http://string-db.org/). The identified interactions were then imported into Cytoscape software (www.cytoscape.org/) for visualization. Among the many analysis plugins available in Cytoscape, CytoHubba was used to rank nodes by degree and to identify the top 10 protein nodes (hub genes), and Molecular Complex Detection (MCODE), another plugin, was used to identify the modules of the PPI network using the standard criteria.
To identify associated pathways, the KEGG database was used (http://www.genome.jp/kegg/) along with SMART, the Simple Modular Architecture Research Tool (https://smart.embl.de/), which annotates protein domain architecture together with the associated biological functions.
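The degree-based hub ranking described above for CytoHubba can be illustrated with a minimal sketch. This is not the CytoHubba implementation; it only shows the idea of counting distinct neighbours per node in an undirected network and keeping the top-ranked nodes. The function name `rank_hubs` and the toy edge list are hypothetical and are not the network analysed in this study.

```python
from collections import defaultdict


def rank_hubs(edges, top_n=10):
    """Rank nodes of an undirected network by degree (number of distinct neighbours)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by descending degree, then alphabetically so ties are reproducible.
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(node, len(nbrs)) for node, nbrs in ranked[:top_n]]


# Hypothetical toy edge list for illustration only (NOT the study's network).
toy_edges = [
    ("ERBB2", "GRB2"), ("ERBB2", "SHC1"), ("ERBB2", "EGF"),
    ("GRB2", "SHC1"), ("GRB2", "KRAS"), ("SHC1", "KRAS"),
]

hubs = rank_hubs(toy_edges, top_n=3)
print(hubs)
```

In practice the edge list would be exported from STRING or Cytoscape; ties in degree are broken alphabetically here, which is an arbitrary choice made only to keep the output deterministic.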

Results:-
The ERBB-2 gene sequence was obtained directly from the NCBI (National Center for Biotechnology Information) database. The ERBB2 receptor tyrosine-protein kinase protein sequence (FASTA) consists of 1255 amino acids and 3768 nucleotides (Fig.1a, RefSeq; hsa:2064, K05083 [EC:2.7.10.1]), while the chromosomal location of the ERBB2 gene was identified with GeneCards as chromosome 17 at position 17q12 (Fig.1b). The sequence alignment of ERBB2 (Homo sapiens, P04626) was obtained from the Swiss Model Repository (Fig.2) along with the predicted ERBB2 protein structure.
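The two sequence lengths reported above are consistent with each other if the 3768-nucleotide figure is taken to be the coding sequence including the stop codon; that reading is an assumption, not something the text states. A short check, with a hypothetical helper name:

```python
def coding_length_nt(n_residues, include_stop=True):
    """Nucleotides needed to encode n_residues amino acids (3 per codon)."""
    return 3 * n_residues + (3 if include_stop else 0)


erbb2_residues = 1255  # protein length reported in the text

print(coding_length_nt(erbb2_residues, include_stop=False))  # 3765
print(coding_length_nt(erbb2_residues))                      # 3768
```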
Prediction of ERBB2 post-translational modifications (Fig.5) identified three S-nitrosylation sites in the protein, while predicted phosphorylation sites showed the highest count. In the graph, vertical lines represent the predicted phosphorylation potential of Ser (red), Thr (green), and Tyr (blue) residues. The pink horizontal line is the threshold for modification potential. The system-wide quantitative proteomic study of breast cancer tumors (Fig.6a) is suggestive of an association between ERBB2 and single-nucleotide variations (SNVs) in breast cancer (BioMuta) [17]. Further exploration of the cancer type vs. nsSNV frequency plot suggests that the frequency of ERBB2 variation is highest in breast cancer (Fig.6b-c). The integrated BioMuta dataset shows the frequency of nsSNVs for ERBB2 on the y-axis and the type of cancer on the x-axis.
The overlapping network of all canonical pathways implicated in ERBB2 signaling, representing the interconnections between altered canonical pathways, was obtained with Cytoscape (WP673-ERBB Signaling Pathway-Homo sapiens) and highlights the involvement of the signaling pathway in cancer through overexpression of ERBB2 (Fig.7).
The protein interactome reveals converging molecular pathways of the breast cancer gene ERBB2. The PPI network of ERBB2 was visualized in Cytoscape (Fig.8a). It comprises 32 nodes and 66 edges, and CytoHubba identified the top 10 functional hub protein nodes of ERBB2. The top 10 genes by degree of connectivity in the PPI network, shown in red-orange, are KRAS, HRAS, GRB2, ERBB2, SHC1, PIK3R2, PTPN11, GRB7, EGF, and PIK3CG. The protein-protein interaction network from STRING predicted ten functional protein partners that interact with ERBB2: EGF, GRB2, PTPN11, HSP90AA1, SHC1, NRG1, GRB7, HRAS, KRAS, and BTC (Fig.8b).
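The overlap between the two gene lists reported above can be made explicit with a simple set comparison. The gene symbols are taken directly from the text; the variable names are illustrative only.

```python
# Top 10 hub nodes by degree (CytoHubba), as listed in the text.
cytohubba_top10 = {
    "KRAS", "HRAS", "GRB2", "ERBB2", "SHC1",
    "PIK3R2", "PTPN11", "GRB7", "EGF", "PIK3CG",
}

# Ten predicted functional partners of ERBB2 (STRING), as listed in the text.
string_partners = {
    "EGF", "GRB2", "PTPN11", "HSP90AA1", "SHC1",
    "NRG1", "GRB7", "HRAS", "KRAS", "BTC",
}

shared = sorted(cytohubba_top10 & string_partners)
only_cytohubba = sorted(cytohubba_top10 - string_partners)
only_string = sorted(string_partners - cytohubba_top10)

print("shared:", shared)
print("CytoHubba only:", only_cytohubba)
print("STRING only:", only_string)
```

Seven proteins appear in both lists. ERBB2 itself is expected to be absent from the STRING partner list because it is the query protein.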
The specified functions of the predicted protein partners are as follows. EGF (Pro-epidermal growth factor) is known to stimulate the growth of different epithelial and epidermal tissues. GRB2 (Growth factor receptor-bound protein 2) functions as an adapter protein between cell surface receptors and signaling pathways. PTPN11 (Tyrosine-protein phosphatase non-receptor type 11) acts downstream of receptor tyrosine kinases and regulates signal transduction through dephosphorylation. HSP90AA1 (Heat shock protein 90-alpha A1) is a molecular chaperone involved in the regulation of signal transduction and in the cell cycle control of specific target proteins. SHC1 (SHC-transforming protein 1) is a signaling adapter that links activated growth factor receptors to signaling pathways. NRG1 (Pro-neuregulin-1) is a membrane-bound direct ligand for the tyrosine kinase receptors ERBB3 and ERBB4 that also recruits the co-receptors ERBB1 and ERBB2, causing receptor activation via tyrosine phosphorylation. GRB7 (Growth factor receptor-bound protein 7) interacts with the cytoplasmic domain of several receptor kinases to regulate downstream signaling. HRAS (GTPase HRas) is involved in signal transduction through Ras protein activation. KRAS (GTPase KRas) is a Ras protein with GTPase activity that primarily regulates cell proliferation and is also involved in stimulating oncogenic events. BTC (Probetacellulin) is an established growth factor and a potent mitogen for various epithelial and vascular cells.
The Cytoscape network module of the PPI and merged ERBB2 pathways (Fig.9) illustrates (a) the STRING breast cancer PPI network with all available functional proteins involved in the disease, and (b) the most substantial module, which has a total of 7 nodes and 15 edges in the ERBB2 protein-protein interaction network constructed by Cytoscape-MCODE analysis and also shows the subnetwork of close functional protein partners. Panel (c) shows the interactome of the merged ERBB2 pathways, WP673-ERBB Signaling and WP4262-Breast Cancer Pathway, obtained with the Cytoscape Merge tool to find the interactome between breast cancer and ERBB2.
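As a supplementary calculation that is not part of the reported results, the node and edge counts given above allow the connectivity of the full ERBB2 network (32 nodes, 66 edges) to be compared with that of the MCODE module (7 nodes, 15 edges). The sketch assumes simple undirected graphs without self-loops or duplicate edges; the function names are hypothetical.

```python
def density(n_nodes, n_edges):
    """Fraction of possible undirected edges that are present: 2E / (N * (N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_degree(n_nodes, n_edges):
    """Average number of edges per node: 2E / N."""
    return 2 * n_edges / n_nodes


# Counts taken from the text.
networks = {"full PPI network": (32, 66), "MCODE module": (7, 15)}

for name, (n, e) in networks.items():
    print(f"{name}: density={density(n, e):.3f}, mean degree={mean_degree(n, e):.3f}")
```

Under these assumptions the module is far denser than the full network (about 0.71 versus 0.13), which is the property a module-detection method such as MCODE is designed to find.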

Discussion:-
The current study evaluated the integration of advanced computational technology with cancer biomarkers, particularly the ERBB2 gene and its involvement in breast cancer. Because a high prevalence of breast cancer mortality among Saudi Arabian women has been confirmed by descriptive analysis [1], this approach may help to establish novel breast cancer markers and identify potential therapeutic targets, which are essential for the early diagnosis and treatment of breast cancer.
The prediction of individual gene function is made easier by computational analysis, which aids in finding cellular interaction networks and their associated cellular physiological mechanisms. Knowledge of the activation of ERBB2-associated downstream signaling pathways [7,9] will help in understanding the molecular mechanisms leading to breast cancer. The genomic location, function, catalytic activity, protein structure and domains, interacting functional partners, and post-translational modifications of ERBB2 have been compiled by bioinformatics data analysis.
ERBB2 is recognized as an oncoprotein of the epidermal growth factor receptor (EGFR) family; its gene amplification or protein overexpression is a recognized mechanism in breast cancer. This occurs in approximately 20% of breast cancers and is reported to correlate with invasion, metastasis, and poor patient survival [18][19]. The physiological roles of ERBB2 include the regulation of molecular and genetic events during embryonic development and in adult tissue maintenance. The mechanisms of cell adhesion and migration underlying invasion and metastasis of ERBB2-positive tumors are still obscure despite extensive investigation. Understanding deregulated ERBB2 pathways in breast cancer pathogenesis helps in the development of ERBB2-targeted therapies [20].
The protein-based approach may reflect the cellular functions and functional networks involved in breast cancer more directly than genes and transcripts alone. Moderate affinity and promiscuity of recognition remain a proven challenge for targeting protein-protein interactions (PPIs), despite the enormous research effort in anticancer drug discovery using ERBB2/p130C. Moreover, resistance to ERBB2-targeted treatments has been reported and is attributed to various factors [7,9,10].
One important factor is phosphorylation, which is vital for protein-protein interactions and the initiation of signaling pathways, as is evident from the cytoplasmic domain of ERBB2, which has numerous phosphorylation sites, particularly for tyrosine phosphorylation. Homo/heterodimerization with members of the EGFR family leads to the activation of different ERBB2 pathways, including downstream signaling pathways known to be involved in overexpression/amplification of the gene. The SMART diagram summarizes the ERBB2 protein domains and different post-translational modifications (Fig.3), particularly the predicted hyperphosphorylation (Fig.5). This suggests that the ERBB2 gene encodes a receptor protein-tyrosine kinase that regulates the outgrowth and stabilization of peripheral microtubules (MTs); GP30 is a potential ligand for this receptor. The associated signaling pathway elicits the phosphorylation, and thus the inhibition, of GSK3B at the cell membrane, which prevents the phosphorylation of APC (adenomatous polyposis coli protein) [7,21].
SMART domain architecture analysis of ERBB2 displays seven different domains, notably the TyrKc tyrosine kinase domain; mutations in this region are associated with different types of cancer. The system-wide proteomic study of breast cancer tumors suggests that the frequency of ERBB2 mutations and single-nucleotide variations (SNVs) is highest in breast cancer. The oncogenic activity depends on several types of somatic mutations in the ERBB2 kinase domain, which mainly promote the phosphorylation of cellular signaling proteins in different types of cancer, particularly breast cancer, as these mutants are more proficient at preventing apoptosis or have altered catalytic activity [10,22]. These oncogenic activating mutations and insertions may alter auto-inhibition through a conformational change that modifies the adenosine triphosphate (ATP)-binding cleft, enhancing kinase activity by increasing ATP-binding affinity and turnover number and thus contributing to subsequent phosphorylation events.
The molecular subtypes of breast cancer indicate that specific hormone receptors, such as the estrogen and progesterone receptor subtypes, and ERBB2 gene expression are involved in the development of breast cancer [23]. According to the reported research, only three genes (TP53, PIK3CA, and GATA3) carry somatic alterations at a frequency greater than 10% across all breast cancers; however, ERBB2 activates the PI3K/AKT and RAS/RAF/MAPK pathways and induces cell development and survival [8,24,25,26].
Preclinical data on ERBB2 activating mutations, particularly those in the tyrosine kinase region, suggest a response to HER2 tyrosine kinase inhibitors in several tumor types. However, resistance to ERBB2-targeted treatment is, unfortunately, becoming a major limitation [27][28]. Further clinical investigation is needed for a better understanding of the mechanisms of ERBB2 mutation-driven therapy and of new therapeutic targets to improve cancer patient survival and care.

Conclusion:-
The in silico approach used in the current study offers distinctive opportunities to enhance understanding of the biological function of ERBB2, a potential marker for breast cancer. The current approach of molecular networking and protein interaction analysis of ERBB2 in breast cancer highlights the importance not only of the proteins themselves but also of their inter-relationships. ERBB2 alterations are implicated in about 20% of breast cancers in females; tyrosine kinase domain mutations of this gene activate its oncogenic activity, resulting in the phosphorylation of cell signaling proteins and thereby conferring resistance to programmed cell death (apoptosis).
In conclusion, the evaluation of the existing cancer candidate gene and its interacting proteins and pathways, particularly the ERBB2 downstream signaling pathway, has the potential to generate novel hypotheses in oncology. It thus provides a baseline for identifying the target protein-based pathways involved, which can be validated through wet-lab experiments and investigated in more detail in the future to reveal new targets for treatment.