Homo Sapiens: Epidermal Growth Factor Receptor (DNA Data Mining)

Epidermal growth factor (EGF) plays an important role in the regulation of cell growth, proliferation and differentiation by binding to its receptor, EGFR. Binding leads to rapid internalization of the receptor and its delivery to the lysosome. This reduces cell signalling, but only after endocytosis has taken place. Epidermal growth factor has the capability to transform epithelial cells. In human carcinomas, the epidermal growth factor receptor and its ligands show high expression.


Introduction
NCBI: NCBI stands for the National Center for Biotechnology Information. NCBI is part of the National Library of Medicine (NLM) in the United States of America. It advances science and health by providing access to biomedical and genetic information. Research into the molecular and genetic processes that control health and disease is carried out within the NLM. With the help of NCBI, researchers can select tools that help in finding sequences. An important resource is the Science Primer, which offers easy-to-read introductions to science topics such as bioinformatics, molecular modelling, genome mapping and molecular genetics. NCBI plays an important role in genome sequencing, in protein structure data, and in the gap between sequence and structure. The volume of genome sequence and structural information held in NCBI resources such as PubMed is growing rapidly. Unlike sequencing, structure determination takes a lot of time and is limited in its application [1].

Bioinformatics
The term bioinformatics was first introduced in the early 1990s in connection with the Human Genome Project. Bioinformatics is the application of information technology to biology, in particular to molecular biology. Owing to advances in resources and technology, the Human Genome Project has had an impact on current research taking place around the world. Some of the important applications of bioinformatics are described in the following sections.

Databases
In bioinformatics two types of databases are important: protein databases and DNA databases. Both organise biological data according to sequence and functional information. Based on sequence analysis, biological databases are of two types: primary sequence databases and secondary sequence databases. Re-engineering work is under way to make biological data easier and more efficient to use. This makes it easier to obtain data from different sources and increases the power and capability of biological resources [2]. Relational database management systems play an important role in storing molecular biological sequences, with ODBC and JDBC used for data exchange. These methods remove many of the problems of data access [3].

Blast
BLAST stands for Basic Local Alignment Search Tool. BLAST finds regions of similarity between sequences. It compares a protein or nucleotide query sequence against sequence databases and calculates the significance of the matches. BLAST can also be used to infer phylogenetic relationships between sequences and to identify members of gene families. BLAST programs are mainly used to search protein and DNA databases for sequence similarities [4]. As in the Human Genome Project, genes that have been mapped and sequenced are stored in GenBank. This information is retrieved through an accession number, which holds the information for an individual gene. The accession number BC037558 holds the information about the Homo sapiens gene for epidermal growth factor receptor pathway substrate 15.
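The idea of a "region of similarity" can be illustrated with a small local alignment score. The sketch below is not the BLAST algorithm: BLAST uses a faster seed-and-extend heuristic, whereas this example uses Smith-Waterman dynamic programming as a stand-in. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman).

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)          # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences (hypothetical): both contain the shared region "ACGT".
score = local_alignment_score("TTACGTTT", "GGACGTGG")
```

With these toy inputs the highest-scoring region is the shared "ACGT" stretch; a real search would report the aligned region and a statistical significance, which this sketch does not compute.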

Accession Number
An accession number is a unique identifier assigned to an individual sequence record so that the nucleotide sequence can be found in bioinformatics databases. With the help of the accession number, the DNA or protein sequence can be identified and its associated data retrieved [6].

Epidermal Growth Factor
Epidermal growth factor (EGF) plays an important role in the regulation of cell growth, proliferation and differentiation by binding to its receptor, EGFR. Binding leads to rapid internalization of the receptor and its delivery to the lysosome. This reduces cell signalling, but only after endocytosis has taken place [7]. Epidermal growth factor has the capability to transform epithelial cells. In human carcinomas, the epidermal growth factor receptor and its ligands show high expression [8].

Gene Structure
The gene is located on chromosome 19, in the region 19p13.11. The gene encodes the protein EPS15L1 (Figures 1 and 2) [9].

Open Reading Frames (ORF)
Open reading frames are used to identify the protein-coding regions of a DNA sequence. For example, in a random DNA sequence with an equal percentage of each nucleotide, a stop codon is expected about once every 21 codons, because 3 of the 64 possible codons are stop codons. A simple gene prediction method for prokaryotes therefore looks for a start codon followed by an open reading frame that is long enough to encode a protein. In translation it is important to know at which nucleotide translation starts and where it stops; the stretch between these two points is the open reading frame. The ORF frames and their lengths are given above. From this ORF structure it is seen that the sequence contains 25 open reading frames, spread over several reading frames [11,12].
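The ORF search described here can be sketched in a few lines. This is a minimal illustration and not the ORF finder used to produce the results in this article: it scans only the three forward reading frames, treats ATG as the only start codon, and uses a hypothetical function name and a toy sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (frame, start, end) for each ATG...stop ORF on the forward strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts codons from ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None                   # look for the next ORF in this frame
    return orfs


# Toy sequence (hypothetical): one ORF, ATG GGG TGA, in frame 2.
toy_orfs = find_orfs("CCATGGGGTGACC")
```

A full six-frame search would repeat the scan on the reverse complement of the sequence, and raising `min_codons` filters out the short ORFs that arise by chance.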

Introns and Exons

Introns
Introns are sequences that lie within a gene but do not code for protein. An intron is a nucleotide sequence located between the exons of a gene, and it is removed from the transcript by splicing.

Exons
Exons are the sequences in DNA that code for the amino acids of the protein. Exons are retained in the mature mRNA, where they code for the amino acids.
Ensembl is the site used to find information about the introns and exons within a gene. The translation of transcript ID ENST0000248070 is 864 amino acids long, and the transcript has a CCDS consensus sequence, which is shown above. The variation in colour shows the different regions [13].
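The relationship between exons, introns and the mature mRNA can be made concrete with a short sketch. The sequence and coordinates below are invented for illustration and are not taken from the Ensembl record discussed here; the function names are hypothetical. Coordinates are 0-based with an exclusive end.

```python
def splice_exons(gene, exons):
    """Join the exon slices of a gene sequence in genomic order."""
    return "".join(gene[start:end] for start, end in sorted(exons))


def intron_intervals(exons):
    """Return the gaps between consecutive exons as (start, end) intervals."""
    ordered = sorted(exons)
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end
    ]


# Toy gene (hypothetical): exon 1, one intron, exon 2.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "AAATAA"
gene = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]

coding = splice_exons(gene, exons)      # exons joined, intron removed
introns = intron_intervals(exons)       # the single intron between them
```

The spliced string corresponds to the coding sequence that would be translated; the intron interval is simply what is left between two neighbouring exons.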

Multiple Sequence Alignment
Multiple sequence alignment is performed when homologous sequences need to be compared, and it is an important tool in bioinformatics. Various programs are used for multiple sequence alignment, such as CLUSTAL W and T-Coffee, and they may give different alignments for the same sequences [14].
The protein FASTA sequence for accession number BC037558 was taken from the BLAST results, copied and saved. Along with this sequence, four other sequences were taken from four different accession numbers and saved. For the multiple sequence alignment we used the website http://www.ebi.ac.uk, where CLUSTAL W2 is available; after selecting it and uploading the sequences, the results are returned within a few minutes. The results appear as follows (Figure 7) [15]. Following this, we can view the cladogram or phylogenetic tree, which gives the evolutionary distance between the five organisms.
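Alignment output of this kind is usually read column by column. As a rough illustration, the sketch below marks the fully conserved columns of an alignment with an asterisk, in the spirit of the conservation line printed under a CLUSTAL alignment. It is a simplification: only identical, ungapped columns are marked, and the symbols CLUSTAL uses for similar residues are not reproduced. The function name and the aligned rows are hypothetical.

```python
def conservation_line(rows):
    """Return a string with '*' under every fully conserved, ungapped column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    marks = []
    for column in zip(*rows):                 # walk the alignment column by column
        conserved = len(set(column)) == 1 and "-" not in column
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Toy alignment (hypothetical protein fragments, '-' is a gap).
aligned = ["MKT-LV", "MKTALI", "MKTALV"]
line = conservation_line(aligned)
```

Columns containing a gap or more than one residue are left blank, so long runs of asterisks indicate the regions where the aligned sequences agree.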

Phylogenetics
Phylogenetics is the process of placing organisms into groups or classes based on the similarity of their evolutionary relationships. In molecular genetics the classification is based on the comparison of DNA or amino acid sequences (Figure 9). From the above tree we can see the differences in similarity, and it is clear that the query sequences are highly similar to one another. The differences can be explained by evolution and are due to mutations in the genes [17].
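How a tree can be derived from sequence comparison is sketched below. This is not the method used by CLUSTAL W2 to draw the tree in this article; it is a minimal UPGMA clustering on p-distances (the fraction of aligned positions that differ), chosen here because it is short. All names and the three toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which a and b differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; return the tree as nested tuples."""
    sizes = {name: 1 for name in seqs}                     # cluster label -> size
    dist = {frozenset(p): p_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Size-weighted average distance to every remaining cluster.
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Toy aligned sequences (hypothetical).
toy = {"human": "ACGTACGT", "chimp": "ACGTACGA", "mouse": "ACGAATGA"}
tree = upgma(toy)
```

In this toy case the two most similar sequences are joined first and the third is attached afterwards, which is the same reading applied to the cladogram: sequences joined close to the tips differ by fewer mutations.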

Conclusion
Epidermal growth factor plays an important role in the regulation of cell growth, proliferation and differentiation by binding to its receptor, EGFR. EPS15L1 is the protein encoded by the gene.
The phylogenetic tree gives the evolutionary distance between the five organisms. From the phylogenetic tree we can see the differences in similarity, and it is clear that the query sequences are highly similar to one another. The differences can be explained by evolution and are due to mutations in the genes.