Abstract
Chemicals are everywhere. They are composed of atoms and bonds, and they support life and provide comfort. The numerous combinations of these entities give rise to the complexity and diversity of the universe. Chemistry is the subject that analyzes and tries to explain this complexity at the atomic level. Advances in the subject have led to ever more data and an explosion of information. Over time, the observations were recorded in chemical documents, including journals, patents, and research reports. This vast body of chemical literature, covering more than two centuries, demands extensive use of information technology to manage it. Today, chemoinformatics tools and methods have grown powerful enough to handle this huge resource of chemical information and to discover unexplored knowledge in it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of the domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety, and environmental benefit. In this chapter, we begin with a brief definition of chemoinformatics and the role of open-source tools in it, and then discuss the basic computer knowledge required to understand this specialized and interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to any chemoinformatics task. Practical, step-by-step guidance on the use of open-source, free, academic, and commercial structure representation tools is also provided. To gain a better understanding, the reader is strongly encouraged to attempt the practice tutorials, do-it-yourself exercises, and questions given in each chapter. This chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists, and others understand the subject in a practical way, with relevant and easy-to-understand examples, and to encourage readers to proceed to the advanced topics in the subsequent chapters.
© 2014 Springer India
About this chapter
Cite this chapter
Karthikeyan, M., Vyas, R. (2014). Open-Source Tools, Techniques, and Data in Chemoinformatics. In: Practical Chemoinformatics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1780-0_1
DOI: https://doi.org/10.1007/978-81-322-1780-0_1
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1779-4
Online ISBN: 978-81-322-1780-0
eBook Packages: Chemistry and Materials Science