Web Resources on TB: Information, Research, and Data Analysis

Introduction
Since its origins in the second half of the 20th century, the Internet has become the universal language of the digital world. All the capabilities it offers, such as electronic mail systems, information distribution, file sharing, multimedia streaming services and online social networking, have already been of service to billions of people around the world. In fact, if the Internet were to disappear tomorrow, most people would struggle to manage their lives without it.
By providing millions of people with constantly updated information (24 hours a day, seven days a week), the Internet has become the second most important source of information worldwide, with television still ranking first in most countries. It has also provided a unique means of communication, through which a person in an isolated geographical location can instantly be in touch with thousands, perhaps millions, of other individuals around the world.
Scientists were among the first to explore these capabilities. Now we talk about data mining, terabytes and petabytes, and algorithms: terms related to what we call "Big Data", the large volume of information generated by a variety of new technologies, in fields ranging from Astronomy (telescope data) and the Internet itself (more and more Facebook users every day) to Biology (cheaper and more efficient DNA sequencing technologies), among other areas of study and research. Some technologies and experiments, like the Large Hadron Collider at CERN, Switzerland (perhaps the most important scientific tool ever built), produce an incredible volume of information, on the order of terabytes per second.
Databases containing DNA and protein sequences were created; institutions around the world developed websites to present their work to the public; scientific journals launched online versions. The world is connected as never before. This connection transcends the virtual realm of the Internet: today, it is possible to travel from one side of the world to the other in just one day. Unfortunately, this has a negative side: infectious agents may also cross the world in about the same time.
Tuberculosis is a global disease, with an estimated one-third of all people in the world infected by the bacillus Mycobacterium tuberculosis. Although the disease is treatable, the long duration of treatment (many patients abandon therapy as soon as they feel better), together with the indiscriminate use of antibiotics, is driving the spread of new, drug-resistant strains. Indeed, as those familiar with epidemiology have already noticed, there is a remarkable similarity between the pattern of an epidemic or outbreak and the spread of a new piece of information across the Internet.
However, there has also been a revolution in other areas: new high-throughput technologies, such as genomics, transcriptomics and proteomics, offer a new, more integrated view of the metabolism and genetics of the organism studied, and M. tuberculosis was among the first organisms to have its genome sequenced. Today, more than 30 different strains have been sequenced, as well as other organisms from the genus Mycobacterium. By comparing the genomes of virulent and non-virulent strains, scientists may pinpoint particular genes and/or polymorphisms involved in virulence; by examining transcriptome data, researchers may gain an idea of the effects of a given drug on the metabolism of the bacillus.
The purpose of this chapter is by no means to offer an exhaustive list of all the resources available on the Internet about TB, the topic of this book. That would be a massive and perhaps futile task, since the Internet evolves at a very fast pace. Rather, this chapter concentrates on a selection of the most important and stable websites relevant to several aspects of TB, such as research, treatment, main institutions, funding, and specialized platforms. We think this should complement the other information already presented in this book, offering the reader a more integrated view of the disease, as well as access to new platforms and systems specialized in the analysis of data generated by new technologies such as DNA sequencing.

Tuberculosis facts, information, treatment and research
Most of the sites presented in this section provide information on several aspects of TB, such as history, epidemiology, transmission and pathogenesis, diagnosis, treatment and infection control, and also offer other services such as courses, guidelines, fact sheets and links to related sites. We have listed them in alphabetical order to avoid conveying the false impression that some sites are more important than others. In fact, we think that every effort is worthwhile in the global battle against this terrible disease.

Centers for Disease Control and Prevention. The mission of the Division of Tuberculosis Elimination (DTBE) is to promote health and quality of life by preventing, controlling, and eventually eliminating tuberculosis from the United States, and by collaborating with other countries and international partners in controlling global tuberculosis. URL: <http://www.cdc.gov/tb/>
Global Tuberculosis Institute. Located at the New Jersey Medical School, the institute provides expertise in program development, education, training and research to ministries of health, national TB programs and healthcare providers around the globe. URL: <http://
Pan American Health Organization (PAHO). Serving as the regional office for WHO, PAHO has been working for more than a century to improve health and living standards in the countries of the Americas, and is recognized as part of the United Nations system. URL: <http://new.paho.org/hq/>
Stop TB Partnership. The Stop TB Partnership operates through a secretariat hosted by the World Health Organization (WHO) in Geneva, Switzerland, and seven working groups whose role is to accelerate progress on access to TB diagnosis and treatment, on research and development for new TB diagnostics, drugs and vaccines, and on tackling drug-resistant and HIV-associated TB. URL: <http://www.stoptb.org/>
TB Alliance. Established in 2000, its main objective is to discover and develop better, faster-acting, and affordable drugs to fight tuberculosis. Today, the organization and its partners manage a portfolio of new anti-TB drugs. URL: <http://www.tballiance.org>
World Health Organization (WHO). Created in 1948, WHO is the directing and coordinating authority for international health within the United Nations system, composed of 193 countries and two associate members. It supports and promotes health research in several areas, TB being one of them. URL: <http://who.int/topics/tuberculosis/en/>

Tuberculosis databases and platforms
Since the emergence of Bioinformatics and Computational Biology in the 1960s, numerous databases and computational tools have been created to provide the scientific community with the means to access and interpret a range of biological data.
Indeed, the contribution of these disciplines became particularly evident in the 1990s and 2000s, when the development of supercomputers, powerful personal computers, and global-scale computer networks, together with high-throughput technologies collectively referred to as omics (e.g., genomics, transcriptomics, and proteomics), revolutionized the field of Biology.
Nowadays, a number of publicly available web resources aim to organize, integrate, and provide efficient access to the ever-increasing amount of biological information produced over decades of research, particularly in recent years, with numerous projects worldwide applying the aforementioned high-throughput technologies. Accordingly, diverse options to visualize, search, retrieve and analyze this wealth of data are offered, providing the opportunity to acquire more detailed knowledge about genomes and their respective organisms, among many other opportunities.
However, the creation and maintenance of such web resources is a challenge in itself, not only because they usually have to deal with large amounts of data, but mostly because they require the design of schemes and frameworks that accurately represent the complexity of biological systems, which is frequently a hard task. Another difficulty is the development of efficient data retrieval systems, implemented in user-friendly interfaces and intended for complex searches of massive databases. It is worth noting that, in many circumstances, the authors and curators of such resources receive little or no remuneration for their efforts, and financial support for the creation and maintenance of biological databases is still difficult to obtain.
In this section we present the main web resources fully or partially dedicated to mycobacterial species that are relevant for readers interested in TB. Each database or platform, categorized according to its purpose and functionality, is briefly reviewed, and references to the original paper describing it, as well as its web address, are provided, serving as a guide for researchers or students working on TB. Notably, the computational resources presented here are all publicly available as online services and can potentially be applied to the identification of new drug targets, vaccine antigens, or diagnostics for TB, among many other applications.

Generic and multifunctional
MyBASE. The Mycobacterial Database [1] is an integrated platform for functional and evolutionary genomic study of the genus Mycobacterium, comprising extensive literature review and data annotation on mycobacterial genome polymorphism, virulence factors, and essential genes. URL: <http://mybase.psych.ac.cn/>
TBDB. The TB Database [2] provides a comprehensive genomic data repository for M. tuberculosis and related bacteria, combining (in silico) genome sequence and annotation data and (experimental) gene-expression data. It also provides an analysis platform with suitable computational tools to assist (comparative) genomic and gene-expression studies of these microorganisms. Annotated features of genes and genomes, predicted orthologous groups, operons and synteny blocks, as well as predicted and curated immunological epitopes and gene-expression patterns are available. URL: <http://www.tbdb.org/>
The MycoBrowser portal. The Mycobacterial Browser portal [3] is an extensive genomic and proteomic data repository for four related mycobacteria: M. tuberculosis H37Rv, M. leprae TN, M. marinum M, and M. smegmatis MC2. The system provides in silico generated and manually reviewed information on the complete genome sequence of these organisms. As part of this portal, the TubercuList database [4] integrates a range of information on the M. tuberculosis genome, such as genomic and protein annotations and features, drug and transcriptome data, mutant and operon annotation, and comparative genomics. It represents a complete redesign of the database with the same name provided by the GenoList genome browser (also described in this chapter). URL: <http://mycobrowser.epfl.ch/>

Genomic mapping and data mining
TubercuList, BoviList, BCGList. GenoList [5] is a collection of databases dedicated to microbial genome analysis, providing a complete data set of protein and nucleotide sequences for selected species, as well as annotation and functional classification of these sequences. The TubercuList, BoviList, and BCGList databases are devoted to collecting and integrating various aspects of the genomic information of M. tuberculosis H37Rv, M. bovis AF2122/97, and M. bovis BCG Pasteur 1173P2, respectively. URL: <http://genolist.pasteur.fr/>
TBrowse. TBrowse [6] is a genomic data resource based on the Generic Model Organism Database (a collection of open source computational tools for creating and managing genome-scale biological databases). The browser provides the scientific community with an integrative genomic map of M. tuberculosis, with millions of data points representing different genomic features and computational predictions systematically collected from online resources and publications, including gene/operon predictions, orthologs, gene expression data, non-coding RNA, pathways/networks, regulatory elements, variation and repeats, and subcellular localization, among others. URL: <http://tbrowse.osdd.net>

Comparative genomics
GenoMycDB. GenoMycDB [7] is a relational database for large-scale comparative analysis of completely sequenced mycobacterial genomes based on their predicted protein content. Currently, the database comprises six mycobacteria: M. tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. For each of their encoded protein sequences it provides the predicted subcellular localization, the assigned cluster of orthologous groups (COGs), features of the corresponding gene, and links to several important databases; in addition, pairs or groups of homologs between selected species/strains can be dynamically inferred based on user-defined criteria. URL: <http://www.dbbm.fiocruz.br/GenoMycDB.html>
MycoDB. xBASE [8] is another collection of databases, this one dedicated to bacterial comparative genome analyses. It provides precomputed data of comparative genome analyses among selected bacterial genera, as well as inferred orthologous groups and functional annotations. It also provides precomputed analyses of codon usage, base composition, codon adaptation index (CAI), hydropathy, and aromaticity of the protein coding sequences in these bacteria. As part of this multi-microbial system, MycoDB currently comprises comparative data from 61 completely sequenced or unfinished mycobacterial genomes, including 40 [...]
Among the comparative data provided by this TB resource we can cite: inferred families of orthologous genes, genomic two-dimensional dot plot matrices, comparative genome mapping and browsing, and several comparative gene annotations and features. URL: <http://www.broadinstitute.org/annotation/genome/mycobacterium_tuberculosis_spp/MultiHome.html>
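As an illustration of two of the sequence statistics mentioned above for xBASE (base composition and codon usage), the following minimal Python sketch computes the GC fraction and in-frame codon counts of a coding sequence. It is not the code used by any of the databases described here; the function names and the toy sequence are assumptions made for this example only.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(cds: str) -> Counter:
    """Count codons in a coding sequence, read in frame from position 0.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


# Toy coding sequence, invented for illustration (not a real gene).
toy_cds = "ATGGCCGCGGGCTGA"

gc = gc_content(toy_cds)
usage = codon_usage(toy_cds)
print(f"GC fraction: {gc:.3f}")
for codon, count in sorted(usage.items()):
    print(codon, count)
```

Applied to every coding sequence of a genome, counts of this kind are the raw material for summary measures such as the codon adaptation index, which additionally requires a reference set of highly expressed genes.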

Genetic diversity and epidemiology
MGDD. The Mycobacterial Genome Divergence Database [9] comprises a data repository of genetic variations among different organisms belonging to the M. tuberculosis complex. The MGDD system provides quick searches for precomputed single nucleotide polymorphisms (SNPs), insertions/deletions, repeat expansions, and divergent sequences (inversions, duplications, and changes in synteny) in genomic regions of fully sequenced genomes of M. tuberculosis complex species and strains. URL: <http://mirna.jnu.ac.in/mgdd/>
MIRU-VNTRplus. The Mycobacterial Interspersed Repetitive Unit - Variable Number Tandem Repeat (MIRU-VNTR) database [10,11] comprises a collection of 186 well characterized strains representing the major M. tuberculosis complex lineages. For each strain, species, lineage, and epidemiologic information are provided together with 24 MIRU loci, Spoligotype patterns, Regions of Difference (RD) profiles, Single Nucleotide Polymorphisms (SNPs), susceptibility data, and IS6110 Restriction Fragment Length Polymorphism (RFLP) fingerprint images. The system enables users to analyze genotyping data of their own strains alone or in comparison with the reference strains in the database; analyses and comparisons of genotypes can be based on Multiple Locus VNTR Analysis (MLVA), Spoligotypes, Large Sequence Polymorphism (LSP) and SNP data, or on a weighted combination of these markers. Tools for data analysis include search for similar strains, creation of phylogenetic and minimum spanning trees, and mapping of geographic information. URL: <http://www.miruvntrplus.org>
MTCID. The M. tuberculosis Clinical Isolate Genetic Polymorphism Database [12] is a repository of genetic polymorphisms, providing Single Nucleotide Polymorphism (SNP) and Spoligotype profiles of clinical isolates of M. tuberculosis, based on published literature and manual curation. URL: <http://ccbb.jnu.ac.in/Tb/>
SITVITWEB. SITVITWEB [13] is a multi-marker database comprising three major types of molecular markers: Spoligotypes, Mycobacterial Interspersed Repetitive Units (MIRUs) and Variable Number Tandem Repeats (VNTRs). This web server is dedicated to the investigation of M. tuberculosis genetic diversity and molecular epidemiology. Currently, this international resource provides genotyping information on 62,582 M. tuberculosis complex clinical isolates from 153 countries of patient origin. URL: <http://www.pasteur-guadeloupe.fr:8081/SITVIT_ONLINE/>
Additionally, a few relevant computational tools are currently available as web services dedicated to analyzing the genetic diversity of M. tuberculosis complex strains and characterizing TB dynamics using molecular epidemiological data:
spolTools. The spolTools [14] comprise a collection of browser programs designed to manipulate and analyze Spoligotype data of the M. tuberculosis complex. They consist of an online repository of Spoligotype isolates collected from various published data sets (currently, 1179 Spoligotypes and 6278 isolates across 30 datasets) and online tools for manipulating and analyzing these data (computation of basic population genetic quantities; visualization of clusters of Spoligotype patterns based on an estimated evolutionary history; and a procedure to predict emerging strains/genotypes associated with elevated transmission). URL: <http://www.emi.unsw.edu.au/spolTools/>
TB-Insight. TB-Insight [15] is a collection of computational methods (based on different models and datasets) for lineage classification of M. tuberculosis complex strains and for visualization of the genetic diversity of the M. tuberculosis complex population and its distribution by lineage. It also offers visual representations of associations between patient and strain groups, providing insight into differences in phenotypic characteristics and into phylogeographic associations of M. tuberculosis complex strains with host populations. URL: <http://tbinsight.cs.rpi.edu/>
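To illustrate the kind of genotype comparison described above for MIRU-VNTR data (searching for reference strains similar to a query strain), the following Python sketch uses a simple categorical distance, that is, the proportion of loci at which two repeat-count profiles differ. This is a sketch under stated assumptions rather than the implementation of any of the tools listed: the function names, the four-locus profiles, the strain labels and the distance cutoff are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def categorical_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of loci at which two repeat-count profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def most_similar(
    query: Sequence[int],
    references: Dict[str, Sequence[int]],
    max_distance: float = 0.25,
) -> List[Tuple[float, str]]:
    """Return (distance, name) pairs within the cutoff, closest first."""
    hits = [
        (categorical_distance(query, profile), name)
        for name, profile in references.items()
    ]
    return sorted(hit for hit in hits if hit[0] <= max_distance)


# Hypothetical four-locus profiles (real typing schemes use more loci,
# e.g. the 24 MIRU loci mentioned in the text).
references = {
    "ref_A": (2, 4, 3, 5),
    "ref_B": (2, 4, 3, 3),
    "ref_C": (1, 2, 2, 3),
}
query = (2, 4, 3, 5)

matches = most_similar(query, references)
print(matches)
```

A pairwise distance matrix built with the same function could serve as input for the tree-building methods mentioned in the text, such as minimum spanning trees.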

Gene expression and regulation
MTBRegList. MTBRegList [16] is dedicated to the analysis of gene expression and regulation data in M. tuberculosis, containing predicted and characterized regulatory motifs cross-referenced with their respective transcription factor(s), experimentally identified transcription start sites, and DNA binding sites. URL: <http://www.usherbrooke.ca/vers/MtbRegList>
MycoperonDB. MycoperonDB [17] is a repository of known and computationally predicted operons and transcriptional units of (currently) five mycobacteria whose genomes have been completely sequenced: M. tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, and M. leprae TN. Presently, it comprises 18,053 genes organized as 8,256 predicted operons and transcriptional units, providing literature links for experimentally characterized operons and access to known promoters and related information. URL: <http://cdfd.org.in/mycoperondb/home.html>
MTBreg. MTBreg is part of the online services provided by the UCLA-DOE Institute for Genomics and Proteomics (http://www.doe-mbi.ucla.edu/) and is a repository of conditionally regulated proteins in M. tuberculosis grown under several different conditions mimicking infection. The database provides information on proteins that are regulated by selected transcription factors or other regulatory proteins, as well as on the experimental condition, the experimental dataset and a literature reference. URL: <http://www.doe-mbi.ucla.edu/Services/MTBreg/>
MycoRegDB. The Mycobacterial Promoter and Regulatory Elements Database [18] is part of a user-friendly web interface (RegAnalyst) that integrates a motif prediction program (MoPP), a pattern detection tool (MyPatternFinder), and a database of promoter and regulatory elements from various mycobacterial species (MycoRegDB). Currently, MycoRegDB comprises the following species: M. tuberculosis (strains H37Rv and CDC1551), M. bovis BCG, M. leprae, M. smegmatis, M. avium subsp. paratuberculosis, M. marinum, M. ulcerans, M. gilvum, and M. vanbaalenii. For each database entry, a variety of useful information is provided, such as gene annotation, CDS positions, promoter/regulatory sequence (with the Transcription Start Point (TSP) or binding site explicitly marked), and TSP-CDS/Motif-CDS distance, among others. According to the authors, the first release of MycoRegDB contained 290 annotated DNA motifs (174 promoters and 116 transcription factor binding sites) described in 81 research papers. URL: <http://www.nii.ac.in/~deepak/RegAnalyst/MycoRegDB>

Structural biology
MtbSD. The M. tuberculosis Structural Database [19] is a resource dedicated to 3D protein structures of M. tuberculosis, providing relevant information on description, reaction catalyzed, domains, active sites, structural homologs and similarities between bound and cognate ligands. Currently, the database comprises 876 structures for 332 mycobacterial genes. URL: <http://bmi.icmr.org.in/mtbsd/MtbSD.php>

Drug targets and resistance
TDR Targets database. The Tropical Disease Research (TDR/WHO) Targets database [20] comprises extensive genetic, biochemical and pharmacological data related to tropical disease pathogens, including M. tuberculosis, as well as computationally predicted druggability for potential targets and compound desirability information. The goal is to exploit the availability of diverse datasets to facilitate the identification and prioritization of drugs and drug targets in neglected disease pathogens, such as the tubercle bacillus. URL: <http://tdrtargets.org/>
TB Drug Resistance Mutation Database. The Tuberculosis Drug Resistance Mutation Database [21] is a comprehensive database that catalogs mutations associated with TB drug resistance and the frequency of the most common mutations associated with resistance to specific drugs, providing a resource for the development of molecular diagnostics for TB, as well as structural mapping of mutations to investigate mechanisms of resistance for drug discovery purposes. URL: <http://www.tbdreamdb.com/>
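The idea of tabulating, for each drug, the relative frequency of the mutations reported in resistant isolates can be sketched in a few lines of Python. The record format, the function name and the placeholder labels (drug_X, mut_1 and so on) are assumptions made for illustration; they are not taken from the database described above, and the counts are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def mutation_frequencies(
    records: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """For each drug, compute the relative frequency of each mutation.

    `records` holds one (drug, mutation) pair per resistant isolate in
    which that mutation was reported. Mutations are listed from most to
    least frequent within each drug.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for drug, mutation in records:
        counts[drug][mutation] += 1

    freqs: Dict[str, Dict[str, float]] = {}
    for drug, counter in counts.items():
        total = sum(counter.values())
        freqs[drug] = {m: n / total for m, n in counter.most_common()}
    return freqs


# Invented placeholder records, for illustration only.
toy_records = [
    ("drug_X", "mut_1"),
    ("drug_X", "mut_1"),
    ("drug_X", "mut_2"),
    ("drug_Y", "mut_3"),
]

freqs = mutation_frequencies(toy_records)
for drug, table in freqs.items():
    for mutation, freq in table.items():
        print(drug, mutation, f"{freq:.2f}")
```

A table of this kind makes it easy to see which few mutations account for most reported resistance to a given drug, which is the information a molecular diagnostic would need to target.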

Conclusion
As outlined in this chapter, informatics has acquired great importance not only in the biological sciences but in all areas of knowledge. The Internet has become one of the most important tools for most people, from the dedicated researcher interested in the latest advances in a particular field to the teenager trying to contact friends. Companies, industries and research institutes have developed sites where they present their work to the general public.
The large number of publicly available databases and computational tools developed to organize, integrate, and provide efficient access to the ever-increasing amount of biological information produced over decades of research has benefited researchers all over the world, especially those from low-income countries.
One important drawback that has yet to be overcome is that the wealth of biological information available is presently fragmented, dispersed across numerous computational resources, and in many cases redundant; it clearly requires unification in order to provide a global and integrated picture of the biological systems concerned.
Ideally, upcoming databases and computational tools should: offer data integration, providing multi-perspective analyses; combine in silico generated and manually curated data, improving the quality of research; present efficient data structure, storage and processing, providing dynamic, flexible and fast data visualization, searching, retrieval and analysis via user-friendly graphical interfaces; and implement a consistent, controlled vocabulary to describe the data, together with standardized data formats, enabling full data interchange and integration with other data sources. We believe that only in this way can a fruitful field for interaction and cooperation among researchers from distinct areas emerge, providing the support required to interpret and analyze this wealth of data through a truly multidisciplinary approach.