Closing Chernobyl

may be due to lead poisoning. According to a study by Herbert Needleman of the University of Pittsburgh School of Medicine and colleagues published in February in the Journal of the American Medical Association, boys with higher bone-lead levels are more likely to be aggressive and delinquent. "This is probably the most critical study that has been done on lead in the last five years," says Janet Phoenix, manager of public health programs at the National Lead Information Center of the National Safety Council, an international public interest organization. "The social implications are enormous." Lead has been linked with behavioral problems since the early 1940s, when pediatrician RK Byers noted that some children he had treated for acute lead poisoning subsequently developed violent, aggressive behavioral difficulties such as attacking teachers with knives or scissors. Needleman's study, supported in part by the NIEHS, is the first to link asymptomatic levels of lead with aggressive behavior and delinquency. Needleman and his colleagues studied 301 boys from primary schools in Pittsburgh. The researchers measured bone-lead levels by K X-ray fluorescence when the boys were about 12 years old. Based on the relative lead content of their tibias, the boys were divided into high- and low-lead groups. Bone-lead levels reflect lifetime exposure to lead because, like calcium, lead is stored in bones. The boys in the high-lead group had normal levels of lead in their blood by the time of the study, showing that their lead exposure had occurred in the past. The researchers evaluated the boys' behavior at 7 and 11 years of age based on reports from three sources: the boys themselves, their parents, and their teachers. These data were from widely respected tests of antisocial behavior that had been administered by the Pittsburgh Youth Study, a longitudinal study of the developmental course of delinquency.
At 11 years, the boys were given a self-reported delinquency interview, which comprises 35 questions such as how many times in the past six months a subject has "been drunk in a public place" or "attacked someone with a weapon." The parents and teachers filled out the child behavior checklist, which contains 113 symptoms of childhood behavioral disorders such as cruelty or bullying, shoplifting, setting fires, and apparent lack of guilt after misbehaving. When the high-lead boys were 7 years old, neither they nor their parents reported significant behavioral problems, and their teachers reported only borderline tendencies toward symptoms such as social problems, delinquency, and aggressive behavior. By the time these boys were 11, however, they reported significant increases in antisocial acts, and their parents and teachers reported significant increases in symptoms such as delinquent and aggressive behavior. The researchers corrected for confounding factors such as the mothers' intelligence, the presence of the father, and socioeconomic status. Many U.S. children have toxic bone-lead levels and, provided that their results are found to extend to the population at large, Needleman and his colleagues conclude that lead makes a substantial contribution to delinquent behavior. Other researchers hail the Needleman study as the first to rigorously demonstrate a link between lead and antisocial behavior. The study was well designed and its implications are likely to be valid, according to Terrie Moffitt of the Department of Psychology at the University of Wisconsin at Madison. Self-reporting is trustworthy when the period reported on is less than a year, the interviews are private and face-to-face, and confidentiality is guaranteed, she says, and the Needleman study met these conditions. Furthermore, Needleman's conclusions are strengthened by the fact that reports from three sources (the boys, their parents, and their teachers) all linked lead with antisocial behavior.
Lead is a neurotoxin and human studies indicate that its neurological effects are likely to be irreversible. However, delinquency is also associated with factors such as weak parent-child attachment, lax parental supervision, and school failure. Addressing these issues can mitigate the effects of lead. "These kids need help. They need support from teachers and parents," says Phoenix. "No one knew they were lead-poisoned." The good news is that environmental lead exposure can be avoided. "Lead-related delinquency is the easiest to prevent," says Needleman. "We should be able to wipe this disease out by removing old lead-based paint."

Closing Chernobyl
Almost 10 years after the explosion and full-scale meltdown of a graphite core at the Chernobyl nuclear power plant in Ukraine, officials finally agreed to close the plant. The governments of Ukraine, the Group of Seven (G-7) industrialized nations, and the Commission of the European Communities signed a memorandum of understanding on 20 December 1995 that outlines a comprehensive program for the closure of the Chernobyl nuclear power plant by the year 2000. The program's provisions include a focus on nuclear safety; the development of a financially sound electric power market with market-based pricing to encourage energy efficiency and conservation; and a social impact plan to address the effects of the closure of the plant on its employees and their families. Representatives of Ukraine, the G-7, and international financial institutions plan to meet annually to monitor the implementation of the program. The memorandum allocates $2.3 billion in aid, including $349 million for nuclear safety and decommissioning activities and $1.9 billion for new energy investments. The funding will come from grants by G-7 countries and loans from international financial institutions, although the financing has not yet been worked out.
Financial details and the fact that the agreement is not legally binding have caused environmental groups to remain skeptical about the agreement. "If the West does not provide what Ukraine feels is sufficient capital, it's quite possible that Chernobyl might not be shut


INTRODUCTION
Biological pathways are often represented as pixel images (JPEG, GIF, etc.) or vector graphics (Scalable Vector Graphics (SVG) or PostScript). Typical examples of such static representations include those presented in databases such as KEGG (1), Reactome (2), BioCarta (http://www.biocarta.com) and EcoCyc (3). Although a static representation is intuitive and informative and has been widely used in textbooks and illustrations, it is difficult to edit, or to reuse for analysis, modeling and simulation. As a result, important resources such as the KEGG database cannot be fully exploited. Notable steps toward meeting the challenge of computable representations include the development of BioPAX (Biological Pathways Exchange, http://www.biopax.org/) and KGML (KEGG Markup Language, http://www.genome.jp/kegg/xml/), with BioPAX focusing on detailed ontology while KGML includes layout information.
A number of software tools (4-9) have been developed to visually build computable models of pathways. These tools are usually based on graphical models in which nodes represent genes, proteins or chemical compounds, and edges represent various types of interactions or associations. To date, few tools support the conditional dependencies of molecular and genetic entities and their associations. Thus, pathways encoded with existing tools may lack key information needed for interpreting the pathway's functioning.
In order to combine multiple pathways in a manner that is useful for modeling cellular behavior, two main challenges must be addressed. First, models must allow a hierarchical visual representation (6,10-12). Second, data representation is complicated when several complexes share some of their proteins, because the role of a common protein generally depends on context (13,14). Methods such as semantic zooming or hierarchical decomposition (10,12,15-20) are needed to aggregate and abstract entire pathways or pathway portions into small units that can be displayed within larger pathway systems. Hierarchical structures are also very common in the computable representation of biological knowledge in BioPAX and KGML formats. A protein complex must often be represented as a node containing a set of nodes, one for each subunit. Each subunit in turn may itself contain a set of nodes representing conserved domains identified in the subunit's 3D structure or primary sequence. Representing a protein complex as a simple, non-hierarchical node often obscures properties of the proteins because attributes of the simple node are aggregated across multiple proteins, each of which may have different attributes with respect to one another. An obvious workaround for this issue is to model protein complexes as 'compound nodes' (10,11,15) or 'metanodes', which are nodes with recursive internal structure (Figure 1) (20).
While biological systems contain an appreciable amount of hierarchical organization, molecular components are reused across subsystems, making it impossible to perfectly capture all of the information in a nested set of relations. Strict hierarchical representations can capture biological substructure but cannot model overlap between protein complexes. Related to this idea, nodes that represent only a single protein may not have a unique state but may instead behave in a condition-dependent manner. It is common practice (21) to use multiple nodes to represent different states of the same protein to maintain clarity of control logic and conditional dependency in pathways. However, this can lead to an explosively growing chain of nodes. It also breaks data integrity and introduces data redundancy, as the same protein is represented by multiple nodes. More importantly, the exact condition-dependent state of a given protein can be unclear or unknown in many pathways. A typical example can be found for protein STE20 in the MAPK signaling pathway for yeast (http://www.genome.ad.jp/dbget-bin/show_pathway?sce04010+YHL007C), which most likely has different activities under different conditions, but the exact nature of the state differences is currently unknown. How can such conditional dependencies be represented and modified when corresponding biological information becomes available?
Protein-protein interaction data sets obtained from either large-scale experiments or computational predictions, as well as coexpressed genes predicted from large-scale expression data, can be used to help fill gaps in incomplete pathways (22-25). Although many tools provide facilities to visualize expression data in the context of pathways (4,5,7,9,26), facilities to enrich pathways in a computationally based visualization system, using both interaction and expression profiles, are missing.
Here we report new developments in VisANT 3.0, a Web-based platform with new modules supporting exploratory pathway analysis using metagraphs (20) to address multi-scale visualization of multiple pathways; editing and annotating pathways using a KEGG-compatible visual notation; visualization of expression data in the context of pathways; enriching pathways using either coexpressed components of known pathway members predicted from expression data in the SMD (27) and GEO (28) databases or proteins with known physical interactions; and assigning genes/proteins of unknown function to known pathways. The new version of VisANT will help users take full advantage of the large number of available resources in the KEGG pathway database when building new pathways.

Metagraphs
A metagraph is a data structure for representing nodes, edges and subnetworks in a nested structure. One significant difference between a compound graph and a metagraph is that metagraphs allow one node to have multiple instances and these instances are automatically tracked. This capability allows a metanode in a metagraph to share nodes: each metanode has its own instance of the same node. Metanodes have two semantic states: an expanded state that reveals the associated subgraph inside, and a contracted state that hides the internal structure, rendering the metanode as a simple node. Edges between the nodes in an expanded metanode have the usual meaning (associations based on experimental data or computationally inferred correlations); edges between metanodes either reflect a correlation between standard (hidden) nodes or indicate that the same gene/protein occurs in both metanodes (20).
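The instance tracking and the two metanode states described above can be illustrated with a minimal sketch. It is not VisANT's implementation: the class name `Metagraph`, its methods and the node names are hypothetical, and nesting of metanodes inside metanodes is omitted for brevity.

```python
from collections import defaultdict


class Metagraph:
    """Toy metagraph: each metanode holds its own instance of a shared node."""

    def __init__(self):
        self.members = {}                   # metanode name -> member node names
        self.expanded = {}                  # metanode name -> True if expanded
        self.instances = defaultdict(list)  # node name -> metanodes holding an instance

    def add_metanode(self, name, nodes):
        self.members[name] = list(nodes)
        self.expanded[name] = True
        for node in nodes:
            # Every instance is recorded, so all copies of a node can be found later.
            self.instances[node].append(name)

    def toggle(self, name):
        """Switch a metanode between its expanded and contracted states."""
        self.expanded[name] = not self.expanded[name]

    def visible(self, name):
        """Nodes drawn for a metanode: its members if expanded, else itself."""
        return list(self.members[name]) if self.expanded[name] else [name]

    def shared(self, a, b):
        """Nodes with an instance in both metanodes (one reason to link them)."""
        return sorted(set(self.members[a]) & set(self.members[b]))


# Hypothetical example: two complexes that share the protein P3.
g = Metagraph()
g.add_metanode("complex_1", ["P1", "P2", "P3"])
g.add_metanode("complex_2", ["P3", "P4"])
g.toggle("complex_2")  # contract complex_2 so it is drawn as a simple node
```

Here `g.instances["P3"]` lists both complexes, which is the information needed to connect all instances of a node on demand, and `g.shared("complex_1", "complex_2")` gives the basis for an edge between the two metanodes.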

KGML and pathway integration tools
KGML is an exchange format for KEGG graph objects, particularly KEGG pathways, which are manually drawn and updated. The KGML files for KEGG metabolic pathways specify how enzymes (boxes) are linked by a relation and how compounds (circles) are linked by a reaction. In contrast, the KGML files for KEGG regulatory pathways contain only the former. KGML files for all supported species in VisANT have been preprocessed to map genes to their KEGG pathways, and a VisANT user can identify pathways for a specified gene either by searching for its interactions or resolving (normalizing) its names or IDs as explained subsequently.
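As a rough illustration of the two kinds of links, the sketch below reads relations and reactions from a small KGML-like fragment using Python's standard XML parser. The element and attribute names (`entry`, `relation`, `reaction`, `substrate`, `product`) are assumed to follow the public KGML format; the fragment itself and all identifiers in it are made up for illustration, and this is not how VisANT preprocesses KGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical KGML-like fragment; identifiers are placeholders, not real KEGG IDs.
KGML = """
<pathway name="path:example" org="sce" number="00000">
  <entry id="1" name="gene:A" type="gene"/>
  <entry id="2" name="gene:B" type="gene"/>
  <entry id="3" name="cpd:X" type="compound"/>
  <entry id="4" name="cpd:Y" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel"/>
  <reaction name="rn:example" type="reversible">
    <substrate name="cpd:X"/>
    <product name="cpd:Y"/>
  </reaction>
</pathway>
"""

root = ET.fromstring(KGML)

# Map entry ids to names so relations can be reported by name.
entries = {e.get("id"): e.get("name") for e in root.findall("entry")}

# Relations link enzymes or gene products (drawn as boxes).
relations = [
    (entries[r.get("entry1")], entries[r.get("entry2")], r.get("type"))
    for r in root.findall("relation")
]

# Reactions link compounds (drawn as circles): substrates on one side, products on the other.
reactions = [
    ([s.get("name") for s in rx.findall("substrate")],
     [p.get("name") for p in rx.findall("product")])
    for rx in root.findall("reaction")
]
```

A regulatory pathway file would, under the description above, simply yield an empty `reactions` list.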
Two pathway recommendation web services for identifying functionally related genes from transcriptional profiles are integrated in VisANT through its plugin architecture (20). Given a set of query genes, typically the known genes of a pathway, these services recommend additional genes in the same pathway as the query set. Both search engines support five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae. When VisANT is run as an online applet, connections to the services are mediated by the VisANT server.
GeneRecommender (29) discovers new genes with similar function to a given list of genes (the query) already known to have closely related function. It ranks genes according to how strongly they correlate with a set of query genes in those experiments for which the query genes are most strongly coregulated.
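The ranking idea can be sketched in a few lines. This is a simplified stand-in and not the published GeneRecommender algorithm: the coregulation measure (absolute mean divided by standard deviation of the query genes), the function names and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def recommend(expr, query, k):
    """Pick the k experiments where the query genes agree most, then rank other genes."""
    n_exp = len(next(iter(expr.values())))

    def coregulation(j):
        # Assumed measure: query genes strongly and consistently up or down.
        vals = [expr[q][j] for q in query]
        return abs(mean(vals)) / (pstdev(vals) + 1e-9)

    selected = sorted(sorted(range(n_exp), key=coregulation, reverse=True)[:k])
    centroid = [mean([expr[q][j] for q in query]) for j in selected]
    scores = {
        gene: pearson([profile[j] for j in selected], centroid)
        for gene, profile in expr.items()
        if gene not in query
    }
    return selected, sorted(scores.items(), key=lambda kv: -kv[1])


# Hypothetical expression profiles over four experiments.
expr = {
    "q1": [2.0, -1.9, 0.1, 2.1],
    "q2": [2.2, -2.1, -1.5, 1.9],
    "g_pos": [1.8, -2.0, 3.0, 2.0],
    "g_neg": [-2.0, 2.0, 0.0, -2.1],
}
selected, ranking = recommend(expr, ["q1", "q2"], k=3)
```

In this toy data the third experiment, where the two query genes disagree, is left out, and `g_pos` is ranked above `g_neg`.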
ClueGene (30) uses the pattern of how genes cluster together in sets of experiments to recommend new genes in a pathway. ClueGene bases its recommendations on the query set and on a cluster compendium. Each set of experiments is clustered independently. The collection of clusters constitutes the cluster compendium. Each gene in the genome is given a co-clustering score. Higher scoring genes are more highly recommended and tend to be found in small clusters in the cluster compendium along with query genes.
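A toy version of a co-clustering score is sketched below. The scoring rule used here (each cluster contributes the number of query genes it contains divided by its size) is an assumption chosen only to reproduce the qualitative behavior described above; it is not ClueGene's actual scoring function, and the gene names are placeholders.

```python
from collections import defaultdict


def co_clustering_scores(compendium, query):
    """Rank non-query genes; small clusters rich in query genes count most."""
    query = set(query)
    scores = defaultdict(float)
    for cluster in compendium:
        hits = len(query & cluster)
        if hits == 0:
            continue  # clusters without query genes carry no evidence
        for gene in cluster - query:
            scores[gene] += hits / len(cluster)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical cluster compendium: each set is one cluster from one set of experiments.
compendium = [
    {"q1", "q2", "g1"},        # small cluster containing both query genes
    {"q1", "g1", "g2", "g3"},  # larger cluster containing one query gene
    {"g2", "g3", "g4"},        # no query genes
]
ranking = co_clustering_scores(compendium, ["q1", "q2"])
```

Gene `g1` scores highest because it shares a small cluster with both query genes, while `g4` never co-clusters with a query gene and is not recommended.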
The use of VisANT (20,31,32) to mine, integrate and display biological interactions based on KEGG pathways and expression data is facilitated by a name-normalization service which resolves IDs used by different databases. In addition, customized ID mappings, as well as corresponding Web links, can be easily added to the network through a simple tab-delimited format. VisANT is developed using Java technology. In addition to the Web browser applet interface, VisANT can also be run as a stand-alone application which implements an auto-upgrading detection system to keep it up-to-date. Detailed information on VisANT's three-tier structure (31) and plugin framework (20) can be found at http://visant.bu.edu. In addition, a new error-reporting system has been implemented to enhance the integration reliability of distributed systems: users will have the option to report critical errors to the plugin authors and VisANT team.

Input
VisANT automatically recognizes the format of an input file based on its content. Only those formats related to the new functions are discussed here. The full list of supported files can be found on VisANT's web site.
Pathways can be loaded into VisANT using several different input methods as detailed in Figure 2. In particular, double-clicking on a contracted pathway node (e.g. the blue boxes in Figure 1) will also load the pathway if the corresponding KGML file is available from the KEGG. Expression data is input from a common tab-delimited file. The first column can be an Entrez Gene ID, an Access ID/GI number, a gene name or an ID from an organism-specific database. The file can have a header line to indicate the names of the different experiments; otherwise, VisANT will use a sequential number to identify different experiments. If the expression data is to be overlaid on an existing pathway, the name normalization service should be utilized first so that genes in the network and in the expression data can be matched to each other.
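A reader for such a tab-delimited expression file could look like the sketch below. The header-detection rule (the first line is a header if its value columns are not all numeric) is an assumption, as are the function names and sample data; VisANT's own parser may behave differently.

```python
import csv
import io


def _is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def read_expression(handle):
    """Return (experiment names, {gene id: [values]}) from a tab-delimited file."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    if not rows:
        return [], {}
    first = rows[0]
    if all(_is_number(value) for value in first[1:]):
        # No header line: fall back to sequential experiment numbers.
        names = [str(i) for i in range(1, len(first))]
    else:
        names = first[1:]
        rows = rows[1:]
    profiles = {row[0]: [float(value) for value in row[1:]] for row in rows}
    return names, profiles


# Hypothetical input with and without a header line.
with_header = io.StringIO("gene\theat\tcold\nGENE_A\t1.5\t-0.5\nGENE_B\t0.2\t0.9\n")
without_header = io.StringIO("GENE_A\t1.5\t-0.5\nGENE_B\t0.2\t0.9\n")
names_h, profiles_h = read_expression(with_header)
names_n, profiles_n = read_expression(without_header)
```

The gene identifiers in the first column are kept as plain strings, so they would still need to be normalized before being matched against nodes in a network.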

Output
All data shown in VisANT can be saved in an XML format using the VisANT Markup Language (VisML). Registered users can save the network on the VisANT server so that it can be accessed wherever the internet is available. VisML uses a version number to facilitate compatibility and extensibility. A description of VisML can be found at http://visant.bu.edu. In addition, pathways can be exported without visual information, as tab-delimited edge and node lists. Pathways can also be saved as pixel images, or as high-quality SVG for publication and illustration. An SVG file can be further polished with an SVG editor.

Pathway visualization, navigation and editing
Each pathway is represented as a metanode which may be nested within other metanodes (Figure 1). If links to other pathways are available in KGML, these pathways are represented as contracted metanodes. VisANT adopts the KEGG notation for graphics annotation so that users will have consistent views of the KEGG pathways. However, a few changes were necessary. In particular, a single protein/gene is represented as a filled green circle, and a metanode displayed as a green box is used to represent multiple proteins/genes. Additionally, the number of proteins/genes contained in a metanode can be revealed by double-clicking the box. Use of a metanode for a protein complex is also introduced (Figure 3). Multiple instances of the same node can exist even in the same pathway (ARG5,6 in Figure 1). These instances can be tracked by pressing the right mouse button over the corresponding node. Dashed lines will connect all instances of the node. The lines vanish once the mouse button is released.
Pathways can be easily edited in VisANT. Nodes and edges can be modified, added or deleted. Additional components can be added to pathways by a simple drag and drop. Pathways can be easily ungrouped or regrouped as one large pathway, depending on the user's needs.
Figure 2 (caption, panels B-E). A pathway can also be located and loaded by using its (B) name or ID, (C) a URL or (D) KGML contents copied/pasted into the 'Add' box. (E) Pathway IDs will be shown in a node's tooltip if the protein/gene is involved in one or more pathways. In such a case, a pathway can be directly loaded using a set of drop-down menus as shown above.

Multi-scale visualization, pathway overview and crosstalk
As with the extension of interactions for a given protein/gene, pathways can be extended by double-clicking on a pathway node. Using this method, a network of pathways can be quickly constructed. Figure 1A shows the network of pathways obtained by first loading pathway MAP00220 and then expanding the pathway MAP00910. It is worth noting that crosstalk between MAP00251 and MAP00910 mediated by the compound C00025 is only visible after MAP00910 is expanded.
Because the state of a metanode can be toggled by mouse-clicking, an overview of the pathway shown in Figure 1B can be easily achieved by contracting the two pathway nodes MAP00910 and MAP00220. Thus, VisANT is capable of easily exploring pathways at different scales: a pathway overview enables users to observe the topology of large sets of pathways, while the detailed internal structure of any particular pathway or set of pathways is easily revealed by mouse-clicking.

Overlaying expression data
VisANT provides two methods to visualize expression data over pathways: either the node color is used to represent the expression value in a particular experiment, or a plot of the expression profile is embedded in the node, as shown in Figure 3. The two methods can be toggled either for individual nodes or for the whole network. Different experiments can be navigated using a sliding bar and the navigation process can be animated. When the expression profile is shown, the corresponding experiment and expression value are indicated by a cursor.
In VisANT it is convenient to determine whether genes in the same pathway are coexpressed, as all the expression profiles of the nodes contained in a metanode (pathway), as well as the average profile, are drawn together as one plot with average profiles in black. Figure 3D shows such an example for a node representing a protein complex.
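The average profile drawn for a metanode is simply the per-experiment mean over its members. A minimal sketch, with a hypothetical function name and made-up values, assuming all member profiles cover the same experiments:

```python
def average_profile(profiles):
    """Mean expression per experiment across the members of one metanode."""
    if not profiles:
        return []
    # zip(*...) regroups the values by experiment instead of by gene.
    return [sum(column) / len(column) for column in zip(*profiles.values())]


# Hypothetical profiles of three members of one metanode over three experiments.
member_profiles = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 3.0, 4.0],
    "C": [3.0, 1.0, 2.0],
}
avg = average_profile(member_profiles)
```

Plotting `avg` together with the individual profiles makes it easy to see which members follow the group trend and which deviate from it.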

Pathway prediction
Sets of genes in the same pathway are often activated together and may have very similar expression profiles; their protein products may also interact, either physically or functionally, to achieve a specific task. VisANT provides functions to assign genes/proteins with unknown function to the known KEGG pathways based on these observations. Predictome (33) can easily be queried for sets of proteins that interact either functionally or physically with a specified protein. VisANT also has editing capabilities that allow any such set to be augmented with a user's own data set.
Genes with similar expression profiles can be identified using the ClueGene and GeneRecommender plugins, and the genes so identified can be associated with one or another KEGG pathway in accordance with user-specified criteria based on either functional or physical links (Figure 2E) (25,34,35). Query genes can be placed in identified pathways by a simple drag and drop.
We suggest that users test the coexpression of query genes with known genes in the potential pathways and compare scores using either ClueGene or GeneRecommender. In addition, expression profiles can be compared if query genes are searched using GeneRecommender.

Pathway construction, enrichment and update
New pathways can be created from scratch or from relevant KEGG pathways, the latter of course being substantially more convenient because of KEGG documentation. In collaboration with the KEGG, the VisANT web site lists all pathways for which KGML is available, allowing easy access and loading into VisANT (Figure 2A). These reference KEGG pathways can also be updated when necessary. When loaded into VisANT, they can be enriched either by querying functionally associated components from experimental and computational results accessible from the VisANT-Predictome system, or by searching for coexpressed genes as indicated above.
We next describe a use-case scenario to illustrate some of the new features of VisANT. Suppose a user is interested in the γ-secretase complex, which acts in the H. sapiens Notch signaling pathway (Figure 3A), and wishes to learn more about related genes or the internal structure of the γ-secretase complex. First, the GeneRecommender plugin can be used to search for potential genes coexpressed with the five component members of the complex: APH1A, NCSTN, PSEN1, PSEN2 and PSENEN. GeneRecommender returns the top 10 coexpressed genes scored in the top 50 experiments. As can be seen from Figure 3B, the scores of the coexpressed genes can be separated into three groups. The top group, APH1A, PSEN1 and PSEN2, has much higher scores than the second group, PSENEN and LRRTM4. The plotter is linked to the network and selecting a spot in the plotter will select the corresponding node in the network (Figure 3B and C). Note that query gene NCSTN is not included in the top 10 coexpressed genes, indicating that NCSTN is not positively correlated with other members of the complex. Anti-correlations are very common in signaling pathways (Figure 3A); future implementations of the search engines will support identification of anticorrelated genes. Users may select different combinations of query genes to achieve the best results. In addition, the degree of coexpression between members of a given metanode can be viewed by contracting the metanode and turning on the expression plotter option, as shown in Figure 3D. To further test the correlation of the 11 genes shown in Figure 3C, interactions between pairs of genes are queried against the Predictome database, which reveals the interaction between PSENEN and APH1A identified by coimmunoprecipitation (36), as shown in Figure 3E.
In addition, pathways can be updated against the KEGG database so that the latest pathway information can be easily incorporated into existing pathways customized by the users.

FUTURE DEVELOPMENT
Among our goals for further development of VisANT is supporting pathways from other databases, including Reactome (2), BioCarta (http://www.biocarta.com), EcoCyc (3) and INOH (http://www.inoh.org/). Since computable representations of pathways from these databases are available in BioPAX format, one way to proceed would be to increase VisANT's support of BioPAX. First, this will require developing an automatic layout algorithm since BioPAX, unlike KGML, does not contain layout information. More importantly, a standard visual notation will also need to be developed for the different types of biological components and for the relations between them. Second, unlike KGML, in which each pathway is usually stored in its own file, pathways in BioPAX format are usually represented in one large file which can exceed 100 MB, making it impractical to load them all at once and also preventing exploratory navigation of pathways. New efforts, such as the latest developments in CPath (http://cbio.mskcc.org/cpath/home.do), have made significant progress in overcoming this problem by providing Application Programming Interfaces (APIs) that can retrieve pathways one by one in BioPAX format. We expect that the obstacles discussed above will be removed in the near future and that pathways from these databases will be ready for use in VisANT.