CarbBuilder: Software for building molecular models of complex oligo‐ and polysaccharide structures

CarbBuilder is a portable software tool for producing three‐dimensional molecular models of carbohydrates from the simple text specification of a primary structure. CarbBuilder can generate a wide variety of carbohydrate structures, ranging from monosaccharides to large, branched polysaccharides. Version 2.0 of the software, described in this article, supports monosaccharides of both mammalian and bacterial origin and a range of substituents for derivatization of individual sugar residues. This improved version has a sophisticated building algorithm to explore the range of possible conformations for a specified carbohydrate molecule. Illustrative examples of models of complex polysaccharides produced by CarbBuilder demonstrate the capabilities of the software. CarbBuilder is freely available under the Artistic License 2.0 from https://people.cs.uct.ac.za/~mkuttel/Downloads.html. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.


Introduction
Carbohydrates encompass an incredibly diverse range of molecular structures, from the ubiquitous disaccharides to highly complex, multiply branched polysaccharides. The field of glycobiology is expanding as carbohydrates are increasingly recognized as key molecules in important biological processes, such as cell-cell interaction and host-pathogen recognition. [1][2][3] Knowledge of the three-dimensional (3D) structures of carbohydrates can greatly assist in a mechanistic understanding of these processes. As the characterization of carbohydrate molecular conformation remains a central problem in glycobiology, 3D theoretical models are often useful for interpreting experimental results. [4][5][6][7] However, there remain few software tools for building models of complex, possibly branched, polysaccharides from the specification of the primary sequence. Currently available software includes two web-based tools, SWEET-II [8] and the GLYCAM carbohydrate builder [9], and two standalone software packages, POLYS [10,11] and our CarbBuilder tool. [12] All these software packages follow the well-established "pragmatic" approach to generating a 3D model of a carbohydrate: monosaccharides are treated as stable, rigid building blocks and glycosidic linkage dihedral angles are used as the primary variable for building. An alternative approach to structure building is embodied in the fast structural prediction software (FSPS), which searches for energy minima in glycosidic conformation space with the assistance of NMR data. [13] However, FSPS is not publicly available [14] and hence we could not determine the usefulness of this more complex approach for general, fast building of carbohydrate structures.
For the pragmatic approach to building, the basic criteria for a valid 3D model of a carbohydrate are that it reflects the two-dimensional (2D) connectivity accurately and contains no physically impossible self-intersections or steric clashes of the constituent residues. A more stringent criterion for validity would ensure that the structure does not contain conformations of the glycosidic linkages at odds with experimental or calculated conformations. However, given the scattered experimental and computational data on glycosidic linkages, this criterion is harder to enforce.
The pragmatic software tools have differing sets of monosaccharides available for building and employ different approaches to validation of the 3D models produced, as follows. The POLYS software currently supports 14 different residue building blocks in its MONOBANK database, a number that increases to 84 when variations on basic monosaccharide units resulting from α/β anomers, D/L enantiomers, furanose/pyranose forms of the sugar rings and derivatization of the residues with substituents (such as β-D-GlcNAc) are included. In POLYS, the user may supply a list of glycosidic torsion angles for each glycosidic linkage, or use the default conformational information from a database. POLYS may produce invalid structures with steric clashes, although the building procedure will give warnings if these occur. In this case, it is up to the user to establish better dihedral angles, which can be a difficult task in a constrained glycosidic linkage without prior knowledge of the preferred conformations.
The GLYCAM carbohydrate builder supports building with one bacterial residue and 26 mammalian residues, a range which is extended by specification of α/β anomers, D/L enantiomers and furanose/pyranose forms of the sugar rings. Sugar derivatives are covered to a limited extent in GLYCAM: four N-acetyl sugars (GalNAc, GlcNAc, ManNAc, Neu5Ac) and three uronic acids (GalA, GlcA, IdoA) are supported. The GLYCAM carbohydrate builder has its own internal defaults for specific dihedral angles and allows the user to specify alternatives. However, the building process (which can be done either with the graphical input tool or with specification of structure in the condensed GLYCAM notation [15]) can be problematic if a user wishes to explore the effect of a range of dihedral rotations on a large structure. In addition, although GLYCAM has a template library of prebuilt 3D structures of a number of common glycans and a build-by-URI feature, it remains difficult to input a very large, branched structure or multiples of a repeating unit in GLYCAM.
The SWEET-II web tool supports building with about 50 basic residues, a range that is again extended to over 200 by specification of residue anomers, enantiomers, and substituents. SWEET-II contains a useful set of example templates and structures for input, including a text facility for input of very large, branched structures. SWEET-II has a similar building approach to GLYCAM, but valid models are ensured by optimization of the 3D structure with the MM3 force field.
We designed CarbBuilder to combine the best aspects of existing carbohydrate structure building software into a standalone, command-line tool that can be easily incorporated into a software workflow. The command-line interface allows the user to create scripts to automatically generate large libraries of glycans. We aim for simple, modular software that is fast, robust, easy to use and portable across different architectures. Here we report a new 2.0 version of CarbBuilder, which incorporates an expanded set of monosaccharide building blocks, an extended range of substituents for generalized derivatization of sugar residues and further improvements to the building algorithm to facilitate rapid building of valid 3D models of a broader range of oligo- and polysaccharide structures than any other current software.

Features and Improvements
CarbBuilder is a command-line tool that takes as input a text string describing the primary structure of a carbohydrate in the simple CASPER format [12,16] and returns a 3D structure in the Protein Data Bank (PDB) format. [17] For simplicity, we use the CASPER notation in all our discussions of structure below.
[a] When the absolute configuration is part of the trivial name, it may be omitted from the input.

SOFTWARE NEWS AND UPDATES
WWW.C-CHEM.ORG
CarbBuilder allows for specification of multiples of a basic repeating unit to enable building of very large, branched polysaccharides. CarbBuilder produces either a single optimal 3D model, or, now, a whole family of possible structures, for any number of repeating units for a wide variety of complex, branched polysaccharides.
CarbBuilder is written in C# and hence is portable across different computer architectures.

Additional monosaccharides and substituents
Table 2. Examples of the required input to CarbBuilder to generate a range of carbohydrates. Branches off the main chain are highlighted with bold font; branches off side chains are indicated with italics.
Molecule | Command-line arguments to CarbBuilder, with primary structure in CASPER notation [branches in brackets]
Chitin 100 RU | -i "->4)bDGlcNAc(1->" -r 100
C. albicans serotype A cell wall mannan [26]
Figure 1. Examples of repeating polysaccharide structures created with CarbBuilder from the sequence specification in Table 2 and visualized with VMD [23]. a) 100 RU of the chitin polysaccharide. b) A highly branched glycogen-like structure containing 10503 atoms. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

CarbBuilder version 2.0 supports the common mammalian monosaccharides [3] and a broad range of bacterial monosaccharides, including linear alditols. Table 1 lists the monosaccharide residues supported in CarbBuilder version 2.0. This list has been extended considerably since the first version of the software and is continuously updated (the "-help" command-line option will display the current list of monosaccharide residues and substituents). Taking into account the α/β anomers, the D/L enantiomers and the furanose/pyranose forms of the aldose and ketose residues, CarbBuilder currently supports over 100 basic monosaccharide components. Further, CarbBuilder allows for derivatization of existing residues with a range of substituents, also listed in Table 1. In version 2.0, CarbBuilder implements recursive parsing to allow for multiple substituents on a single residue, such as "aDGlcNAc46PyA", "aDGlcA2OMe4Ac", or "aLRha2Ac3Ac", which considerably extends the range of different monosaccharide building blocks into the thousands.
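The recursive treatment of multiply substituted residue names can be illustrated with a minimal Python sketch. It is not CarbBuilder's own parser (CarbBuilder is written in C#): the name tables `RESIDUES` and `SUBSTITUENTS` and the functions `parse_residue` and `parse_substituents` are hypothetical, and the grammar assumed here (one anomer letter, one D/L letter, a residue name, then repeated position-plus-substituent items) is inferred only from the three example names quoted in the text.

```python
import re
from typing import List, Tuple

# Hypothetical, deliberately tiny name tables (assumptions for illustration).
# Longer residue names are listed first so "GlcNAc" wins over "Glc".
RESIDUES = ["GlcNAc", "GlcA", "Glc", "Rha"]
SUBSTITUENTS = ["OMe", "PyA", "Ac"]


def parse_substituents(text: str) -> List[Tuple[str, str]]:
    """Recursively peel '<positions><name>' items off the front of text."""
    if not text:
        return []
    for name in SUBSTITUENTS:
        match = re.match(r"(\d+)" + re.escape(name), text)
        if match:
            rest = text[match.end():]
            # Positions are kept as the raw digit string, e.g. "46".
            return [(match.group(1), name)] + parse_substituents(rest)
    raise ValueError(f"unrecognized substituent at {text!r}")


def parse_residue(token: str) -> Tuple[str, str, str, List[Tuple[str, str]]]:
    """Split a residue token into anomer, configuration, name, substituents."""
    anomer, config, body = token[0], token[1], token[2:]
    for name in RESIDUES:
        if body.startswith(name):
            return anomer, config, name, parse_substituents(body[len(name):])
    raise ValueError(f"unrecognized residue in {token!r}")


for example in ("aDGlcNAc46PyA", "aDGlcA2OMe4Ac", "aLRha2Ac3Ac"):
    print(example, "->", parse_residue(example))
```

Because each recursive call consumes exactly one substituent item, any number of substituents on one residue is handled without special cases.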

Building polysaccharide structures
CarbBuilder has a new, recursive routine for generating polysaccharide conformations. This procedure performs a search for a valid structure, connecting sugar residues one by one. During the building process, steric collisions or self-intersections of a chain (atomic "collisions") are reported to the user. CarbBuilder responds to collisions by attempting to complete the current linkage with alternative dihedral values. If this is not successful, CarbBuilder will backtrack to the previous linkage completed and attempt alternative dihedral values for that bond. Only after performing an exhaustive search for
Figure 2. Typical complex N-glycan structure found on mature glycoproteins. [25] a) 2D diagram of structure. b) 3D model produced by CarbBuilder and visualized with VMD [23] from sequence specification in Table 2. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Figure 3. The complex cell wall mannan molecule of C. albicans serotype A [26]. a) 2D diagram of structure from the sequence specification in Table 2
For the success of the building procedure, CarbBuilder relies on good initial values and alternatives for the dihedral angles in glycosidic linkages. A (1->X) glycosidic linkage is defined by the torsion angles φ = H1′-C1′-O1′-CX and ψ = C1′-O1′-CX-HX. These definitions are analogous to φ and ψ in the IUPAC convention. CarbBuilder has a large list of pre-computed optimal values of φ,ψ dihedral angle pairs for specific glycosidic linkages, determined from disaccharide potential of mean force (PMF) calculations in vacuum. [18] (These calculations were performed using the new CHARMM force field for carbohydrates [19][20][21] and the metadynamics method in the NAMD [22] molecular dynamics program version 2.9.) For each glycosidic linkage in the database, values of φ,ψ angle pairs for multiple minima are listed, in decreasing order of predicted probability.
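Each of the two glycosidic torsion angles is a dihedral over four atoms, so it can be measured directly from atomic coordinates. The following Python sketch shows the standard dihedral calculation; the function name `torsion_deg` and the toy coordinates are assumptions for illustration and are not taken from CarbBuilder.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0, p1, p2, p3) -> float:
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    s0, s2 = _dot(b0, b1), _dot(b2, b1)
    v = _sub(b0, (s0 * b1[0], s0 * b1[1], s0 * b1[2]))
    w = _sub(b2, (s2 * b1[0], s2 * b1[1], s2 * b1[2]))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Toy coordinates (not a real sugar): an eclipsed and a perpendicular arrangement.
print(torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For a (1->X) linkage, φ would be obtained by passing the coordinates of H1′, C1′, O1′ and CX, and ψ by passing C1′, O1′, CX and HX, in that order.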
During the building routine, CarbBuilder connects two residues with each of the torsion values in order, until either a viable sub-structure is generated, or all options are exhausted and the algorithm backtracks to the previous linkage. If a steric clash occurs, CarbBuilder attempts to avoid the collision with ±10 degree rotations in the ψ dihedral angle. If this is not successful, the next pair of dihedral angle values is tested.
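The search described above is a depth-first search with backtracking. The Python sketch below shows that control flow only, under stated assumptions: the clash test is a caller-supplied function rather than a real geometric check, the order in which the +10 and -10 degree ψ adjustments are tried is a guess, and the names `candidates`, `build` and `toy_clash` are hypothetical. It is not CarbBuilder's implementation.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Dihedral = Tuple[float, float]  # (phi, psi) in degrees


def candidates(pairs: Sequence[Dihedral], psi_step: float = 10.0) -> Iterator[Dihedral]:
    """Yield each stored pair, then its psi +/- psi_step variants, in order."""
    for phi, psi in pairs:
        yield (phi, psi)
        yield (phi, psi + psi_step)  # assumed order of the two adjustments
        yield (phi, psi - psi_step)


def build(linkages: Sequence[Sequence[Dihedral]],
          clashes: Callable[[List[Dihedral]], bool],
          chosen: Optional[List[Dihedral]] = None) -> Optional[List[Dihedral]]:
    """Return the first clash-free assignment of one dihedral pair per linkage."""
    chosen = [] if chosen is None else chosen
    if len(chosen) == len(linkages):
        return list(chosen)
    for cand in candidates(linkages[len(chosen)]):
        chosen.append(cand)
        if not clashes(chosen):
            result = build(linkages, clashes, chosen)
            if result is not None:
                return result
        chosen.pop()  # undo this choice and try the next candidate
    return None       # all options exhausted: the caller backtracks


# Toy clash rule (an assumption): two consecutive linkages may not both
# use the pair (47.0, 2.5).
def toy_clash(chosen: List[Dihedral]) -> bool:
    return len(chosen) >= 2 and chosen[-1] == chosen[-2] == (47.0, 2.5)


pairs = [(47.0, 2.5), (25.0, 25.0)]
print(build([pairs, pairs, pairs], toy_clash))
```

Returning `None` from a deeper call is what triggers the backtrack: the enclosing loop discards its current choice and moves to the next candidate for the previous linkage.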
The list of linkage dihedrals is regularly updated with additional linkages, but is not exhaustive (which would be a Sisyphean task, given the range of possible disaccharides). When specific values are required for building a structure, but are not present in the database, CarbBuilder attempts to find reasonable defaults for linkages. If these are not present, generic α- or β-linkage defaults are used. For example, if the structure "aDGlcNAc3Ac(1->4)aDGlcNAc3Ac" is specified and the values for this linkage are not found in the database, CarbBuilder will use values from the "core" linkage "aDGlc(1->4)aDGlc" in preference to the generic α-linkage defaults. Users may also specify values for one or more of the glycosidic linkages in a structure in a separate file, using the -d [filename] command-line option.
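The three-level fallback (specific linkage, then "core" linkage, then generic anomer default) can be sketched in Python as follows. This is an illustration under assumptions, not CarbBuilder's lookup code: the stem list `STEMS`, the helper names `core` and `resolve`, and the rule for reducing a residue to its core (keep the anomer and D/L letters plus a known residue stem) are hypothetical, and the sketch returns only which database entry would be consulted rather than any dihedral values.

```python
import re
from typing import Set, Tuple

# Hypothetical list of base residue stems (an assumption for illustration).
STEMS = ["Glc", "Gal", "Man", "Rha"]

LINKAGE_RE = re.compile(r"^(?P<donor>.+)\((?P<link>\d->\d)\)(?P<acceptor>.+)$")


def core(residue: str) -> str:
    """Reduce e.g. 'aDGlcNAc3Ac' to its core residue 'aDGlc'."""
    prefix, body = residue[:2], residue[2:]
    for stem in STEMS:
        if body.startswith(stem):
            return prefix + stem
    return residue  # unknown stem: leave unchanged


def resolve(linkage: str, known: Set[str]) -> Tuple[str, str]:
    """Report which entry would supply the dihedral values for a linkage."""
    match = LINKAGE_RE.match(linkage)
    if match is None:
        raise ValueError(f"not a linkage: {linkage!r}")
    if linkage in known:
        return ("specific", linkage)
    core_key = "{}({}){}".format(core(match["donor"]), match["link"],
                                 core(match["acceptor"]))
    if core_key in known:
        return ("core", core_key)
    # Generic default chosen by the anomer letter of the first residue.
    return ("generic", match["donor"][0])


known_linkages = {"aDGlc(1->4)aDGlc", "bDGlc(1->4)bDGlc"}
print(resolve("aDGlcNAc3Ac(1->4)aDGlcNAc3Ac", known_linkages))
print(resolve("bDGal(1->3)aDMan", known_linkages))
```

Ordering the checks from most to least specific means that adding a new specific entry to the database automatically takes precedence over the core and generic defaults.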
Finally, CarbBuilder can also generate all possible structures for a polysaccharide, given the dihedral values in the database. This is done using the -all command-line option to CarbBuilder. This option initiates the same recursive building procedure, but, instead of halting at the first successful build, CarbBuilder continues to build all non-intersecting structures. The resulting structures are output as a single PDB file containing multiple frames. However, as the number of possibilities grows exponentially with the number of residues, this option is impractical to use with very large molecules.
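The exponential growth can be made concrete with a small Python sketch that enumerates every clash-free combination instead of stopping at the first. The names `all_builds`, `upper_bound` and `no_repeat` are hypothetical, the clash rule is a toy stand-in for a geometric test, and the ±10 degree ψ adjustments are omitted for brevity.

```python
import math
from typing import Callable, Iterator, Sequence, Tuple

Dihedral = Tuple[float, float]  # (phi, psi) in degrees


def upper_bound(linkages: Sequence[Sequence[Dihedral]]) -> int:
    """Number of combinations before any clash filtering."""
    return math.prod(len(options) for options in linkages)


def all_builds(linkages: Sequence[Sequence[Dihedral]],
               clashes: Callable[[Tuple[Dihedral, ...]], bool],
               chosen: Tuple[Dihedral, ...] = ()) -> Iterator[Tuple[Dihedral, ...]]:
    """Yield every complete assignment whose partial builds never clash."""
    if len(chosen) == len(linkages):
        yield chosen
        return
    for cand in linkages[len(chosen)]:
        trial = chosen + (cand,)
        if not clashes(trial):  # prune clashing branches early
            yield from all_builds(linkages, clashes, trial)


# Toy clash rule (an assumption): adjacent linkages may not share a pair.
def no_repeat(chosen: Tuple[Dihedral, ...]) -> bool:
    return len(chosen) >= 2 and chosen[-1] == chosen[-2]


pairs = [(47.0, 2.5), (25.0, 25.0)]
for n in (3, 10, 20):
    print(n, "linkages ->", upper_bound([pairs] * n), "combinations at most")
print(len(list(all_builds([pairs] * 3, no_repeat))), "clash-free builds for 3 linkages")
```

With m candidate pairs per linkage and n linkages the unfiltered search space holds m^n combinations, which is why an exhaustive enumeration quickly becomes impractical for long chains.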

Large, branched polysaccharides
The command-line arguments to CarbBuilder required to build several representative large polysaccharide structures are listed in Table 2. The first two examples illustrate the use of the "-r" argument to specify multiple repeating units in a model. Figure 1 shows visualizations using the Visual Molecular Dynamics (VMD) software [23] of the resultant structures produced by CarbBuilder.
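Because the input is a plain argument string, argument lists for a library of structures can be generated by a script. The Python sketch below assembles such lists using only the "-i" and "-r" arguments shown for chitin in Table 2; the helper name `carbbuilder_args`, the library dictionary and the 10-unit entry are hypothetical, and how the CarbBuilder executable itself is launched on a given platform is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple


def carbbuilder_args(sequence: str, repeats: Optional[int] = None) -> List[str]:
    """Assemble the argument list for one structure in CASPER notation."""
    args = ["-i", sequence]
    if repeats is not None:
        if repeats < 1:
            raise ValueError("repeats must be a positive integer")
        args += ["-r", str(repeats)]
    return args


# Hypothetical library: name -> (CASPER repeat unit, number of repeat units).
library: Dict[str, Tuple[str, int]] = {
    "chitin_100RU": ("->4)bDGlcNAc(1->", 100),  # the chitin row of Table 2
    "chitin_10RU": ("->4)bDGlcNAc(1->", 10),    # assumed shorter variant
}

commands = {name: carbbuilder_args(seq, n) for name, (seq, n) in library.items()}
for name, args in commands.items():
    print(name, args)
```

Each list can then be appended to the path of the local CarbBuilder executable and passed to a process launcher such as `subprocess.run`, which avoids shell quoting problems with the "->" and parenthesis characters in the sequence.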
The linear chitin molecule contains only one type of glycosidic linkage: bDGlcNAc(1->4)bDGlcNAc. As CarbBuilder currently does not have torsion values for this specific linkage, dihedral angle values from the "core" linkage bDGlc(1->4)bDGlc are used instead. There are four φ,ψ dihedral angle pairs (separated by commas) listed for this linkage, in decreasing order of preference, corresponding to minima calculated for the maltose disaccharide:
In the case of chitin, the first pair of φ,ψ dihedral angle values, 47 2.5, produces a valid 100 RU elongated structure with no atomic collisions to be resolved, which is output by CarbBuilder (Fig. 1a).
Glycogen contains one type of residue, aDGlc, and two types of linkage: (1->4) on the backbone and (1->6) linkages at the branches (Table 2). When building this molecule, CarbBuilder encounters steric clashes between hydroxymethyl groups on adjacent residues at the aDGlc(1->4)[aDGlc(1->6)]aDGlc branch points, [24] which are only resolved by using the third pair, 25 25, of the φ,ψ dihedral angle values for the aDGlc(1->4)aDGlc linkage (listed above). The resultant large model of a highly branched glycogen-like molecule (visualized in Fig. 1b) thus has these alternate values of the aDGlc(1->4)aDGlc linkage at each branch point, but not elsewhere.

Table 3. Primary sequences in CASPER notation for the O-polysaccharide repeating unit in 19 serotypes of S. flexneri.
Serotype | Command-line arguments to CarbBuilder, with primary structure in CASPER notation [branches in brackets]
Table 3. (Molecular images are visualized with the Visual Molecular Dynamics [23] package). Residues with gluco-, manno-, and galacto-configuration are colored blue, green, and yellow, respectively (L-Rha = 6-deoxy-L-mannose). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Our next two examples illustrate building of large, branching polysaccharides with no clear repeat unit in CarbBuilder. The N-glycan structures found on mature glycoproteins are composed of a variety of monosaccharides, including sialic acid (Table 2). Figure 2a shows a diagram of a typical complex structure, [25] with Figure 2b visualizing the 3D structure produced by CarbBuilder. In this example, CarbBuilder contains very few pre-calculated values for the specific dihedral angles in the structure and has to resort to default values. The 3D model produced is thus valid in that it contains no steric clashes, but it may violate experimental measurements on specific linkages. Nevertheless, a structure is produced, which can be a starting point for more accurate modeling.
The cell wall mannan of Candida albicans serotype A is a prime example of a highly complex, highly branched polysaccharide. This structure comprises a backbone of α(1->6)-linked mannose residues and α(1->2)-linked mannose branches, with occasional α(1->3) linkages and capping β-mannose residues, as illustrated in Figure 3a. [26] As there is no repeating unit for this molecule, the entire structure was specified as input to CarbBuilder. Figure 3b shows the resultant 2296-atom 3D model generated by CarbBuilder. For this example, we ensured that precomputed optimal values of all the φ,ψ dihedral angle pairs were present in CarbBuilder for all the specific glycosidic linkages in the molecule. No steric clashes occurred on building. Thus, this structure can be considered to be a better estimate of the actual molecular structure than in the case of the N-glycan molecule. Further, as far as we could determine, this is the first 3D model of this complex polysaccharide produced.

Bacterial polysaccharides
To demonstrate CarbBuilder's ability to successfully build bacterial polysaccharide models containing a variety of residues with unusual substitutions, we focus on the Shigella flexneri bacterium. On the basis of O-antigens, S. flexneri is divided into at least 19 serotypes, [27] the majority of which are modifications of the same basic O-antigen by glucosylation and/or O-acetylation of the sugar residues (Table 3). These complex, branched structures with multiple residue types and substituents represent an effective demonstration of the efficacy of the CarbBuilder building routine for bacterial polysaccharides.
We used the CarbBuilder software to build six repeating units (6 RU) of each of the 19 S. flexneri O-polysaccharides. From the primary sequence inputs for each of the serotypes in CASPER format (Table 3), CarbBuilder produced the 3D models depicted in Figure 4. As some of the S. flexneri polysaccharides are extremely sterically crowded (e.g., serotype 7), [28] we found that pre-computed optimal values of all the φ,ψ dihedral angle pairs were vitally important for successful building of these branched polysaccharides; use of default torsion values resulted in unresolvable atomic clashes.
It is clear from the structures shown in Figure 4 that the conformations of O-antigens within a serotype tend to have strong similarities, whereas serotypes are usually clearly distinct from each other. Compare, for example, serotype Y with serotype 2. However, in some cases, CarbBuilder predicts significant differences for the conformations of structures within a serotype. Serotype 1 and serotype 7 are predicted to have the most within-group conformational variability. However, due to the "pragmatic" approach we have taken with CarbBuilder, the user should be wary of drawing too many conclusions from a single model conformation. Carbohydrates are known to be extremely flexible and more sophisticated modeling methods that take into account conformational ensembles in solution are required for accurate predictions.

Summary and Future Developments
The CarbBuilder software enables automated building of polysaccharides of arbitrary length from the text-representation of the primary structure. Version 2.0 includes additional residues and substituents and a robust building routine. The applications detailed here demonstrate CarbBuilder's ability to generate valid 3D models of complex, branched polysaccharides and bacterial polysaccharides with non-mammalian residues.
Models of carbohydrate molecules produced by CarbBuilder may be used to interpret experimental observations, or as a starting point for more complex molecular simulations.
Future developments will focus on adding an even wider range of monosaccharides and substituents, in order to cover as broad a range as possible of polysaccharide structures, as well as automated building of glycoproteins and glycolipids.