BioPartsBuilder: a synthetic biology tool for combinatorial assembly of biological parts

Summary: Combinatorial assembly of DNA elements is an efficient method for building large-scale synthetic pathways from standardized, reusable components. These methods are particularly useful because they enable assembly of multiple DNA fragments in one reaction, at the cost of requiring that each fragment satisfy design constraints. We developed BioPartsBuilder as a biologist-friendly web tool to design biological parts that are compatible with DNA combinatorial assembly methods, such as Golden Gate and related methods. It retrieves biological sequences, enforces compliance with assembly design standards and provides a fabrication plan for each fragment.
Availability and implementation: BioPartsBuilder is accessible at http://public.biopartsbuilder.org and an Amazon Web Services image is available from the AWS Marketplace (AMI ID: ami-508acf38). Source code is released under the MIT license and is available for download at https://github.com/baderzone/biopartsbuilder.
Contact: joel.bader@jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.

Result: Retrieve all the tRNA of yeast chromosome III

Query: description:"shock protein" AND organism:"Saccharomyces cerevisiae"
Result: Retrieve all the shock protein related entries of yeast

Query: feature:gene AND ontology_term:0008150 AND orf_classification:verified AND chromosome:chrIII
Result: Retrieve verified genes of chromosome III that are involved in biological process

PART FABRICATION
BioPartsBuilder provides a convenient function for users to fabricate parts that are larger than can be conveniently synthesized by commercial providers. For example, Gen9 Genebits are limited to 1 kb in length. If a desired CDS or other synthesis target is 2.5 kb, three Genebits will be required. The crucial design step is to ensure that the three DNA fragments can be assembled correctly to obtain the expected sequence.
To use this assembly approach, we designed a greedy algorithm that breaks the input sequence into fragments that can be assembled unambiguously. Users specify the maximum fragment length (L_max), the internal prefix and suffix (F_p, F_s), and the size of overlap (L_o) or a set of admissible overlap sequences (S_o). For a sequence of total length L bp longer than a vendor maximum sequence length θ bp, BioPartsBuilder will split the sequence into fragments as follows:
1. Sequence segmentation. The algorithm calculates the optimal number of fragments for the part as n = ⌈L/L_max⌉ (L/L_max rounded up to the next integer), and it splits the sequence into n fragments.
Encoding overlaps is the crucial step of the fabrication process. The overlap encoding procedure relies on perfect matching only. Although this could be an issue for long overlaps, it is generally reasonable for the short overlaps used in fragment assembly. The algorithm is nevertheless flexible enough to be extended with different strategies, such as imperfect matching or secondary structure predictions.
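The segmentation step can be illustrated with a minimal Ruby sketch. It is not the BioPartsBuilder implementation: the method names (split_into_fragments, junction_overlaps, unambiguous?) are hypothetical, and the sketch assumes that the prefix, suffix and overlap count against the maximum fragment length, so it reserves room for them before computing the number of fragments. It also assumes that "unambiguous" can be approximated by requiring all junction overlaps to be distinct under exact string comparison.

```ruby
# Hypothetical sketch, not the BioPartsBuilder code.
# Assumption: prefix, suffix and overlap all count against max_len.
def split_into_fragments(seq, max_len:, overlap:, prefix: "", suffix: "")
  raise ArgumentError, "empty sequence" if seq.empty?

  payload = max_len - prefix.length - suffix.length - overlap
  raise ArgumentError, "max_len too small for flanks and overlap" if payload <= 0

  n = (seq.length.to_f / payload).ceil  # number of fragments, rounded up
  base, extra = seq.length.divmod(n)    # near-equal piece sizes

  fragments = []
  start = 0
  n.times do |i|
    stop = start + base + (i < extra ? 1 : 0)
    # Every fragment except the last reads `overlap` bases into its neighbour.
    last = (i == n - 1)
    fragments << prefix + seq[start...(last ? stop : stop + overlap)] + suffix
    start = stop
  end
  fragments
end

# Overlap carried at the 3' end of every fragment except the last.
def junction_overlaps(fragments, overlap:, suffix: "")
  fragments[0...-1].map { |f| f[-(overlap + suffix.length), overlap] }
end

# Assumption: exact (perfect) matching only; distinct overlaps count as unambiguous.
def unambiguous?(overlaps)
  overlaps.uniq.length == overlaps.length
end

seq = "ACGT" * 625 # 2.5 kb toy sequence
fragments = split_into_fragments(seq, max_len: 1000, overlap: 4)
overlaps = junction_overlaps(fragments, overlap: 4)
puts fragments.map(&:length).inspect
puts overlaps.inspect
puts unambiguous?(overlaps)
```

For the 2.5 kb example with a 1 kb limit, the sketch yields three fragments, matching the Genebits example above. A fuller version would also compare overlaps against their reverse complements and against the admissible set S_o when one is supplied.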

IMPLEMENTATION
The software is implemented using the Ruby on Rails (http://rubyonrails.org) open-source framework (v3.2.11) and the MySQL database server. The web interface is designed with the Twitter Bootstrap framework (http://getbootstrap.com) for compatibility with standard computers and mobile devices.
BioPartsBuilder uses the BioRuby library to retrieve sequences and annotations [2]. The software uses ElasticSearch (http://www.elasticsearch.org) to perform search operations on annotated genomes. The public BioPartsBuilder website (http://public.biopartsbuilder.org) has Saccharomyces cerevisiae and Escherichia coli in the system. Users can add genomes to the standalone version by using rake tasks. The command 'bundle exec rake partsBuilder:gff:import' will retrieve all the annotations from a GFF3 file and import the data into the database. The command 'bundle exec rake partsBuilder:promoter_terminator:create' will create annotations for promoters and terminators. The promoter and terminator are defined as the 500 bp upstream and the 100 bp downstream of the CDS, respectively, or the region up to the neighboring gene boundary if that is closer. Users can change the definition of promoter and terminator by modifying the corresponding rake task. To add annotations to the ElasticSearch index, users can execute the command 'bundle exec rake partsBuilder:tire:import_annotation'.
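The promoter and terminator definition can be made concrete with a short Ruby sketch. It is an illustration only, not the code of the rake task: the method name flanking_regions and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive, and only forward-strand genes are handled (reverse-strand genes would mirror the logic, and chromosome ends are not clipped on the downstream side).

```ruby
# Hypothetical sketch, not the BioPartsBuilder rake task.
# Assumptions: 1-based inclusive coordinates, forward strand only.
def flanking_regions(cds_start, cds_end, prev_gene_end: 0, next_gene_start: nil,
                     promoter_len: 500, terminator_len: 100)
  # Promoter: up to promoter_len bp upstream, stopping at the previous gene.
  prom_start = [cds_start - promoter_len, prev_gene_end + 1].max
  prom_end = cds_start - 1

  # Terminator: up to terminator_len bp downstream, stopping at the next gene.
  term_start = cds_end + 1
  term_end = cds_end + terminator_len
  term_end = [term_end, next_gene_start - 1].min if next_gene_start

  {
    promoter: prom_start <= prom_end ? (prom_start..prom_end) : nil,
    terminator: term_start <= term_end ? (term_start..term_end) : nil
  }
end

# A CDS at 1000..2000 with neighbouring genes ending at 700 and starting at 2050.
p flanking_regions(1000, 2000, prev_gene_end: 700, next_gene_start: 2050)
```

With no close neighbours the sketch returns the full 500 bp and 100 bp windows. When a neighbouring gene abuts the CDS, the corresponding region is empty and is returned as nil.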
BioPartsBuilder uses GeneDesign modules [3] for codon optimization and restriction site recoding. The restriction enzyme tables are inherited from GeneDesign and include 3689 commercially available restriction enzymes. The codon usage tables are also inherited from GeneDesign and currently include Bacillus subtilis, Caenorhabditis elegans, Drosophila melanogaster, Escherichia coli, Homo sapiens, and Saccharomyces cerevisiae. GeneDesign also permits import of custom codon usage tables [3].
Since the design tasks can be computationally expensive, BioPartsBuilder uses a queue system to process jobs in batch and increase parallelism; the current implementation relies on Sidekiq (https://github.com/mperham/sidekiq), a Redis-backed Ruby library.
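The idea of queueing design jobs and processing them in parallel can be sketched with the Ruby standard library alone. This is a stand-in, not Sidekiq and not the BioPartsBuilder code: Sidekiq persists jobs in Redis and runs them in separate worker processes, whereas the hypothetical run_jobs helper below keeps everything in one process and uses threads.

```ruby
# Stand-in sketch using only the standard library; Sidekiq itself is not used here.
# `run_jobs` is a hypothetical helper: each job is any object responding to `call`.
def run_jobs(jobs, workers: 4)
  queue = Queue.new
  jobs.each_with_index { |job, index| queue << [index, job] }
  queue.close # once the queue is empty, `pop` returns nil and workers exit

  results = Array.new(jobs.length)
  threads = Array.new(workers) do
    Thread.new do
      while (item = queue.pop)
        index, job = item
        results[index] = job.call # each worker writes to its own slot
      end
    end
  end
  threads.each(&:join)
  results
end

# Toy "design jobs": compute the GC fraction of a few sequences.
sequences = %w[ATGC GGCC ATAT]
jobs = sequences.map { |s| -> { s.count("GC").to_f / s.length } }
p run_jobs(jobs, workers: 2)
```

Results are returned in submission order regardless of which worker finishes first, which keeps batch output predictable.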
BioPartsBuilder can be deployed on any Unix environment. Detailed instructions are available on the wiki at https://github.com/baderzone/biopartsbuilder/wiki/Installation.