Use of a Sec signal peptide library from Bacillus subtilis for the optimization of cutinase secretion in Corynebacterium glutamicum

Background
Technical bulk enzymes represent a huge market, and the extracellular production of such enzymes is favorable because it lowers the cost of product recovery. Protein secretion can be achieved via the general secretion (Sec) pathway. Specific sequences, signal peptides (SPs), are necessary to direct the target protein into the translocation machinery. For example, >150 Sec-specific SPs have been identified for Bacillus subtilis alone. As the best SP for a target protein of choice cannot be predicted a priori, screening of homologous SPs has been shown to be a powerful tool for different expression organisms. While SP libraries have been applied successfully between closely related species to optimize recombinant protein secretion, this has not been investigated for distantly related species. Therefore, in this study a Sec SP library from the low-GC firmicute B. subtilis is investigated to optimize protein secretion in the high-GC actinobacterium Corynebacterium glutamicum, using cutinase from Fusarium solani pisi as model protein.

Results
A homologous SP library (~150 SPs) for recombinant cutinase secretion in B. subtilis was successfully transferred to C. glutamicum as an alternative secretion host. Cutinase secretion in C. glutamicum was quantified using an automated microscale cultivation system for online growth monitoring, cell separation and cutinase activity determination. Secretion phenotyping results were correlated with those from a previous study, in which the same SP library was used to optimize secretion of the same cutinase but with B. subtilis as host. Strikingly, the behavior of specific SP-cutinase combinations changed dramatically between B. subtilis and C. glutamicum. Some SPs showed comparable cutinase secretion performance in both hosts, whereas other SPs led to diametrically opposed extracellular cutinase activities.

Conclusion
The optimal production strain for a specific target protein of choice still cannot be designed in silico. Not only does the best SP for a target protein have to be evaluated from scratch each time; the expression host also affects which SP is best. Thus, (heterologous) SP library screening using high-throughput methods is considered crucial for constructing an optimal production strain for a target protein.

Electronic supplementary material
The online version of this article (doi:10.1186/s12934-016-0604-6) contains supplementary material, which is available to authorized users.

To find a trade-off between experimental workload and characterization of a sufficient share of the possible genetic phenotypes resulting from library transformation, oversampling is suitable. This means that the number of clones tested is x-fold higher than the number of library items, i.e. the number of signal peptides (SPs). The more genetic libraries (SPs, ribosome binding sites, promoters, etc.) are screened in combination, the more possible combinations, each represented by a different clone, need to be characterized. Therefore, the probability of hitting a specific combination, i.e. clone, at least once may help to decide how much oversampling to conduct. This process can be approximated by the idealized urn model with replacement. Following this, the probability of hitting a specific clone at least once, $P(k \geq 1)$, can be calculated according to (1), depending on the number of possible combinations (library size, denoted as $N$) and the x-fold oversampling ($x$ times the library size, denoted as $x \cdot N$).

$$P(k \geq 1) = 1 - \left(1 - \frac{1}{N}\right)^{x \cdot N} \qquad (1)$$
The number of balls in the idealized urn model is represented by the number of clones after transformation and the sampling of balls from the urn is represented by the clone picking procedure.
It is furthermore assumed that each SP has the same probability to be transformed. However, it is impossible to identify the total number of cells that were transformed during the electroporation procedure, and this number can be easily increased by increasing the amount of competent cells and plasmid DNA during the electroporation. Therefore, it is reasonable to assume the number of transformed cells to be much higher than the number of selected colonies.
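For typical library sizes ($N > 100$), this probability depends approximately only on the oversampling $x$, see (2), and is equal to ~0.95 and ~0.98 for 3-fold and 4-fold oversampling, respectively.

$$P(k \geq 1) = 1 - \left(1 - \frac{1}{N}\right)^{x \cdot N} \approx 1 - e^{-x} \qquad (2)$$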
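The oversampling estimate can be checked numerically. The following is a minimal sketch, not part of the original study: the function names are hypothetical, and the library size of 148 is taken from the simulation section further below. It compares the urn-model probability with its large-library approximation.

```python
import math


def p_hit_exact(library_size: int, oversampling: float) -> float:
    """Urn model with replacement: probability that a given library item
    is drawn at least once in oversampling * library_size draws."""
    draws = round(oversampling * library_size)
    return 1.0 - (1.0 - 1.0 / library_size) ** draws


def p_hit_approx(oversampling: float) -> float:
    """Large-library approximation, which depends only on the oversampling."""
    return 1.0 - math.exp(-oversampling)


if __name__ == "__main__":
    n_items = 148  # assumed library size (number of SPs)
    for x in (1, 2, 3, 4):
        exact = p_hit_exact(n_items, x)
        approx = p_hit_approx(x)
        print(f"x = {x}: exact = {exact:.3f}, approx = {approx:.3f}")
```

For a library of this size, the exact and approximate values agree to about three decimal places, which is why the oversampling factor alone is sufficient for planning.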

Probability to find an item of a screened library multiple times: Bernoulli process
In practice, some library items are found more than once during a screening process. In this study, most SPs were identified once during screening, but a few occurred twice, three times or even four times. This raises the question of how often multiple identification of library items (i.e., signal peptides in this study) is to be expected. Assuming that all SPs had the same probability of being transferred, the screening process can be interpreted as a Bernoulli process, according to (3).

$$P(k) = \binom{n}{k} \, p^{k} \, (1 - p)^{n - k}, \qquad p = \frac{1}{N} \qquad (3)$$

Here, $k$ denotes the number of times an SP is hit, $n$ denotes the number of actually screened clones, and $p$ denotes the probability of hitting a specific SP in a single pick, as introduced above, with $N$ denoting the library size, i.e. the number of SPs. Consequently, the probabilities of identifying an SP once, twice, three times or four times are calculated to be 0.293, 0.067, 0.010 and 0.001, respectively.
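The Bernoulli-process probabilities can be evaluated with a few lines of code. This is a minimal sketch with a hypothetical function name. The inputs used in the example (66 screened clones, 148 SPs) are assumptions borrowed from the simulation section below; the text does not state which values were used for the quoted probabilities, so the output need not match them exactly.

```python
from math import comb


def p_found_k_times(k: int, n_screened: int, library_size: int) -> float:
    """Binomial probability that a specific library item is found exactly
    k times among n_screened clones, each pick hitting it with p = 1/N."""
    p = 1.0 / library_size
    return comb(n_screened, k) * p**k * (1.0 - p) ** (n_screened - k)


if __name__ == "__main__":
    n_screened = 66     # assumption: number of screened clones
    library_size = 148  # assumption: number of SPs in the library
    for k in range(5):
        prob = p_found_k_times(k, n_screened, library_size)
        print(f"P(k = {k}) = {prob:.3f}")
```

Multiplying each probability by the library size gives the expected number of SPs found exactly k times, which can be compared directly with the observed counts.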

Simulating the SP library screening procedure and comparison with empirical results
An auxiliary simulation study was performed to assess whether the observed occurrences of SPs are consistent with the assumption of complete and unbiased transfer of all 148 SPs from B. subtilis to C. glutamicum.
To this end, the formation of 250 single colonies after transformation with pXMJ9-SP Lib -cutinase was assumed, with each colony being assigned one of the SPs with equal probability. Afterwards, 66 of these 250 colonies were sampled without replacement, and the number of SPs found once, twice, three times, etc. was counted. This procedure was repeated 5000 times, and the summarized results are shown in Figure S1, along with the experimentally determined multiple SP occurrences.
Figure S1: Box plot showing results from the simulation study for the multiple occurrences of SPs. Black dots represent the experimentally determined numbers of SPs that were found once, twice, three times or four times.
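The simulation procedure described above can be sketched as follows. This is an illustrative re-implementation under the stated assumptions (148 SPs, 250 colonies, 66 picked colonies, 5000 repetitions), not the authors' original code. The function names and the random seed are hypothetical, and only mean counts are reported rather than the full distributions shown in the box plot.

```python
import random
from collections import Counter


def simulate_once(rng, n_sps=148, n_colonies=250, n_picked=66):
    """One screening round: returns {multiplicity: number of SPs found
    exactly that many times among the picked colonies}."""
    # Each colony carries one SP, chosen uniformly at random.
    colonies = [rng.randrange(n_sps) for _ in range(n_colonies)]
    # Colony picking: sample colonies without replacement.
    picked = rng.sample(colonies, n_picked)
    hits_per_sp = Counter(picked)         # SP index -> times found
    return Counter(hits_per_sp.values())  # times found -> number of SPs


def simulate(repetitions=5000, seed=0, **kwargs):
    """Mean number of SPs found once, twice, ... over all repetitions."""
    rng = random.Random(seed)  # fixed seed for reproducibility (arbitrary)
    totals = Counter()
    for _ in range(repetitions):
        totals.update(simulate_once(rng, **kwargs))
    return {k: totals[k] / repetitions for k in sorted(totals)}


if __name__ == "__main__":
    for k, mean_count in simulate().items():
        print(f"SPs found {k}x: {mean_count:.2f} on average")
```

Collecting the per-repetition counters in a list instead of summing them would give the full distributions needed for a box plot like the one in Figure S1.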