Accessing the Biocatalytic Potential for C−H‐Activation by Targeted Genome Mining and Screening

Cytochrome P450 monooxygenases (P450s) are ubiquitous hemeproteins that insert oxygen specifically into substrates leading to diverse chemical transformations. Utilizing their capabilities, microbial whole‐cell biocatalysts are applied in pharmaceutical and fine chemical industry to produce biomolecules and drug metabolites. In order to synthesize novel bioactive compounds there is a great demand to identify P450s with new reaction and substrate scope. In this study, genome mining and an activity screening were successfully combined to discover so far underutilized biocatalysts. The screening revealed the expected broad range of reactions, such as hydroxylations, dealkylations, reductions and desaturations. For Actinosynnema mirum and ritonavir the biotransformation was transferred to a preparative scale resulting in a ritonavir conversion of 90 % after 48 h and 13 different metabolites analyzed by LC‐MS2 and NMR. These results clearly demonstrate the potential of the underlying approach to identify promising whole cell biocatalysts with good conversion and product scopes.


Introduction
Since their discovery five decades ago P450s have emerged as one of the most studied families of enzymes due to their ability to catalyze the selective oxyfunctionalization of various substrates. Starting from an initial formal oxene transfer, P450s are able to catalyze many different chemical transformations, such as dealkylations, epoxidations or deaminations. [1] Linked to their natural role in detoxification reactions and oxyfunctionalization of active pharmaceutical ingredients (API) in the phase I metabolism, they serve as useful biocatalysts for an effective enzymatic one-step conversion to high value hydroxylation products. The synthesis of drug metabolites during an early stage of drug development is useful to determine the metabolic fate of a drug. The products can be used for structure elucidation and pharmacological activity or toxicity studies which is a central aspect in the approval of APIs. [2] Furthermore, the modification and diversification of already existing drug candidates by P450s can lead to even more active compounds. [3] While some chemical oxidation reactions for the synthesis of metabolites are known, the biologically relevant metabolites are mostly obtained by a biotransformation [4] either by the human P450s or their microbial equivalents, which typically exhibit higher specific activities and stabilities often resulting in larger product quantities. [5] In order to optimize the catalytic activity or broaden the substrate scope of P450 reactions, either new P450s can be identified or already known enzymes can be optimized by protein engineering. The most prominent enzyme used for engineering approaches is the CYP102 A1 (also referred to as P450 BM3) from Bacillus megaterium, since the wildtype enzyme exhibits a high expression level in E. coli and a high catalytic activity. 
[6] Mutants of CYP102 A1 were shown to metabolize a variety of drugs, such as naproxen, ibuprofen, [7] amodiaquine [8] or astemizole [9] producing the main human metabolites. Besides protein engineering to receive improved, tailor made biocatalysts, a likewise promising approach is exploiting the natural diversity of P450s to find catalysts for selective oxidations. Nowadays, several online databases exist that host the information of thousands of new, but underexplored P450 enzymes. Prominent databases are the P450 database maintained by David Nelson, which provides information on over 20,000 P450 genes [10] and CYPED with nearly 16,000 sequences. [11] Due to an increasing availability of microbial whole genome sequences and with that an increasing number of annotated P450s, it is not the matter of gene sequence availability any more but to correctly assign the gene-function relationship. [12] Some species, such as the actinomycetes are known to have a high number of P450 enzymes which corresponds to their natural rich secondary metabolism. [13] Different actinomycetes have already been described to successfully produce drug metabolites. In a whole cell biotransformation the conversion of cyclosporine was screened with 1,237 actinomycetes. The screening resulted in nine strains showing good bioconversion rates of cyclosporine into the two human hydroxylated (AM1 and AM9) and N-demethylated (AM4) derivatives. [14] In a recent study a novel P450 from the actinomycete Streptomyces platensis was identified showing high activity for the conversion of several pharmaceutical compounds while producing equivalent metabolites to the human metabolizing P450s 3A4, 2C8, 2C19, and 2D6. [15] Compared to bacteria, fungi, especially filamentous fungi, are known for even larger amounts of P450 which is related to their ability to inhabit ecological niches and their production of specialized metabolites. 
[16] Similar to actinomycetes, fungi have also been used for the production of drug metabolites. A potential new antimalarial compound, cladosporin, was incubated with 96 microbial strains (48 fungi and 48 actinomycetes) to obtain closely related but more active variants of cladosporin, giving an excellent example of late-stage functionalization by P450 catalysis. [17] The biotransformation generated a wide set of cladosporin analogues that could be used for structure-activity studies. All of these examples show that there is a huge potential within microorganisms to find efficient biocatalysts with new P450 reactivities.
In this study, we present a combined approach of a bioinformatic sequence analysis and medium-throughput activity screening of promising candidates in order to identify new P450s catalyzing novel reactions. Genomes were analyzed and P450 coding sequences were identified. Since the gene sequences alone do not reveal information about the functionality and activity of the P450s, 84 strains were tested in a medium-throughput screening towards the conversion of seven pharmaceutical compounds, namely amodiaquine, cyclosporine, tamoxifen, haloperidol, ritonavir, rapamycin and testosterone. Using this combined approach, we were able to generate a platform of novel whole-cell biocatalysts for hydroxylation reactions with some strains showing a broad reaction spectrum and wide substrate acceptance. For the actinomycete Actinosynnema mirum and the substrate ritonavir, whole-cell biotransformation conditions were optimized and transferred to a fermentative scale. A ritonavir conversion of approximately 90 % was achieved and in total 13 ritonavir metabolites could be found after 48 h of biotransformation. Further characterization by tandem LC-MS (LC-MS2) and NMR analyses revealed, among others, the three main human metabolites.

A Genome Mining Approach Identified Strains with a High Hydroxylation Potential
As all P450s share a common structural fold of the heme center and the surrounding binding pocket, a genome analysis for the presence of conserved patterns and key motifs is a promising approach to identify novel P450 enzymes. [18] To determine the number of P450 genes encoded per strain, matches per NCBI taxonomy-database entry were counted. Two different approaches were performed based on two different databases, namely protein data deposited on UniProtKB (PROSITE) and the CYPED database. The database analysis resulted in a list of organisms with their putative number of sequences coding for P450s. Both approaches resulted in different total amounts of sequences, however mostly showing the same tendency of sequence distributions between the species. Sequence analysis for bacterial organisms resulted in a total number of up to 320 P450 sequences per organism for the CYPED and 215 sequences per organism for the PROSITE database. High values above 200 resulted from the fact that entries were counted per NCBI Taxonomy ID, which can refer to a sequenced strain, a whole species or a genus and can therefore cause an overestimation of sequences. In particular, the genome mining showed that actinomycetes from the families Streptomycetaceae and Mycobacteriaceae possess the highest numbers of P450s, as expected. Actinobacteria, especially from these families, are known for their large number of P450s, with 18 found for example in Streptomyces coelicolor, 33 in Streptomyces avermitilis [19] and 70 in Mycobacterium rhodesiae NBB3. [20] In this study, we found up to 80 total P450 sequences in one strain of the Streptomycetaceae, while up to 145 sequences were found in a strain of the Mycobacteriaceae with the PROSITE analysis. The CYPED approach resulted in up to 82 total sequences for strains belonging to the Streptomycetaceae (except for 141 in Streptomyces griseus) and up to 120 sequences for strains belonging to the Mycobacteriaceae.
In addition to the total count of sequences, the numbers of sequences belonging to homologous families and superfamilies were determined using the CYPED database. Analysis of the homologous families and superfamilies revealed the highest number of superfamilies for Mycobacteriaceae with 30 superfamilies followed by Streptomycetaceae with 18 superfamilies. In literature, analyses of mycobacterial CYPomes, mostly of the pathogenic bacterium Mycobacterium tuberculosis, revealed a high diversity within the found sequences and a localization widely distributed across the genome, making determination and assignment of the catalytic function difficult. [13] Streptomycetaceae are one of the greatest producers of diverse natural products requiring many genes encoding for P450s with tremendous diversity in secondary metabolite gene clusters. [21] Furthermore, numerous transformations of xenobiotic compounds for industrial and environmental applications are known for Streptomyces species. [22] Mycobacteriaceae and Streptomycetaceae are therefore promising candidates to identify P450s with novel substrate spectrum and reactivity.
Genome mining of fungi revealed even larger numbers of P450s per organism. The mining of fungal genomes for P450s revealed up to 120 sequences per strain with the CYPED approach and up to 232 sequences per strain for the PROSITE genome mining approach. The genera Fusarium and Colletotrichum/Glomerella, both groups containing plant pathogen species, showed the highest number of total sequences. High numbers of superfamilies (up to 36) were also present in these groups. Many fungi are capable of degrading environmental pollutants, breaking down organic materials and producing secondary metabolites to adapt to various environmental conditions. This results in P450s with an enormous diversity to be found in fungal genomes. [23] Phytopathogenic and plant-degrading fungi in particular have multiple P450s. [24] The highest number, with 250 P450 encoding genes, is described for the brown-rot fungus Postia placenta. [25] The genome mining thus identified promising candidates for a subsequent activity screening on pharmaceutical substances. The number of candidates for this activity screening was further reduced based on four criteria. As the first criterion, strains had to have a high number of total P450 sequences. As the second criterion, a high number of superfamilies was taken as an indicator of high diversity in reactivity. The third criterion was good cultivability and the fourth was novelty; hence strains were picked which had not been described as oxidation catalysts before. 89 bacterial and fungal strains fulfilling these criteria were selected. If possible the exact strain was included; for some strains with limited availability a closely related strain was included. Chosen bacterial strains for the activity screening mainly belonged to the orders Streptosporangiales, Pseudonocardiales, Streptomycetales and Corynebacteriales, with highest total sequences for the Streptomycetales and Corynebacteriales (Figure 1A). 
For fungi, strains with high numbers of P450 sequences did not belong to some main orders but were more widely distributed across different classes. The chosen fungi mainly belonged to the Ascomycota and the classes Dothideomycetes, Eurotiomycetes and Sordariomycetes showing highest total sequence numbers within the Sordariomycetes, especially for the order Glomerellales ( Figure 1B).

Screening Revealed Broad Reactivity on Model Substrates
To determine functionality and reactivity, selected strains from the genome mining approach were tested for the conversion of seven representative substrates. Of the 89 selected strains, 84 were cultivable under the applied conditions. The media NL148s (fungi) or NL148sb (bacteria) were used for the screening, if not indicated otherwise. After 48 h and 72 h of biotransformation the product profile was analyzed via LC-MS and new products with their respective mass-to-charge ratio (m/z) were counted. Natural products formed by the strain were distinguished from the product profile by a negative control (culture without substrate). The number of formed products from all tested substrates was plotted against the number of sequences determined by the genome mining approach (Figure 2).
Among the tested bacteria (Figure 2A), strains belonging to the Pseudonocardiales, Streptosporangiales, and Streptomycetales showed the highest product spectrum on the tested substrates with up to 52 formed products. Between 8 and 29 sequences were found on average for these groups using the CYPED approach. In contrast, strains from the Corynebacteriales (45363, 45147, 44199, 43066, 44338, 45407, 43239, 44107, 44383, 44892, 45091, 44980, 43826 and 45343) generally showed the lowest metabolite numbers with 15 formed products for the soil bacterium Gordonia rhizosphera (44383) as highest value within this group. In general, no correlation between sequence number and number of formed products could be observed. On the one hand, the Mycobacteria in particular revealed high numbers of total sequences and superfamilies of P450s but resulted in the lowest product formation, with a maximum of nine formed total products. The strain Mycobacterium hassiacum (44199), for example, was found to have 45 total sequences in the PROSITE and 53 total sequences and 24 superfamilies in the CYPED approach; in the screening, however, only five total products were identified. On the other hand, for the strain Streptomyces platensis (40041), for which genome mining resulted in 8 total sequences for CYPED and 13 total sequences belonging to three found superfamilies for PROSITE, 31 products were found. Also regarding the substrate acceptance, Mycobacteria did not show a broad specificity, with conversion of only up to three of the seven substrates, regardless of the high number of total sequences and superfamilies. Additionally, mostly substrates with lower molecular mass, namely haloperidol and testosterone, were converted. Genomic analyses from literature showed that many mycobacterial P450s appear to be unique within mycobacteria, catalyzing still unknown reactions. [26] The lack of conversion of the tested substrates could be explained by a narrow substrate acceptance, as mycobacterial P450s act rather specifically.
In contrast to the Mycobacteria, Actinomadura rifamycini (43936), Kutzneria albida (43870) and Actinosynnema mirum (43827) belonging to the Streptosporangiales (43936) or Pseudonocardiales (43827, 43870) accepted all seven substrates. Three strains from these groups and three strains belonging to the Streptomycetales accepted six substrates. Even though these strains accepted all or nearly all substrates, ritonavir in particular was broadly converted, with 18 products for example for K. albida and 11 products for A. mirum (see supporting information). Among others, these were single hydroxylations. Despite belonging to the same order Actinomycetales, Mycobacterium and Streptomyces species were shown to have few similarities in the P450 profile. [27] Many widespread P450 families such as the families CYP105 and CYP107 appear to be unique to Streptomycetaceae and other related actinomycetes. Characterization of these families revealed roles in oxidative tailoring of secondary metabolites and the metabolism of xenobiotics, showing broad substrate specificities and catalytic reactivity. [13] Therefore, broad substrate acceptance of these enzymes, catalyzing diverse reactions besides their native catalytic activity, is most likely. Well characterized P450s from these families are involved in the synthesis of the antibiotics filipin (CYP105D1) [28] and pikromycin (CYP107B) [29] but were also shown to catalyze hydroxylation of vitamin D (CYP105A1). [30] For fungi the number of total products compared to the bacterial strains was even lower, though genome mining resulted in higher numbers for P450 sequences (Figure 2B). 
Comparable to the results for Mycobacteria, substrates with low molecular mass, such as testosterone, were converted by nearly all fungal strains, whereas cyclosporine, the largest molecule, was not converted at all. The strain Neurospora tetrasperma (MYA-4615), for example, formed 16 products for the substrate testosterone, with several hydroxylation (m/z = 305.1) and hydroxylation/desaturation products (m/z = 303.1). Biotransformation of testosterone is plausible, as testosterone is structurally similar to ergosterol, a major component of eukaryotic cell membranes from yeasts and filamentous fungi. The P450 enzymes participating in the ergosterol synthesis, CYP51 and CYP61, might also act on testosterone as substrate. CYP51 and CYP61 are highly conserved in the entire kingdom of fungi. [31] Besides a low number of formed products, fungi also showed a narrower substrate acceptance with five accepted substrates as maximum for Sordaria macrospora (997), Uncinocarpus reesii (1704), and Podospora pauciseta (980). From these three strains, S. macrospora (997) and U. reesii (1704) formed the highest number with up to 22 total products. Only for Neurospora tetrasperma (MYA-4615) a higher number of 23 products was detected. Additionally, S. macrospora showed high intensities of the formed products, especially for the potential desethylamodiaquine (m/z = 328.0), tamoxifen demethylation (m/z = 358.1) and tamoxifen single oxidation (m/z = 388.1).
The screening revealed novel biocatalysts showing a broad reactivity and different specificities for the tested substrates. Bacteria showed higher product formation from the tested substrates than the fungi, although on average more P450 sequences were found in fungi by the genomic analysis. It thus became apparent that the number and diversity of P450 sequences are not sufficient to predict the potential for oxyfunctionalization. In particular, some bacterial strains from the Streptosporangiales, Pseudonocardiales and Streptomycetales and some filamentous fungi from the Eurotiomycetes and Sordariomycetes were potent biocatalysts for diverse chemical transformations, with potential for further optimization of the biotransformation parameters.

Improvement of the Ritonavir Biotransformation Conditions for the Actinomycete Actinosynnema mirum
One strain from the initial screening was chosen to optimize the biotransformation parameters in order to develop a scale up to a 3 L fermenter. A. mirum was chosen since it showed high reactivity in the activity screening and had a broad substrate spectrum accepting all seven substrates regardless of size and properties. A. mirum is a gram-positive actinobacterium belonging to the order of Pseudonocardiales which forms aerial and substrate mycelia. [32] The genome is fully assembled. A. mirum was shown to be capable of synthesizing natural products, such as nocardicin antibiotics [33] and ansamitocin P-3. [34] For biotransformation, the substrate ritonavir was chosen, as a high metabolite conversion of this substrate was detected in the initial screening. In contrast to the described inhibitory effect on the human major drug-metabolizing CYP3A4 and CYP3A5 isoforms, [35] this effect of ritonavir on the responsible P450 seems not to be present.
Figure 2. Conversion of the seven tested substrates. The total product formation of all substrates is plotted against the number of found sequences from both genome mining approaches. As both approaches did not cover the same strains, genomic data analysis results from both approaches (CYPED and PROSITE) were included. If both approaches resulted in a different number of sequences for the same strain, always the lowest number of found sequences was plotted on the x-axis. A: bacterial strains. B: fungal strains. Size of dots (big to small) and color indicates the number of accepted substrates. Red: seven, magenta: six, cyan: five, orange: four, green: three, blue: two, gray: one.

Optimization of substrate conversion and product scope was performed in the BioLector, as it offers the possibility of parallel cultivation combined with online monitoring of the biomass formation. Studies have shown that the substrate conversion rate is significantly influenced by the composition of media. [14] Therefore, in a first experiment different media for the pre- and main cultivation were tested. Ritonavir was added in the main cultivation after 72 h. Highest total biomass formation was achieved with terrific broth (TB) medium (Figure 3). Second highest biomass concentration was reached in the GYM (glucose, yeast, malt) medium followed by the NL148sb medium, M9 and CY medium. No growth could be detected in oatmeal medium, although excellent growth on oatmeal agar is described in literature. [32] The growth rates (μmax) were 0.28 h⁻¹ (TB medium), 0.27 h⁻¹ (GYM medium), 0.36 h⁻¹ (NL148sb medium), 0.27 h⁻¹ (CY medium) and 0.30 h⁻¹ (M9 medium). Calculation of growth rate for the oatmeal medium was not possible. In literature, A. mirum was shown to hydrolyze starch, casein, tyrosine and gelatin. [32] The media TB, GYM, NL148sb and CY contain casein-peptone, while starch was additionally present in NL148sb medium.
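The text reports maximum specific growth rates but does not state how they were derived from the online biomass signal. A minimal sketch, assuming the common approach of fitting a straight line to the natural logarithm of biomass over the exponential phase, could look as follows. The function name and the example time points are hypothetical and serve only to illustrate the calculation.

```python
import math


def specific_growth_rate(times_h, biomass):
    """Estimate the specific growth rate (h^-1) as the least-squares slope
    of ln(biomass) versus time. Only points from the exponential phase
    should be passed in; selecting that window is left to the user."""
    if len(times_h) != len(biomass) or len(times_h) < 2:
        raise ValueError("need at least two matching time/biomass points")
    logs = [math.log(x) for x in biomass]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    return sxy / sxx


# Hypothetical, noise-free exponential data generated with mu = 0.28 h^-1
times = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1 * math.exp(0.28 * t) for t in times]
mu = specific_growth_rate(times, signal)
```

With real scattered-light data the result depends strongly on which time window is treated as exponential, so the window should be reported alongside the rate.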
Contrary to the good growth of A. mirum in TB medium no ritonavir conversion after 48 h could be observed (Table 1). Besides cultivation in TB medium, cultivation in M9 medium resulted in no product formation. This however, might be linked to low biomass formation. Biotransformation in NL148sb resulted in the three metabolites M1, M2, and M5 (for chemical structures and nomenclature see Table 1 and Figure 7) and in CY medium cultivation resulted in one metabolite (M5). Interestingly, for cultivation in oatmeal medium the three metabolites M1, M2, and M5 were detected. These transformations might either result from the cells used for inoculation, or from slightly growing cells, which could not be detected due to the high turbidity of the medium. In microscopic analysis after 48 h cells could be observed in the oatmeal medium. Most metabolites (M1 to M7) were detected after the cultivation in GYM medium. GYM medium, also named International Streptomyces Project (ISP) medium 2, is a standard medium for morphological studies and characterization of Streptomyces species. It is the only tested medium containing malt extract. Furthermore, it contains yeast extract. Yeast extract was also present in NL148sb medium in which cultivation did also result in a good product formation. Possibly, components from either yeast extract or malt extract are responsible for P450 gene induction.
After cultivation in GYM medium, eight different products were identified. The m/z ratio of 582.3 for M1 represents the des-isopropylthiazolyl product of ritonavir and the m/z ratio of 580.3 for M2 represents the decarbamoylated product, both of which were described earlier in the conversion of ritonavir with human liver microsomes. [36] M3 with the m/z ratio 707.3 indicates an N-demethylation of ritonavir and M4 (m/z ratio of 723.3) an additional oxidation (+16) of M3. M5 with the m/z ratio of 737.3 (+16) indicates a single oxidation of ritonavir. Several single oxidations for ritonavir are described, [37] with the isopropylthiazolyl oxidation product as the main human metabolite with around 30 % of the total initial dose. [36b] Metabolite M6 with an m/z ratio of 753.3 (+32) is a double hydroxylation product of ritonavir. M7 with an m/z ratio of 596.3 indicates a single oxidation with the removal of the carbamoyl group, produced either from metabolite M2 or M5. [36b] M8 with an m/z ratio of 735.3 possibly corresponds to an oxidation of metabolite M5 to a ketone. This metabolite was described before in a biotransformation of ritonavir with a P450 of the strain Streptomyces platensis. [15] Furthermore, products with the m/z ratios of 606.3 and 622.3 could be identified, which were described as degradation products of ritonavir before and were also present in the sample taken directly after substrate addition. [38] These products were excluded from the product profile.
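The mass-shift reasoning used for the intact-scaffold metabolites can be illustrated with a short sketch. It assumes a parent ion of m/z 721.3 for ritonavir, which is inferred here from the reported value for M5 (737.3, described as +16) and is not stated explicitly in the text. The lookup table, the tolerance and the function name are hypothetical. Cleavage products such as M1, M2 and M7 are deliberately not covered, since their assignment relies on fragment losses rather than simple nominal shifts.

```python
# Assumed [M+H]+ of ritonavir, inferred from M5 (737.3) minus 16.
PARENT_MZ = 721.3

# Nominal shifts relative to the parent ion and the interpretation
# given in the text for the corresponding metabolites.
SHIFT_LABELS = {
    16: "single oxidation (+16)",
    32: "double hydroxylation (+32)",
    -14: "N-demethylation (-14)",
    2: "N-demethylation plus oxidation (-14 +16)",
    14: "oxidation followed by oxidation to a ketone (+16 -2)",
}


def annotate(mz, parent=PARENT_MZ, tol=0.2):
    """Return a tentative label for an observed m/z based on its nominal
    shift from the parent ion, or 'unassigned' if no shift matches."""
    shift = mz - parent
    for nominal, label in SHIFT_LABELS.items():
        if abs(shift - nominal) <= tol:
            return label
    return "unassigned"


# Reported m/z values for M3, M4, M5, M6 and M8
for name, mz in [("M3", 707.3), ("M4", 723.3), ("M5", 737.3),
                 ("M6", 753.3), ("M8", 735.3)]:
    print(name, mz, annotate(mz))
```

A single-quadrupole m/z alone cannot distinguish isomers or isobaric alternatives, which is why the text relies on LC-MS2 and NMR for structural confirmation.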
As cultivation in GYM medium resulted in the best product formation, GYM medium was used to characterize the biotransformation in more detail. The conversion of ritonavir over 176 h was determined using the BioLector system (Figure 4). For eukaryotic P450 expression it is described that gene expression can be induced by the presence of the substrate. [39] To test an effect on conversion, ritonavir was added in the exponential phase after 20 h. Early addition of the substrate did not show a negative effect on growth. A biomass concentration of 3.3 g CDW L⁻¹ was reached after 40 h. The substrate ritonavir was fully converted after 48 h of cultivation. In further experi- [...] and could also observe a linear dependency of glucose uptake rate and biocatalytic activity when using resting cells. [40] This shows that the biocatalytic performance of the whole-cell biocatalysts relies on the metabolic activity. However, in this study a further addition of glucose did not show further improvement of the conversion (data not shown), indicating that cofactor regeneration was not limiting. Minura et al. furthermore showed that the addition of 5-aminolevulinic acid (5-ALA), a precursor of the porphyrin biosynthesis, led to the up-regulation of the P450 CYP2A6 activity and prodrug activation. [41] Addition of 5-ALA in the exponential phase, however, did not show further improvement of the conversion (data not shown). As the ritonavir addition in the exponential phase did not show a negative effect on growth and a good substrate conversion was reached, a batch cultivation with GYM medium and substrate addition after 20 h was utilized for the scale-up.

Ritonavir Biotransformation on Preparative Scale
In a next step the conditions from the BioLector cultivation were transferred to a 1.7 L preparative scale (Figure 5). Fermentation of A. mirum showed good scalability as a similar growth behavior could be observed. The final biomass concentration of 4.3 g CDW L⁻¹ was 1 g CDW L⁻¹ higher compared to the results in the BioLector, which might be a result of better oxygen availability in the stirred system. Glucose was depleted after 44 h of cultivation. After 24 h already 46 % of the initial 170 mg of ritonavir were converted, and 90 % after 48 h. After 68 h the fermentation broth was centrifuged and the biomass as well as the supernatant were separately extracted with ethyl acetate with an extraction efficiency of 70 %, calculated from the recovery of the internal standard darunavir. After the extraction of both soluble phase and biomass, it could be observed that the highest amount of unconverted substrate remained in the biomass phase (10.4 mg). With a ritonavir solubility of 0.30 mg L⁻¹ in water, [42] precipitation of ritonavir and accumulation in the biomass is likely. The poor solubility makes high substrate loading challenging. After 48 h the metabolites M1 to M8 were formed (Figure 6). For further LC-MS2 and NMR analysis the soluble phase was fractionated into nine fractions (F1-F9) by preparative HPLC. For the single oxidation products, M5-1 and M5-2, LC-MS2 analysis revealed two different oxidation products on the isopropylthiazolyl moiety (Figure 7). Metabolite M5-2 was the main metabolite formed and structure elucidation with NMR revealed the isopropylthiazolyl oxidation product at the isopropyl moiety. After fractionation 32 mg of M5-2 with a purity of 97.7 % could be isolated, corresponding to a yield of 19 %. As pure isolation of the other metabolites was challenging, the percentage of the metabolite in the different fractions was determined (Table 2) [...] however, a differentiation of the hydroxylation positions was not possible. 
Both metabolites show hydroxylation on the isopropylthiazolyl moiety. This indicates that the same hydroxylation positions as found in the single hydroxylation products (M5-1 and M5-2) are present.
In general the scale-up in a stirred system showed a positive effect on biomass formation and ritonavir conversion. A. mirum is therefore a promising candidate for further biotechnological applications. As a full assembly of the genome is available, A. mirum is also a promising strain to undergo molecular analysis of the P450 enzymes.
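The figures reported for the preparative scale can be cross-checked with simple arithmetic. The sketch below assumes that the stated yield of 19 % is a plain mass ratio of isolated M5-2 to the initial ritonavir charge, which matches the reported numbers but is not spelled out in the text. The helper name is hypothetical.

```python
def percent(part, total):
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


initial_ritonavir_mg = 170.0   # initial substrate charge from the text
isolated_m5_2_mg = 32.0        # isolated main metabolite from the text
conversion_48h_percent = 90.0  # reported conversion after 48 h

# Mass-based isolated yield of M5-2 relative to the substrate charge
yield_percent = percent(isolated_m5_2_mg, initial_ritonavir_mg)

# Substrate mass expected to remain unconverted after 48 h
remaining_mg = initial_ritonavir_mg * (1.0 - conversion_48h_percent / 100.0)

print(f"isolated yield: {yield_percent:.1f} %")
print(f"unconverted ritonavir after 48 h: {remaining_mg:.1f} mg")
```

A molar yield would come out slightly lower than the mass-based value because the oxidized product is heavier than the substrate.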

Conclusions
Cytochrome P450 monooxygenases possess the to date unsurpassed ability to selectively insert oxygen into many complex substrates at ambient conditions. Thus, this efficient one-step conversion resulting in regio- and stereospecific products is a useful approach for the production of oxyfunctionalized molecules with desired pharmacological properties. In order to take full advantage of their catalytic potential, it is still essential to identify and investigate new P450 monooxygenases. Many genome sequencing projects revealed genes encoding putative P450s. However, it is still challenging to determine the specific functionality of a single P450 by structure comparison. Analyzing the CYPome of wild-type strains using a bioinformatics approach to identify promising whole-cell biocatalysts is a favorable alternative.
The data presented within this study demonstrate that strain selection by genome analysis combined with a medium-throughput screening enables the identification of new highly efficient biocatalysts for hydroxylation reactions. Using this approach we were able to generate a platform of novel whole-cell biocatalysts useful for oxyfunctionalization of diverse compounds. This platform contains strains with high reactivity and broad acceptance for substrates of various sizes and with different properties. A further refinement of the search algorithms, taking more parameters than the number of sequences into account, might be useful to come up with an even more tailored set. Furthermore, we were able to demonstrate for the conversion of ritonavir with the actinomycete A. mirum that the transfer of the biotransformation from the initial screening to a preparative scale already yields good product formation. A conversion of 90 % and the formation of 13 different metabolites after 48 h was reached. Preparative HPLC could generate sufficient purity of the products for LC-MS2 analysis or the structure elucidation by NMR. For the main metabolite M5-2 we were able to isolate 32 mg with a purity of 97.7 % after first fractionation. The good scalability of the biotransformation shows the potential of the strain collection for further applications in biotechnology and synthetic biology. For that, optimization of the biotransformation, especially higher substrate loading, combined with reaction engineering approaches and efficient downstream processing has to be carried out.

Experimental Section

Genome Data Mining for the Identification of P450s
Genomes of sequenced organisms were mined using two approaches based on different databases. In the first approach, the conserved cysteine heme ligand signature specific for P450s (code "PDOC00081") deposited in PROSITE was matched against protein sequences stored in the protein database UniProtKB. The number of P450 gene entries found was counted per NCBI Taxonomy ID, resulting in a list of organisms ordered by their total number of P450 genes. In the second approach, based on the CYPED database, gene sequences per NCBI Taxonomy ID were determined. Additionally, sequences could be assigned to super- and homologous families. A sequence similarity cut-off of 55 % classified the sequences into homologous families, whereas a cut-off of 40 % placed the sequences into the same superfamily.
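The counting and classification steps described above can be sketched as follows. This is only an illustrative sketch and not the pipeline used in this study: the input format (pairs of taxonomy ID and accession), the function names, and the treatment of both cut-offs as inclusive are assumptions made for the example.

```python
from collections import Counter


def rank_organisms_by_p450_count(entries):
    """Rank taxonomy IDs by the number of signature-matching entries.

    `entries` is an iterable of (taxonomy_id, accession) pairs, one per
    protein sequence that matched the P450 signature (hypothetical format).
    Returns a list of (taxonomy_id, count), highest count first.
    """
    counts = Counter(tax_id for tax_id, _accession in entries)
    return counts.most_common()


def classify_pair(similarity_percent, homologous_cutoff=55.0, superfamily_cutoff=40.0):
    """Classify a sequence pair by its similarity in percent.

    The 55 % and 40 % cut-offs are taken from the text; treating them as
    inclusive thresholds is an assumption of this sketch.
    """
    if similarity_percent >= homologous_cutoff:
        return "same homologous family"
    if similarity_percent >= superfamily_cutoff:
        return "same superfamily"
    return "different superfamily"


if __name__ == "__main__":
    # Hypothetical taxonomy IDs and accessions, for illustration only.
    hits = [(1, "A"), (2, "B"), (1, "C"), (3, "D"), (1, "E"), (2, "F")]
    print(rank_organisms_by_p450_count(hits))
    print(classify_pair(47.0))
```

Ranking by raw counts reproduces only the first selection criterion; any additional parameters would have to be added as further sort keys.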
The stock concentration was 5 mg mL⁻¹ for haloperidol (HAL) and 2 mg mL⁻¹ for tamoxifen (TAM). All substrates were used in a 1 : 100 ratio, resulting in final concentrations of 0.02 to 0.1 mg mL⁻¹.
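As a quick check of the dilution arithmetic, the final concentrations can be computed from the stock concentrations. The sketch assumes that the 1 : 100 ratio means one part stock in 100 parts total volume; the function name is hypothetical.

```python
def final_concentration(stock_mg_per_ml, dilution_factor=100):
    """Final substrate concentration (mg/mL) after diluting the stock.

    Assumes a 1 : dilution_factor ratio of stock to total volume.
    """
    if dilution_factor <= 0:
        raise ValueError("dilution_factor must be positive")
    return stock_mg_per_ml / dilution_factor


if __name__ == "__main__":
    # Stock concentrations as stated in the text.
    for name, stock in (("HAL", 5.0), ("TAM", 2.0)):
        print(name, final_concentration(stock), "mg/mL")
```

Under this assumption haloperidol ends up at 0.05 mg mL⁻¹ and tamoxifen at 0.02 mg mL⁻¹, both within the stated range.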

Analysis of Screening Products
Samples were analyzed by LC-MS using a 1260 Infinity II LC system equipped with a diode array detector (1260 DAD HS) combined with an Agilent Technologies 6120 Single Quadrupole LC-MS (Agilent Technologies). As stationary phase, a Poroshell 120 EC-C18 column, 4.6 × 100 mm, 2.7 μm (Agilent Technologies) was used. The column temperature was set to 30°C (analysis of ritonavir, amodiaquine, tamoxifen, testosterone, and haloperidol) or 40°C (analysis of rapamycin and cyclosporine). The flow rate was 1 mL min⁻¹. As mobile phase for the analysis of ritonavir, tamoxifen, testosterone, and haloperidol, 0.1 % formic acid (v/v, solvent A) and 100 % acetonitrile (solvent B) were used with the following gradient: 0-3 min: 5 to 40 % B, 3-9.
Samples were taken directly (0 h) and 48 h after substrate addition or at 0 h, 48 h, 72 h, 148 h, 196 h after substrate addition. After harvesting, 300 μL sodium carbonate solution (0.1 M, pH 9.6) was added and samples were extracted three times with an equal volume of ethyl acetate. Darunavir (100 μM) was added as an internal standard prior to extraction. Samples were evaporated to dryness and afterwards redissolved in 800 μL methanol. Samples were analyzed by LC-MS (see above). Ritonavir conversion was calculated from the integrated substrate peak at a retention time of 9.4 min in the DAD signal at 254 nm.
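One way to express the conversion calculation is as the relative decrease of the substrate peak area between 0 h and a later time point. The text does not state how the internal standard entered the calculation; the optional normalization to the darunavir peak area shown here is an assumption, as are the function name and the example peak areas.

```python
def conversion_percent(area_t0, area_t, is_area_t0=None, is_area_t=None):
    """Substrate conversion in percent from integrated peak areas.

    area_t0, area_t: substrate peak areas at 0 h and at time t.
    is_area_t0, is_area_t: optional internal-standard peak areas; if both
    are given, substrate areas are normalized to them first (assumption).
    """
    if is_area_t0 is not None and is_area_t is not None:
        if is_area_t0 <= 0 or is_area_t <= 0:
            raise ValueError("internal-standard areas must be positive")
        area_t0 = area_t0 / is_area_t0
        area_t = area_t / is_area_t
    if area_t0 <= 0:
        raise ValueError("substrate area at 0 h must be positive")
    return (1.0 - area_t / area_t0) * 100.0


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    print(conversion_percent(1000.0, 100.0))
    print(conversion_percent(1000.0, 120.0, is_area_t0=500.0, is_area_t=600.0))
```

Normalizing to the internal standard compensates for differences in extraction recovery between samples, which is the usual reason for adding it before extraction.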

Biotransformation of Ritonavir on a Preparative Scale
For preparative-scale fermentation, a laboratory fermenter KLF 3.1 L and the BioScada software for process parameter control from Bioengineering AG were used. 1.7 L of GYM medium and 170 μL antifoam 204 (Sigma-Aldrich) were sterilized in situ for 20 min at 121°C. The medium was inoculated with 27.6 mL of a preculture (grown for 3 days in a 100 mL shake flask at 28°C and 180 rpm). Samples of 5 mL were taken in duplicate directly after substrate addition and after 4 h, 7 h, 20 h, 24 h, 44 h and 48 h. Samples were centrifuged at 4000 g for 5 min at 4°C. The cell fraction was used for cell dry weight determination. 800 μL of the supernatant was extracted as described above and analyzed via LC-MS. 1 mL of the supernatant was used for the analysis of sugar concentration via HPLC (Hitachi LaChrom Elite®). The column was a Repro-Gel H, 8 × 300 mm, 9 μm (Maisch). Elution was isocratic with 5 mM H₂SO₄ at a flow rate of 1 mL min⁻¹. The column temperature was set to 40°C. The injection volume of samples was 20 μL. For detection, the RI detector L2490 (35°C, polarity positive) was used. Analysis was performed with the software EZChrome Elite Client/Server (Scientific Software Inc.). Calibration curves (4 g L⁻¹, 2 g L⁻¹, 1 g L⁻¹, 0.5 g L⁻¹, 0.25 g L⁻¹, and 0.125 g L⁻¹) of maltose and glucose were recorded.
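The use of such calibration curves can be illustrated with an ordinary least-squares fit of peak area against concentration, which is then inverted to quantify a sample. The six concentration levels are those given above; the linear model, the function names, and the peak areas are assumptions for illustration and not measured data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to obtain a concentration in g/L."""
    return (area - intercept) / slope


if __name__ == "__main__":
    levels = [4.0, 2.0, 1.0, 0.5, 0.25, 0.125]  # g/L, as in the text
    # Hypothetical RI peak areas generated from an ideal linear response.
    areas = [1000.0 * c + 5.0 for c in levels]
    slope, intercept = fit_line(levels, areas)
    print(concentration_from_area(1505.0, slope, intercept))
```

Samples whose peak area falls outside the calibrated range (0.125 to 4 g L⁻¹) would need dilution rather than extrapolation.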

Characterization of Ritonavir Metabolites by LC-MS2
LC-MS2 was performed with a Nucleoshell RP18 column, 2.0 × 100 mm, 2.7 μm, using an Agilent 1260 Infinity LC system combined with a compact quadrupole time-of-flight (Q-TOF) mass spectrometer (Bruker Daltonics). The flow rate was set to 0.4 mL min⁻¹. A gradient of 0.1 % formic acid (v/v, solvent A) and 100 % acetonitrile (solvent B) was used. The solvent gradient was 0-3 min: 5 to 40 % B, 3-9.5 min: 40 to 70 % B, 9.5-10.5 min: 70 to 95 % B, 10.5-13 min: 95 % B. The Q-TOF was interfaced with electrospray ionization (ESI). The following ESI parameters were set: drying gas temperature: 220°C, nebulizer pressure: 4.8 bar, drying gas flow: 12 L min⁻¹, capillary voltage: 4,500 V, end plate offset: 500 V. Precursor ions were selected with the quadrupole and subjected to collision-induced dissociation (CID) with nitrogen (collision energy: 28 eV). Fragments were analyzed with the Q-TOF in a range of m/z 100 to 800. CID fragments for the 13 metabolites and ritonavir were recorded by multiple reaction monitoring (MRM, Table 3).
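The gradient program given above can be written as a list of breakpoints from which the solvent composition at any time is obtained by interpolation. The breakpoints are taken from the text; that each segment is a linear ramp is an assumption of this sketch, and the names are hypothetical.

```python
# (time in min, % solvent B) breakpoints of the LC-MS2 gradient from the text.
GRADIENT = [(0.0, 5.0), (3.0, 40.0), (9.5, 70.0), (10.5, 95.0), (13.0, 95.0)]


def percent_b(t_min, program=GRADIENT):
    """Return % solvent B at time t_min, assuming linear ramps between breakpoints."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time outside the gradient program")
    for (t0, b0), (t1, b1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program is not sorted by time")


if __name__ == "__main__":
    for t in (0.0, 1.5, 3.0, 10.0, 12.0):
        print(t, "min:", percent_b(t), "% B")
```

The fraction of solvent A at the same time point is simply 100 minus the returned value, since the mobile phase is binary.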