Unlocking nature’s treasure-chest: screening for oleaginous algae

Micro-algae synthesize high levels of lipids, carbohydrates and proteins photoautotrophically, thus attracting considerable interest for the biotechnological production of fuels, environmental remediation, functional foods and nutraceuticals. Currently, only a few micro-algae species are grown commercially at large-scale, primarily for “health-foods” and pigments. For a range of potential products (fuel to pharma), high lipid productivity strains are required to mitigate the economic costs of mass culture. Here we present a screen concentrating on marine micro-algal strains, which if suitable for scale-up would minimise competition with agriculture for water. Mass-Spectrophotometric analysis (MS) of nitrogen (N) and carbon (C) was subsequently validated by measurement of total fatty acids (TFA) by Gas-Chromatography (GC). This identified a rapid and accurate screening strategy based on elemental analysis. The screen identified Nannochloropsis oceanica CCAP 849/10 and a marine isolate of Chlorella vulgaris CCAP 211/21A as the best lipid producers. Analysis of C, N, protein, carbohydrate and Fatty Acid (FA) composition identified a suite of strains for further biotechnological applications e.g. Dunaliella polymorpha CCAP 19/14, significantly the most productive for carbohydrates, and Cyclotella cryptica CCAP 1070/2, with utility for EPA production and N-assimilation.


Results
Screening for growth in artificial seawater. The CCAP collection maintains approximately 3000 protistan and cyanobacterial strains of which about 600 are marine micro-algae 24 . A total of 175 strains were short-listed for screening, based on their stability in long-term culture (Supplementary Dataset S1 online). Approximately 50% were isolated in the UK/UK territorial waters and the rest were of diverse origins world-wide (Supplementary Dataset S1 online). The taxonomic affiliations of the short-listed strains and a graphical summary of the outcome of the primary screen are depicted in Fig. 1 and phylogenetic origins depicted in Fig. 2. In the primary screen, strains were tested for growth in defined media and 33% were rejected due to poor growth, leaving 117 that entered the secondary screen (Supplementary Dataset S2 online). The majority of strains studied were salt-tolerant and 103 grew well in artificial seawater-based medium (f/2), with relatively low nitrate (1 mM) under standard conditions (see Methods) (Supplementary Dataset S3 online). A subset of 11 strains (Haematococcus and Dictyosphaerium) were grown in freshwater medium with similar nitrate levels (JM) and a minority, 3 strains, were grown in high nitrate freshwater medium (3NBBM+V). Strains from the Rhodophyceae, Porphyridiaceae and Dinophyceae were grown under lower light according to reported requirements (see Methods) 27 . Analysis of biomass yield and composition was carried out in the secondary screen (Supplementary Dataset S4-7 online).
High biomass strains of diverse phylogenetic origin. Biomass quantified by dry-weight (DW) or combustive MS elemental analysis of C is tabulated (Supplementary Dataset S4 online). A close correlation was observed between biomass yields (gl −1 ) or productivities (gl −1 d −1 ) measured by the two methods (Pearson's coefficient 0.828 and 0.877, P < 0.001; Supplementary Fig. S1 online) although C content as a function of DW ranged widely from 7-66% (mean 40%, RSD 28%). Overall, C was chosen as a better measure of valuable biomass.
The distribution of biomass yields and productivities for the screen are also shown graphically by mass C and DW (Fig. 3a,b). Ranked data are also shown for those strains exceeding the 70 th percentile for both C yield and C productivity ( Table 1). The first 8-9 strains ranked by these two parameters were not significantly different (t-test P > 0.05) and there were four strains in common: M. subterranea CCAP 848/1; Nannochloropsis gaditana CCAP 849/5; Nannochloropsis oceanica CCAP 849/10 and Tetraselmis sp. CCAP 66/60. Rhodella violaceae CCAP 1388/6 had significantly highest DW yield and productivity (t-test, P = 0.002 and P = 0.014) (Fig. 3b), but had low C content biomass (18% DW); hence C yield ranked 2 nd and C productivity 21 st (Table 1).
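The ranking step described above (keeping only strains that exceed the 70 th percentile for both C yield and C productivity) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the strain labels and values are hypothetical, and the choice of the "inclusive" interpolation method for the percentile is an assumption, since the text does not state how percentiles were computed.

```python
from statistics import quantiles


def above_percentile(values: dict[str, float], pct: int = 70) -> set[str]:
    """Return the names whose value exceeds the pct-th percentile of all values."""
    # quantiles(..., n=100) returns the 1st..99th percentile cut points,
    # so index pct - 1 is the pct-th percentile. The interpolation method
    # ("inclusive") is an assumption made for this sketch.
    cut = quantiles(list(values.values()), n=100, method="inclusive")[pct - 1]
    return {name for name, value in values.items() if value > cut}


# Hypothetical C yields (g C per litre) for ten made-up strains.
c_yield = {f"s{i}": 0.05 * i for i in range(1, 11)}

# Hypothetical C productivities (g C per litre per day) for the same strains.
c_productivity = {
    "s1": 0.001, "s2": 0.002, "s3": 0.003, "s4": 0.004, "s5": 0.005,
    "s6": 0.006, "s7": 0.012, "s8": 0.007, "s9": 0.010, "s10": 0.011,
}

# A strain is short-listed only if it passes both thresholds.
selected = above_percentile(c_yield) & above_percentile(c_productivity)
print(sorted(selected))
```

Requiring both thresholds excludes strains that are high for one parameter only, which is the situation described for R. violaceae CCAP 1388/6 (high C yield rank, lower C productivity rank).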
A close correlation between C yield and C productivity data was evident (Pearson's coefficient 0.887, P < 0.001), although the relationship was significantly influenced by taxonomic grouping; either by class or genus (MANOVA, P < 0.001) (Fig. 3a). Comparing species and strains from two of the high yielding genera: Nannochloropsis group mean overall growth rates were found to be significantly higher than for Tetraselmis (0.060 c.f. 0.042 d −1 ; t-test P < 0.001), although mean C yields (respectively 0.191 and 0.229 gl −1 ) were not significantly different (t-test P = 0.138) (Supplementary Dataset S4 online). The small cell-size of the Eustigmatophytes investigated here (<3 μm diameter), compared with Tetraselmis sp. (>20 μm) could be a contributory factor in accounting for the higher productivity observed 10,28 . Nevertheless, among the diatoms screened, Cylindrotheca fusiformis CCAP 1017/2 was the most productive and it was characterized by very large pennate cells (length 100 μm: www.ccap.ac.uk) ( Table 1). Ranked within the top eight strains for productivity it was not significantly different from N. oceanica CCAP 849/10 (P = 0.14), but had lower yields than this strain (P = 0.006, Table 1). The diatoms Extubocellulus spinifer CCAP 1026/2, Chaetoceros simplex CCAP 1085/3 and Cyclotella cryptica CCAP 1070/2 exceeded the 70 th percentile for C productivity, but were below this percentile for C yield (Supplementary Dataset S4 online). In addition to Tetraselmis, among the green algae, several Dunaliella strains and Chlorella vulgaris CCAP 211/21A were productive (Table 1). Of the haptophyte species, Isochrysis sp. CCAP 927/12 and Pleurochrysis pseudoroscoffensis CCAP 961/3 were also productive, but the remainder clustered around biomass yield and productivity means (Table 1, Fig. 3a).
High TFA content strains identified by MS analysis. A three-way comparison of N and C content determined by MS, along with TFA content determined by GC-FID, is shown in Fig. 3c. TFA content was an indication of total useful lipids present (non-polar and membrane glycerolipids). This analysis defined 94% of high TFA content strains (>30% TFA per DW) according to their C-content ≥48%DW and N-content <3%DW. The 16 species/strains thus selected were from the genera Nannochloropsis, Chlorella and Dunaliella, whereas the outlier was Haematococcus pluvialis CCAP 34/6. The relationship between TFA content and C/N parameters was confirmed as significant as follows. Grouping by TFA content as defined in Fig. 3c, MANOVA gave P < 0.001 for C/N parameters and one-way ANOVA for N-content and C-content gave P = 0.003 and P < 0.001, respectively. Post-hoc analysis using Fisher's comparison indicated significant difference in relation to C-content (P < 0.05) where TFA was >40% DW c.f. <40% and >30% DW c.f. <20%; in relation to N-content, >30% DW c.f. 10-20% and 10-20% c.f. <10% was likewise significant.
Within the group of 16 high-TFA strains defined above by high C and low N, protein ranged from 5-15% DW (mean 8.8%) and carbohydrate ranged from 9-26% DW (mean 14%) (Supplementary Dataset S4 online). The mean levels of this group were significantly less (t test, P < 0.001) than those strains also having ≥48%DW C, but with higher N, >3%DW N (6 strains); here protein ranged from 14-16% DW (mean 14.7%) and carbohydrate ranged from 14-44% (mean 31%). Therefore high C content can indicate glycerolipids or other hydrocarbons (75-85% C by mass) 29 but high levels of protein (53% C, 16% N) 30 , or other organic amines might also be responsible. This latter scenario would be revealed by high N content; indicating less C-partitioning into lipids. Conversely, partitioning towards carbohydrates (44% C) 30 would tend to reduce overall C content. In practice, >90% of the strains with high TFA levels (>30% DW) could be identified solely by MS analysis for N and C content. This provides the means for formulating a rapid and robust strategy for future screening for high TFA content.
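The elemental screening rule stated above (C-content ≥48%DW together with N-content <3%DW as a proxy for TFA >30% DW) can be expressed as a simple filter. The sketch below uses the two thresholds from the text; the function name, the strain labels and the sample values are hypothetical and serve only to illustrate the three cases discussed (lipid-rich, protein-rich and carbohydrate-rich biomass).

```python
def flag_high_tfa(c_pct_dw: float, n_pct_dw: float,
                  c_min: float = 48.0, n_max: float = 3.0) -> bool:
    """Return True when C and N content (% of dry weight) suggest TFA > 30% DW.

    Thresholds follow the rule given in the text: C >= 48% DW and N < 3% DW.
    """
    return c_pct_dw >= c_min and n_pct_dw < n_max


# Hypothetical (label, C % DW, N % DW) measurements, for illustration only.
samples = [
    ("strain_A", 55.0, 1.8),  # high C, low N: consistent with high lipid
    ("strain_B", 50.0, 4.2),  # high C but high N: more protein, not flagged
    ("strain_C", 38.0, 2.0),  # low C: consistent with carbohydrate, not flagged
]

flagged = [name for name, c, n in samples if flag_high_tfa(c, n)]
print(flagged)
```

Candidates flagged in this way would still need confirmation by GC, since the text reports that the rule captured about 94% of the high-TFA strains rather than all of them.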
C. vulgaris CCAP 211/21A appeared to be exceptional in these terms among the marine Chlorella-like strains tested (i.e. Chlorella and Chloroidium sp.) (t-tests, P = 0.001-0.02) (Supplementary Dataset S4 online). Although N. oceanica CCAP 849/10 was the best Nannochloropsis strain tested, all 12 species/strains examined were in the high TFA-producing subset (Table 2). Comparison of 18S rRNA and ITS genomic DNA sequences defined 5 of the strains as N. oceanica species (Fig. 2; Supplementary Fig. S3 online). These were distinguished by higher mean TFA contents (>40%DW compared with the rest of the species/strains in the genus, which ranged 33-37%DW: t-test on group means, P = 0.002). They also had on average 45% higher yields and productivity (group means t-test within genus, P = 0.035-0.046) (Table 2). In terms of estimated evolutionary distance, N. oceanica strains were closer to N. oculata than N. salina or N. gaditana (Supplementary Fig. S3 online). Despite this relatedness, N. oculata strains had the lowest TFA productivities within this genus (Table 2). Protein content was also 60% higher in the latter compared with N. oceanica (10.3% c.f. 6.4%; t-tests comparing the individual strains: P = 0.001-0.02), suggesting species-specific differences in C-partitioning (Supplementary Dataset S4 online).
Concerning the other promising species listed (Table 2), H. pluvialis CCAP 34/6, because of its complex life-cycle and relatively slow growth, has limited potential for commercial lipid production 31 . A single Dunaliella strain, D. primolecta CCAP 11/34, out of 11 tested was identified as having similarly high TFA content and productivity (P > 0.05), but yield was half that of H. pluvialis (P = 0.046). Five diatom species, including the four identified above for highest biomass productivity, had moderately high TFA contents (20-25%; differences NS.) ( Table 2). Here, C. fusiformis CCAP 1017/2, also the best diatom for biomass, ranked the highest for TFA content, yield and productivity (significant for TFA yield and productivity c.f. Cyclotella cryptica CCAP 1070/2: P = 0.019-0.027).
Comparison of stationary phase and log phase TFA content (Fig. 3d, Supplementary Dataset S5 online) indicated that a group mainly comprising the highest lipid accumulators (Nannochloropsis species, marine Chlorellas and high lipid Dunaliella) had on average 4 times more TFA at stationary phase than in log phase. The most productive of these strains would seem best suited to a fed-batch cultivation mode. A second group, comprising the haptophytes, cryptophytes, diatoms and freshwater Trebouxiophyceans tended to accumulate at least half of the TFA during log phase. The most productive lipid accumulators in this category (for instance C. fusiformis CCAP 1017/2) might also be suitable for semi-continuous production methods.
Table 1. Top 26 micro-algal biomass producers. All strains were grown on f/2 unless denoted (*) in which case 3NBBM+V was used. All data points are above the 70 th percentiles for the screen. ‡ Significantly different (P < 0.05) from rest of screen except where denoted ( †). Full data in Supplementary Dataset S4 online.
Despite this potential caveat, a core set of about 7-10 strains above the 70 th percentile for content, yield and productivity in terms of protein or N were identified (Table 3). Although these included the small subset of 3 strains in the screen that were cultivated in high nitrate freshwater medium (3NBBM+V): M. subterranea CCAP 848/1, E. vischeri CCAP 860/7 and D. elegans CCAP 258/8, the rest were grown in standard saline f/2 medium. In terms of N-productivity, the former strain was at least 2-fold higher than any other micro-alga in the screen (t-test, P = 0.015). Although this strain also ranked highest for protein productivity and yield, it was not significantly higher in this respect than Tisochrysis lutea CCAP 927/14 or Chlorella vulgaris CCAP 211/75, both of which were grown on standard f/2 (P > 0.05, Table 3). The amount of N that was assimilated into biomass from the medium was indicated by N culture yield (and to an extent by protein yield), and these data are compared graphically with biomass C productivity (Fig. 4a,b).
Table 2. Top TFA-producing micro-algal strains. * Significantly different (P < 0.05) from rest of column except where denoted ( †). All data points are above the 70 th percentiles for the screen. All strains were grown on f/2 unless denoted ( ‡) where JM was used. Full data set in Supplementary Dataset S4 online.

Sequestration of supplied N in biomass. The amount of N supplied in the standard low nitrate saline f/2 media (a majority of those studied i.e. 103 strains) was 0.0124 gl −1 ; this was similar in the low nitrate freshwater JM medium (utilised to cultivate 11 of the strains studied) at 0.0156 gl −1 and for high nitrate freshwater 3NBBM+V medium (utilised for 3 strains) this was 0.1236 gl −1 . The mean N yield for strains in f/2 was 0.0076 gl −1 , but with the higher C productivity strains, N yields tended to approach the amount of N supplied (Fig. 4a). Here, R. violaceae CCAP 1388/6, C. cryptica CCAP 1070/2 and Tetraselmis sp. CCAP 66/60 retained more N than the Nannochloropsis species studied (Fig. 4a). In contrast the 3 strains in high nitrate (3NBBM+V) assimilated <30% of supplied N (Fig. 4a). In two of these, M. subterranea CCAP 848/1 and Desmodesmus elegans CCAP 258/8, this equated to significantly more N accumulated than the rest of the screen (t-test, P = 0.015 and P = 0.045; Fig. 4a; Table 3), but was only associated with high protein yields in the former strain (Fig. 4b). The C/N ratio at stationary phase harvest correlated with C productivity and yields for the strains grown in relatively low nitrate media (Supplementary Fig. S7 online). Of the best producing strains, for those growing in f/2 the C/N ratio was in the region of 30
Carbohydrate synthesis and C partitioning. Carbohydrate levels assessed using Dubois ranged from 3-81% DW (mean 30% and RSD 58%) (Supplementary Dataset S4 online). A similarly wide spread of data about the mean was noted for FA, but was less evident for protein or N (above). A three-way comparison of TFA, carbohydrate and protein productivities is shown graphically (Fig. 4c,d) and these data are also ranked for the top producing strains in Supplementary Fig. S8 online.
Hence a great degree of flux control variation in C-partitioning between TFA and carbohydrate was apparent with most of the high biomass (C) producers focussing either on carbohydrate or TFA. A few of the high C producers (e.g. Tetraselmis sp. CCAP 66/60, C. fusiformis CCAP 1017/2), grown on standard low nitrate f/2, achieved a balance between TFA, carbohydrate and protein production (Fig. 4c,d; Supplementary Fig. S8 online). Dunaliella, Tetraselmis, Rhodella and Haematococcus species were the most productive for carbohydrate and Dunaliella polymorpha CCAP 19/14 emerged as the most productive strain (c.f. rest of screen P = 0.016, except H. pluvialis CCAP 34/6: P = 0.0856) (Fig. 4c, Table 3). R. violaceae CCAP
Analysis of FA composition and micro-algal phylogeny. A cluster analysis of FA compositional data for the screen is shown (Fig. 5, data in Supplementary Dataset S6 and Fig. S9 online). Hierarchical cluster analysis of the FA data separated the green algae from the chromistan and red algae (Fig. 5). Here distinct patterns of C 16 desaturation have been attributed to the distinct substrate specificities of plastidial desaturases in the red and green algal lineages 33 . Clustering of FA compositional data led to further grouping of most strains by phyla, class and in some cases according to genus. The outcome of this exercise was most successful with the Prymnesiophytes, where FA composition appeared to vary along taxonomic lines. Conversely, in diatoms there appeared to be substantial compositional variation at the species and even the infra-species level. Further analysis of FA composition and phylogenetics is presented in Supplementary Text S1 online.
Implications for biofuels. The best biofuel strains identified were N. oceanica CCAP 849/10 and C. vulgaris CCAP 211/21A based on content and productivity (Table 2). PUFA levels can negatively impact biodiesel storage in proportion to their unsaturation 19,20 . In the screen as a whole, PUFA levels were high compared with current biofuel feedstocks, with a mean of 34%, but ranged from 4-74% (Supplementary Dataset S6 online) 19,20 . These were relatively high at 32% in C. vulgaris CCAP 211/21A, but confined to tri-unsaturates or less (Supplementary Dataset S6 online). All the Nannochloropsis strains had low PUFAs (5-11% TFA) of which about half was EPA. The high levels of 16:0 and 16:1n-7 in N. oceanica CCAP 849/10 led to a mean chain length among the lowest in the screen at 16.4 (Supplementary Dataset S6 online). This would be expected to be an improvement over palm oil for instance, where 16:0 and 18:1n-9 are the dominant FA, and where cold flow issues exist 19,20 . Several diatoms and haptophytes had C 14 saturate levels ranging from 20-40% TFA (highest Odontella mobiliensis CCAP 1054/4), but accompanied by high amounts of C 20-22 PUFAs. This would diminish cold flow problems (albeit with oxidation issues from the latter), but high C 14 was not observed among the most productive TFA strains (Supplementary Datasets S4, S6 online). Nevertheless, model strains for genetic engineering or breeding could be found: C. fusiformis CCAP 1017/2, the best diatom TFA/biomass producer, was 7% C 14 , but other species/strains in the genus had up to 30%, e.g. Cylindrotheca sp. CCAP 1017/7. In the haptophytes, C 14 was at 20-28% in Prymnesium parvum CCAP 946/4, Pavlova salina CCAP 940/3 and Isochrysis sp. CCAP 927/12, but with moderate total TFA contents and productivities (Supplementary Dataset S6 online).
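The mean chain length quoted above (16.4 for N. oceanica CCAP 849/10) is the kind of summary value that can be derived from a FA profile. The sketch below assumes it is the composition-weighted average of carbon number, which the text does not state explicitly; the function name and the example profile are hypothetical and are not taken from Supplementary Dataset S6.

```python
def mean_chain_length(composition: dict[str, float]) -> float:
    """Composition-weighted mean carbon chain length.

    Keys are FA shorthand names such as '16:1n-7'; the carbon number is the
    integer before the colon. Values are shares of total FA (e.g. mol%).
    """
    total = sum(composition.values())
    weighted = sum(int(name.split(":")[0]) * share
                   for name, share in composition.items())
    return weighted / total


# Hypothetical FA profile (mol%), for illustration only.
profile = {
    "14:0": 5.0,
    "16:0": 40.0,
    "16:1n-7": 30.0,
    "18:1n-9": 15.0,
    "20:5n-3": 10.0,
}

print(round(mean_chain_length(profile), 2))  # 16.6 for this made-up profile
```

A profile dominated by 16:0 and 16:1n-7 pulls the average towards 16, which is the pattern the text describes for N. oceanica CCAP 849/10.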
High-value FA producing strains. FA composition was analysed further in productivity terms for the commercially significant FA such as EPA, DHA, SDA and GLA (Supplementary Dataset S7 online, Table 4). The two best strains for EPA production were M. subterranea CCAP 848/1 and C. cryptica CCAP 1070/2 ( Table 4). The M. subterranea strain studied (also held in the UTEX algal collection as UTEX 151) is a known EPA source strain and was used as benchmark although it is normally grown in N-rich freshwater media similar to that used here 34,35 . Other C. cryptica strains have been used in mariculture 36,37 . The former Eustigmatophycean strain ranked highest in EPA yield and productivity, but this was not significantly more than C. cryptica CCAP 1070/2 (t-test P = 0.09 for both parameters) ( Table 4). Although M. subterranea CCAP 848/1 (and fellow Eustigmatophycean E. vischeri CCAP 860/7) had higher EPA FA composition (22% cf. 16%; t-test to C. cryptica CCAP 1070/2: P < 0.001 and P = 0.007), total TFA content per biomass was only 8-12% cf. 20% DW (P < 0.05) and with less EPA per biomass (2-3%DW cf. 4%; P < 0.05, Table 4). Therefore, under the screen conditions used here, the C. cryptica CCAP 1070/2 strain appeared to be a more promising source of EPA and possibly a suitable mariculture strain due to the higher EPA biomass content and capability for productive growth on low-N saline f/2 medium.
With respect to DHA, the most productive strains were confined to the haptophytes, where T. lutea CCAP 927/14 ranked the highest (see Table 4 for significance). This was combined with high DHA FA content (16%) which was only exceeded by the much less productive dinoflagellate A. carterae CCAP 1102/3 (19%, P = 0.002) and Pedinella marina 941/1A (18% but NS, P = 0.68); otherwise it was significantly higher than rest of screen (P = 0.023). The T. lutea CCAP 927/14 strain is extensively used in aquaculture 38 because of its suitable nutritional profile. However, the generally low TFA content of the examined Isochrysidales order, of 11-16% DW, would not favour them for non-polar lipid extraction. Significantly higher DHA per biomass was observed in Prymnesium parvum CCAP 946/4 and CCAP 946/6 (Table 4), with relatively high TFA content at 16-26% DW. DHA content at 13-14% TFA was slightly less than T. lutea CCAP 927/14 (P = 0.003 and 0.023) although DHA productivity was not significantly less than that of T. lutea CCAP 927/14 (P = 0.60 and 0.12; comparing all parameters between P. parvum strains: NS, P > 0.05) ( Table 4). Given that members of this genus produce a suite of toxins against fish and protozoa 39 , commercial use may be limited, if resolvable, through genetic means. Fish oil based feeds and dietary supplements often have similar levels of both EPA and DHA. In this regard, three Pavlovophyceaen strains were productive for both FA, of these Diacronema lutheri CCAP 931/7 was the most productive for DHA and is extensively used in aquaculture 40 . However, another related species, Pavlova salina CCAP 940/3, showed more balanced EPA and DHA levels, combined with TFA content of 20% DW (Table 4).

Figure 5. Cluster analysis of FA compositional data.
A data cut-off of 0.1% was applied and data (mol%) clustered using a PAST algorithm employing Rho parameters (bootstrap value N = 1000). Micro-algal classes were defined by the colour-coding scheme in Fig. 1 (2° screen). Strains undergoing taxonomic review are indicated (*); see Supplementary Text S1 online. Data tabulated in Supplementary Dataset S6 online.

Haptophytes were also high producers of SDA, a precursor of EPA and DHA 14 (Supplementary Table S1 online); the most productive being T. lutea CCAP 927/14 (significantly higher c.f. rest of screen, P = 0.040). Its SDA FA composition was highest in the screen (17%, P = 0.008), excepting A. carterae CCAP 1102/3 (32%, P < 0.001). SDA productivity was also high in Chroomonas placoidea CCAP 978/8 and Pleurochrysis dentata CCAP 944/2, with the latter having favourable TFA content at just under 20% DW.
It was also instructive to examine the complete complement of omega-3 long chain PUFA in the screen (Supplementary Table S2 and Dataset S6-7 online). Although the health benefits, and commercial premiums, of individual omega-3 long chain PUFA are known to differ, a high ω-3/ω-6 ratio is thought to be beneficial in dietary fat. The mean ratio (≥C 18 PUFA) for the screen was high at 8.4, compared with western intake (~0.1) but varied greatly from 0.1-74 (Supplementary Dataset S6 online). The lowest ratios were due to high levels of Linoleic acid (LA or 18:2n-6) or Arachidonic acid (ARA or 20:4n-6), or both (e.g. Porphyridium). Mean omega-3 long-chain PUFA content in TFA in the screen was also high at 23% (RSD 53%) and ranged from 2-68%. The highest was Amphidinium carterae CCAP 1102/3 (t-test P < 0.001), due to SDA, DHA and EPA (Supplementary Table S2 and Dataset S6 online). However, this strain was not productive under the conditions employed and produces toxins 42 . When taking growth into account, a group of 11 strains lay above the following 70 th percentiles: omega-3 long-chain PUFA composition (i.e. ≥28%), content in biomass, yield and productivity (Supplementary Table S2 online). Most of these strains had TFA contents below 20% DW however. In fact a weak inverse-relationship was present between omega-3 long-chain PUFA (and total PUFA) composition in relation to TFA content in biomass (Supplementary Fig. S10 online). But interestingly, the high TFA content (55% DW) strain C. vulgaris CCAP 211/21A, had significantly the highest omega-3 long-chain PUFA productivity in the screen (t-test P = 0.013) (Supplementary Fig. S11 and Table S2 online). Although this strain lacked potential commercially high-premium FA, the FA composition appeared to be beneficial from a dietary perspective with relatively high ALA (13%), oleic acid (48%) and low LA (10%); ω-3/ω-6 ratio 1. (Supplementary Dataset S6 online). By comparison, most major plant seed oil extracts such as Canola/rapeseed or sunflower tend to have low omega-3 long-chain PUFA content, with LA a major if not the predominant unsaturated FA 20 .
Table 4. Strains producing high-value omega-3 long-chain PUFA. All strains were grown on f/2 unless denoted (*) where 3NBBM+V was used. † Significantly different (P < 0.05) from rest of the column except where denoted ( ‡). All data are above the 70 th percentile for the screen for the first 4 parameters. Full data set in Supplementary Dataset S7 online.
Table 5. Summary of the most productive strains emerging from the screen. Strains are arranged in descending order of biomass productivity (gC l −1 d −1 ) focussing on best strains for a given species/genus. Scoring system refers to productivity: >95 th percentile (+++); >90 th percentile (++) and >70 th percentile (+) except for N-assimilation where this indicates high assimilation of supplied N. All tested under full-salinity culture unless indicated (*) where freshwater. † Commercial origin (see Supplementary Dataset S1 online). Full data found in Supplementary Dataset S4-S7 online. ω-3's: omega-3 long-chain PUFAs.
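The ω-3/ω-6 ratio for ≥C 18 PUFA discussed above can be computed directly from a FA profile written in the usual shorthand (e.g. 18:2n-6 for LA, 20:4n-6 for ARA). The following is a minimal sketch under stated assumptions: the function name is hypothetical, PUFA are taken to be FA with two or more double bonds, and the example profile is invented for illustration rather than drawn from Supplementary Dataset S6.

```python
def omega3_omega6_ratio(composition: dict[str, float], min_chain: int = 18) -> float:
    """Ratio of n-3 to n-6 PUFA shares for FA with at least `min_chain` carbons.

    Keys use shorthand such as '20:5n-3' (chain length : double bonds n- family).
    Saturates (no 'n-' suffix) and monounsaturates are ignored.
    """
    n3 = 0.0
    n6 = 0.0
    for name, share in composition.items():
        chain, rest = name.split(":")
        if "n-" not in rest:
            continue  # saturated FA such as '16:0'
        bonds, family = rest.split("n-")
        if int(chain) < min_chain or int(bonds) < 2:
            continue  # shorter than C18, or not polyunsaturated
        if family == "3":
            n3 += share
        elif family == "6":
            n6 += share
    if n6 == 0:
        raise ValueError("no n-6 PUFA present; ratio is undefined")
    return n3 / n6


# Hypothetical FA profile (mol%), for illustration only.
profile = {
    "16:0": 30.0,
    "18:1n-9": 20.0,
    "18:2n-6": 5.0,   # LA
    "20:4n-6": 5.0,   # ARA
    "18:3n-3": 10.0,  # ALA
    "20:5n-3": 20.0,  # EPA
    "22:6n-3": 10.0,  # DHA
}

print(omega3_omega6_ratio(profile))  # 4.0 for this made-up profile
```

As the text notes, profiles rich in LA or ARA drive the ratio down because both add to the denominator.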

Discussion
The aim of this work was to screen a micro-algal collection for strains of biotechnological potential. The focus was primarily on marine strains and the key objective was to identify high lipid producers, with additional measurements to provide a complete compositional analysis. This screen was carried out at medium-scale with 0.4 L culture volumes to yield sufficient biomass for several assays. However, it was found that elemental analysis for C and N content alone was sufficient to identify strains with high TFA content (>30%DW). This procedure requires only 1 mg DW (from as little as 2 mL culture) encapsulated in foil with a run-time of 8 min per sample. Therefore, for future screening, this procedure would allow a faster processing time and a significant scale-down of culture volume leading to higher throughput (more so than is possible with GC of directly trans-esterified FA 43 or FTIR spectroscopy 44 ). Only the Nile-Red plate assay approach would be faster, but this technique has limitations in accuracy relating to between-species comparison, dye uptake and carotenoid interference 25 .
The two highest lipid producers were Nannochloropsis oceanica CCAP 849/10 and a marine Chlorella vulgaris CCAP 211/21A strain. The former was originally isolated from a fish hatchery (Table 5, Supplementary Dataset S1 online) and, since many freshwater Chlorella are already commercially exploited, it is likely that both strains would be robust enough for use in open-air ponds 8,45 . This Chlorella is the first salt-tolerant strain with noted potential and the TFA content observed (52% DW) is similar to some of the higher reported levels in the literature for its freshwater relatives (48-57% DW; gravimetric measurements of total lipid) 46 . Given the high levels of ALA and oleic acid relative to LA, the lipid composition of this strain represents a dietary improvement over mainstay vegetable oils, which are usually high in LA 20 . On the basis of current commercial Chlorella production levels and the potential to increase these, opportunities exist for products in niche health-food markets, but in future a greater impact on dietary quality might be possible 13,15,20 . The FA composition of lipids from Nannochloropsis species, along with sunflower and Canola, are more suited for biodiesel production 19,20 . A detailed analysis of 12 different Nannochloropsis strains from 4 species found that N. oceanica strains had significantly higher TFA productivity and content than the others tested. This suggested that here phylogenetic origin was the major factor involved, rather than the local origin and/or the associated adaptations of the different strains to their local environments. Observing a significant relationship between phylogenetic and biochemical data at the species level indicated that the methods used in the screen were robust. It was noted that the aforementioned Chlorella and Nannochloropsis high lipid-producing strains accumulated most product in stationary phase. In contrast, haptophytes and diatom strains, where the most productive strain was C. fusiformis CCAP 1017/2 (Table 5), showed much less temporal variation in TFA accumulation. It was apparent that different phylogenetic groups should be grown using different cultivation methods, based on these data.
A subset of 20 strains is listed in Table 5 that was found to be the most productive for the specific storage products: lipid, carbohydrate and protein, and algal biomass. Interestingly, 3 of the top 8 biomass producers were isolated from commercial aquaculture sites, although the majority of strains entering the screen were originally collected from the natural environment (Table 5). It is likely that such strains will have undergone artificial selection predisposing them to mass culture 10 . In addition to flagging up previously unstudied strains, 4 out of the 20 highlighted strains from the screen were of previously known potential 35,38,44,47 . This also demonstrated the robust nature of the methods employed in screening. The most promising source of total carbohydrate was D. polymorpha CCAP 19/14 (Table 5).
It was also notable that the top strains emerged from several different taxonomic phyla with varying latitudes of origin, from sub-tropical (e.g. T. lutea CCAP 927/14 from Hawaii) to cool temperate (R. violaceae CCAP 1388/6 from Sweden). Several had originally been isolated from brackish ecological niches (e.g. R. violaceae CCAP 1388/6; D. polymorpha CCAP 19/14; C. cryptica CCAP 1070/2 and C. vulgaris CCAP 211/21A), but thrived at seawater salinity levels (Supplementary Dataset S1 online). Overall, the common factors predisposing an individual taxon towards commercial exploitation seemed to be a high degree of adaptability and capacity for robust growth.
Micro-algae have received much interest as a source of high value omega-3 long-chain PUFA, for use in dietary supplements (i.e. valuable in commanding high commercial premium and health value), or for sequestering in the food chain in aquaculture or fisheries (i.e. of dietary health value added to the end product). Desaturated FA levels were altogether high in the screen compared with many terrestrial plant seed oils, and new strains for value-FA were noted 20 . For instance EPA productivity in C. cryptica CCAP 1070/2 matched that of a bench-mark M. subterranea CCAP 848/1 strain, used in aquaculture (Table 5) 35 . High DHA productivity was confined to the haptophytes, with 20 included in the screen. A routinely employed mariculture strain, T. lutea CCAP 927/14, emerged as the most productive, but a previously unstudied Pavlova salina CCAP 940/3 strain was also identified as a source of balanced EPA/DHA, with high TFA content. Overall, there was an inverse relationship noted between PUFA or total omega-3 long-chain PUFA levels versus TFA content, an observation which has previously been attributed to flux competition in FA biosynthesis 1,48 . Strains productive in omega-3 long-chain PUFA were often productive for protein, perhaps related to a reduced carbon partitioning into lipids (Table 5).
In order to maximize production rates of a desired product, a balance must be struck between partitioning of resources between its accumulation and cell growth 49 . Although noted in a smaller screen, there was no inverse correlation seen between TFA content and biomass productivity 50 . However, an inverse relationship was observed between biomass production and protein or N-content. In effect, the majority of the high biomass producers allocated most of their C into either carbohydrate or TFA by stationary phase, as opposed to protein or other organic N compounds, leading to high C/N ratios. Strains grown on relatively low-N media (saline f/2 or freshwater JM) might have undergone N-limitation and the most productive ones appeared to assimilate most of the supplied N into biomass, although there was some variation in this respect: R. violaceae CCAP 1388/6 and C. cryptica CCAP 1070/2 assimilated the most. Given that the cultures did not receive CO 2 supplementation it is also plausible that some became C-limited which could in turn place energetic restrictions on N-assimilation 10,51 . Proteomic/transcriptomic studies suggest that in oleaginous micro-algae, catabolic processes linked to down-regulation of photosynthesis at stationary phase are likely to contribute to non-polar lipid accumulation and recycling of organic N [52][53][54] . Taken together, a greater understanding of these processes is likely to benefit lipid production or N-remediation by algae, and requires further study in the high producer strains identified here.
To summarize, a comprehensive screen was undertaken and this provided a rapid, "intelligent" strategy for future high-throughput screening based on a primary elemental analysis step for identifying high TFA and biomass-producing algae. A detailed analysis of composition cast light on the partitioning of resources in algae and provided a data resource for comparative genomics methodology. A repertoire of model strains for further investigation of biofuels and bioremediation has been provided and these may be tested by up-scaling for biotechnological purposes.

Methods
Growth of Micro-algae. All micro-algae tested were from the CCAP, UK; www.ccap.ac.uk. Cultures were grown in a defined, artificial seawater-based medium (f/2), with the exception of 11 freshwater taxa from the genera Haematococcus, Dictyosphaerium and Cyanophora, which were grown in JM, and three freshwater taxa from the genera Monodopsis, Eustigmatos and Desmodesmus, which were grown in 3NBBM+V (www.ccap.ac.uk). The f/2-based medium was prepared as follows: 33.5 gl −1 Instant Ocean (Aquarium Systems, France), pH adjusted to 6.9-7.0; the following added to final conc. 75 mgl −1 NaNO 3 , 5. ); in the case of diatoms 30 mgl −1 Na 2 SiO 3 .9H 2 O; Tris-base to 1 mM, pH adjusted to 6.8-7.0; vitamins (final conc. Cyanocobalamin 0.5 μgl −1 , Biotin 0.5 μgl −1 ; Thiamine-HCl 0.1 mgl −1 ) were added after autoclaving. Growth was monitored by dual measurement of in vivo chlorophyll fluorescence and cell turbidity as described previously 43 . In the primary screen, cultures of 100 mL were inoculated from starter cultures and incubated without agitation under a 12 h:12 h L/D (light/dark cycles) regime at 50-80 μmol m −2 s −1 at 20 °C for 7-14 d (Innova 44, New Brunswick Scientific, Edison, NJ). Cultures with no substantive growth after 14 d were discarded; cultures showing growth were used to inoculate secondary screen cultures once A 735 = 0.34, or when chlorophyll fluorescence reached 10,000 RFU (Relative Fluorescence Units). These values equated to 1 × 10 7 cells ml −1 of the standard model strain, Nannochloropsis oculata CCAP 849/1. For the secondary screen, triplicate 400 mL cultures were inoculated at 5% (v/v) from starters into 500 mL aerated flasks as described 43 . Each flask was exposed to PAR (400-700 nm) at 150 μmol photons m −2 s −1 for 16 h:8 h L/D, at 20 °C throughout. A further 8 strains from the Rhodophyceae, the Porphyridiaceae and Dinophyceae, which require lower light levels 27 , were exposed to 50 μmol photons m −2 s −1 , and 4 polar diatom species required temperatures of 4-10 °C.
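As a quick check on the medium recipe above, the mass concentrations can be converted to molar concentrations. The following is a minimal sketch rather than part of the published method: the atomic masses are standard values rounded to three decimals, and the helper names (`molar_mass`, `mg_per_l_to_mM`) are hypothetical.

```python
# Standard atomic masses (g/mol), rounded to three decimals.
ATOMIC_MASS = {"H": 1.008, "N": 14.007, "O": 15.999, "Na": 22.990, "Si": 28.085}


def molar_mass(formula_counts):
    """Sum atomic masses for a composition given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula_counts.items())


def mg_per_l_to_mM(mg_per_l, formula_counts):
    """Convert mg/L to mmol/L: (mg/L) / (g/mol) = mmol/L."""
    return mg_per_l / molar_mass(formula_counts)


# NaNO3
NANO3 = {"Na": 1, "N": 1, "O": 3}
# Na2SiO3.9H2O: 3 O from silicate plus 9 O from water; 18 H from water
NA2SIO3_9H2O = {"Na": 2, "Si": 1, "O": 12, "H": 18}

nitrate_mM = mg_per_l_to_mM(75.0, NANO3)          # 75 mg/L NaNO3, from the recipe
silicate_mM = mg_per_l_to_mM(30.0, NA2SIO3_9H2O)  # 30 mg/L, diatom cultures only

print(f"nitrate: {nitrate_mM:.2f} mM, silicate: {silicate_mM:.2f} mM")
```

On these assumptions the nitrate addition works out at roughly 0.88 mM, in line with the approximately 1 mM nitrate quoted for f/2 in the Results.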
Samples were harvested by centrifugation at 1000-4000 g for 15 min (Sigma 4K15 centrifuge). Dunaliella species required 1000 g to avoid the risk of cell rupture, whereas the Eustigmatophytes required 4000 g due to their small cell size. Harvested cells were then flash-frozen in liquid N, freeze-dried and stored as described 43 . Log phase samples (100 mL) were harvested based on the above biomass concentration proxies; DW biomass yields were later checked to be within 20-60% of stationary phase biomass. Once the cultures reached stationary phase (defined as no change >±5% in either A 735 or chlorophyll fluorescence within a 2 d interval), the remainder of the culture was harvested.
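The two harvest criteria above can be written as simple checks. This is an illustrative sketch only: the function names are hypothetical, and it assumes that "no change >±5% in either" proxy means both A 735 and chlorophyll fluorescence must stay within ±5% over the 2 d interval.

```python
def relative_change(earlier, later):
    """Fractional change between two readings taken 2 d apart."""
    return (later - earlier) / earlier


def is_stationary(a735_pair, rfu_pair, tolerance=0.05):
    """True if neither proxy changed by more than +/-5% over the interval.

    Each pair is (reading at day t, reading at day t + 2).
    Assumption: both proxies must be stable, not just one of them.
    """
    return all(
        abs(relative_change(earlier, later)) <= tolerance
        for earlier, later in (a735_pair, rfu_pair)
    )


def log_phase_sample_ok(log_dw, stationary_dw, low=0.20, high=0.60):
    """True if log-phase DW is within 20-60% of stationary-phase DW."""
    return low <= log_dw / stationary_dw <= high


# Made-up readings: both proxies rise by 3% over 2 d, so the culture is stationary.
print(is_stationary((1.00, 1.03), (30000, 30900)))
# Made-up DW yields (g/L): 0.2 / 0.5 = 40% of stationary biomass.
print(log_phase_sample_ok(0.2, 0.5))
```

The readings in the usage lines are invented for illustration and are not data from the screen.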

Measurement of biomass and its constituents. Total C and N were determined by elemental analysis on 2 mg of freeze-dried material, and protein was determined by hot-TCA extraction followed by Lowry assay, as described 55 . TFA content was estimated by direct derivatization of free and esterified FA followed by GC-FID as described (internal standards 10 μL 5 gl −1 tritricosanoin in chloroform or 100 μL 0.25 gl −1 15:0 tripentadecanoin in hexane, Larodan, Malmö, Sweden) 43 . Individual FA were identified using a combination of internal standards and GC-MS analysis of FAMEs and DMOX-derivatives in representative strains as described 43 . Total carbohydrate content was estimated using the Dubois assay 56 . Lyophilized 5 mg samples were suspended in 0.5 mL 1 M H 2 SO 4 and extracted at 121 °C for 15 min. Samples were cooled and centrifuged for 10 min at 10,000 g. The assay was carried out on 10 μL of supernatant by addition, with gentle mixing, of 0.5 mL of 4% phenol followed by 2.5 mL of conc. H 2 SO 4 . Readings were taken at A 490 , calibrating the assay with glucose.
Scientific Reports | 5:09844 | DOI: 10.1038/srep09844
Data analysis. Compositional data for N, C, carbohydrate, protein, TFA and value FA were expressed in terms of biomass content (%DW), culture yield (gl −1 ), batch culture productivity from inoculation to harvest (gl −1 d −1 ) and, in the case of the specific FA, composition (%area). The best strains were ranked in Excel for content, yield and productivity parameters, retaining strains above the 70th percentile, and data were compared by t-test. Graphical data output and Pearson's correlations were carried out using PAST 57 , and statistical comparisons by 1-way ANOVA and MANOVA were carried out in MINITAB. The complete FA composition data-set was expressed as mol% using a cut-off of 0.1% prior to a hierarchical cluster analysis in PAST using rho-parameters. Phylogenetic analyses on 18S rDNA and ITS sequences were carried out using the Geneious 6.0.6 software package. Sequences were aligned using MUSCLE, editing out large insertions, and trees were drawn using PhyML, with bootstrapping for maximum likelihood inference (N = 1000).
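The productivity and ranking steps described under Data analysis can be sketched as follows. This is a hedged illustration, not the authors' spreadsheet: the strain labels and values are invented, the helper names are hypothetical, and the percentile is computed with Python's `statistics.quantiles` using the inclusive (linear interpolation) method, which may differ slightly from the percentile function used in Excel.

```python
import statistics


def batch_productivity(yield_g_per_l, days_to_harvest):
    """Batch productivity (g/L/d) from inoculation to harvest."""
    return yield_g_per_l / days_to_harvest


def strains_above_percentile(values_by_strain, percentile=70):
    """Return strains whose value exceeds the given percentile, best first.

    Assumption: percentile is a multiple of 10 so it maps onto a decile cut point.
    """
    deciles = statistics.quantiles(values_by_strain.values(), n=10, method="inclusive")
    cutoff = deciles[percentile // 10 - 1]
    kept = [(strain, value) for strain, value in values_by_strain.items() if value > cutoff]
    return [strain for strain, _ in sorted(kept, key=lambda item: item[1], reverse=True)]


# Invented example data: (TFA yield in g/L, days from inoculation to harvest).
cultures = {
    "A": (0.50, 10),
    "B": (0.90, 12),
    "C": (0.30, 10),
    "D": (1.20, 12),
    "E": (0.60, 15),
}

productivity = {
    strain: batch_productivity(tfa_yield, days)
    for strain, (tfa_yield, days) in cultures.items()
}
top = strains_above_percentile(productivity)
print(top)
```

The same ranking function could be applied in turn to content (%DW) and yield (gl −1 ) values, so that each parameter is screened against its own 70th percentile.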