Thalassiosira Community Composition and Diversity in Narragansett Bay

Diatoms contribute 40-45% of marine primary production. Understanding the contributions of individual species to diatom communities is important in light of warming ocean waters and changing food webs. Here, the diatom genus Thalassiosira, which exhibits irregular pulses in abundance in Narragansett Bay, is examined using data from the Narragansett Bay Long Term Phytoplankton Time Series. High-throughput sequencing (HTS) at the highly-variable V4 region of the 18S ribosomal gene identified fourteen Thalassiosira taxa in monthly samples over a sixyear time period (December 2008-December 2014), seven of which had not previously been identified in this time series. During the same time period, only four Thalassiosira taxa were identified through light microscopy (LM) counts. Many of the newly identified species for this location are small in diameter and difficult to identify using LM. However, they appear to be very frequent and may have a significant ecological role. 18S gene copy number was shown to greatly effect sequence abundance for this locus and this genus, and therefore only presence data was analyzed. Thalassiosira species grouped into four seasonal assemblages, although only 25% of this pattern was explained by temperature, suggesting that other factors such as competition or prey selection may drive Thalassiosira community structure. Comparisons to historical records show possible shifts in frequency and abundance of Thalassiosira species.

Thalassiosira taxa were identified through light microscopy (LM) counts. Many of the newly identified species for this location are small in diameter and difficult to identify using LM. However, they appear to be very frequent and may have a significant ecological role. 18S gene copy number was shown to greatly effect sequence abundance for this locus and this genus, and therefore only presence data was analyzed. Thalassiosira species grouped into four seasonal assemblages, although only 25% of this pattern was explained by temperature, suggesting that other factors such as competition or prey selection may drive Thalassiosira community structure. Comparisons to historical records show possible shifts in frequency and abundance of Thalassiosira species.
iii ACKNOWLEDGEMENTS This thesis would not have been completed without the support, guidance and friendship of many individuals. I must first thank my advisor, Tatiana Rynearson, for her patience and encouragement throughout this process and for her insight which greatly improved this work. She has enabled me to develop a thorough understanding of this subject and helped me to grow as a scientist and thinker. It has been an honor to work under her guidance alongside many great lab mates. I learned so much from all of you. I am particularly grateful to lab manager extraordinaire, Amanda Montalbano, for assisting with all things, large and small.    Table 4. Thalassiosira Species in Narragansett Bay………………………….………41 Table 5. Thalassiosira Cell Diameters and Ranges…………………………………..42   Phytoplankton account for over 45% of global primary production despite comprising only 1% of photosynthetic biomass (Field et al. 1998), underscoring the importance of these organisms. Within the phytoplankton, diatoms account for roughly 50% of oceanic primary production (Rousseaux and Gregg 2014). Diatoms are unicellular, photosynthetic eukaryotes characterized by a siliceous frustule. There are an estimated 30,000 to 200,000 extant diatom species (Mann andDroop 1996, Mann andVanormelingen 2013), indicating the great diversity within this class of phytoplankton. In coastal systems, diatoms play a crucial role in the silica cycle and in carbon flux (Treguer and De La Rocha 2013). Understanding the ecology of individual diatom species is essential for generating estimates of phytoplankton community composition, production, export and phenology. Diatom species traditionally are differentiated based on morphology (reviewed in Mann 1999), although a lineage-based species approach has gained support recently (reviewed in Alverson 2008), in which molecular and morphological characteristics are used to differentiate phylogenies. Here, we examine the diatom genus Thalassiosira, a common genus consisting of dozens of species that are often seen coexisting, using data from the Narragansett Bay Long Term Phytoplankton Time Series.
The Narragansett Bay Long-Term Time Series was initiated by D. Pratt and T.
Smayda (Pratt 1965, Smayda 1998 in 1959 and weekly sampling has been conducted by graduate students since 1999. Narragansett Bay is a highly productive 234 square kilometer estuary (Nixon et al. 2009) characterized by a phytoplankton community dominated by diatoms (Pratt 1965, Karentz and Smayda 1984, Oviatt et al. 2002. Understanding the ecology and community dynamics of this estuary are important due to the role of diatoms in affecting water quality, biogeochemical cycling, and production (Legendre 1990, Treguer andDe La Rocha 2013). Time series samples are collected from a station in the southern west passage of Narragansett Bay, which experiences seasonal cycling of physical, chemical and biological signals, including temperature, nutrients, and plankton abundances and diversity (Pilson 1985, Nixon et al. 2009). Since 1959, nutrient inputs to Narragansett Bay have changed and the average winter temperature has increased by 2. 2°C from 1960(Nixon et al. 2009, Smith et al. 2010. The scale, timing and even the occurrence of the spring bloom have also been in flux (Oviatt et al. 2002). Both bulk diatom counts and chlorophyll a concentrations have declined (Smayda 1998, Li andSmayda 1998), and the chlorophyll a decline is correlated with warmer water temperatures (Oviatt et al. 2002).
From 1959 through 1980, some Thalassiosira species were identified as important contributors to the Narragansett Bay phytoplankton community. In particular, T.
nordenskioeldii was identified as the 5 th most abundant phytoplankton species and was an important winter phytoplankton species, with a 5 year abundance fluctuation cycle (Karentz and Smayda 1984). Investigating potential changes in abundance or timing in Thalassiosira species is valuable given that diatom genera with larger cell sizes may contribute more significantly to carbon fixation than smaller cells such as Skeletonema, due to the positive relationship between carbon content and cell volume in diatoms (Strathmann 1967, Menden-Deuer andLessard 2000). Therefore, it is important to determine whether Thalassiosira species timing or frequencies have changed.
Thalassiosira is a centric diatom genus that can be chain forming and is very  (Pratt 1965, Smayda 1998, the Argentinean coast (Popovich and Gayoso 1999), a Scottish sea-loch (Harris et al. 1995), the North Sea (Hoppenrath et al. 2007), German and Dutch estuaries (Muylaert and Sabbe 1996) (Sunda and Huntsman 1995). Some of these physiological differences between the species have been observed in the field as well. Chappell et al. (2013) found that Thalassiosira community composition in the Northeast Pacific was correlated with temperature and iron concentrations.
Despite compelling evidence that this genus deserves further study, there are a few challenges with studying the Thalassiosira genus, primarily that identification of Thalassiosira species is known to be difficult, especially using light microscopy (LM) due to subtle differences in frustule morphology (Kaczmarska et al. 2009). Some Thalassiosira species are unidentifiable with LM methods (Hasle 1978). Complicating matters, the morphology of Thalassiosira species can vary, especially in the variable environmental conditions of estuaries, making LM identification all the more difficult (Hevia-Orube et al. 2015). Thalassiosira identification can be done using scanning electron microscopy (Hoppenrath et al. 2007), but this requires time and expertise.
Recently, molecular barcoding has been used to identify Thalassiosira species as well as other diatom species (e.g. Hamsher et al. 2011, Moniz and Kaczmarska 2010, Zimmerman et al. 2011, Luddington et al. 2012, Hamsher et al. 2013). This new approach has allowed for the discovery of previously unrecorded local species in North Sea estuaries (Hoppenrath et al. 2007 This study will use a high throughput sequencing community genetics approach to investigate Thalassiosira dynamics in Narragansett Bay and will address the following questions: 1) What Thalassiosira species are present in Narragansett Bay? 2) Are there annual and or seasonal patterns of Thalassiosira species composition in the Narragansett Bay? 3) Do fluctuating environmental conditions drive these putative patterns of species composition? To gain a more thorough understanding of Thalassiosira diversity and seasonal patterns in Narragansett Bay, we sequenced the 18S V4 locus for 80 field samples taken over the course of six years.
With this approach, Thalassiosira species not previously identified in Narragansett Bay were identified and correlated with environmental conditions.

Field Sampling Methods
As part of the Narragansett Bay Long-Term Phytoplankton Time Series, weekly water samples were collected from Station II in the southern west passage of Narragansett Bay, at 41.57°N and 71.39°W (Figure 1). Surface water samples were collected and filtered through 0.22 μm pore size, 25μm diameter Millipore ExpressPlus filters. The volume filtered was dependent on the Secchi depth measured for the sampled water; 100mL of water were filtered per each meter of Secchi depth.
Three replicate filters were collected each week and stored at -80°C for later DNA extraction and analysis.

Molecular Barcoding
Potential DNA barcodes were selected from current literature, including the variable V4 region of the 18S ribosomal DNA coding gene, the V9 region of the 18S, and the internal transcribed spacer 2 (ITS2) ribosomal gene (Table 1) (Chappell et al. 2013, Gast et al. 2004, Hamsher et al. 2011, Hamsher et al. 2013, Luddington et al. 2012, Moniz and Kaczmarska 2010, White et al. 1990, Zimmerman et al. 2011 (Larkin et al. 2007). The HKY+G model (Hasegawa et al. 1985) was selected to be the best fit nucleotide substitution model based on Bayesian Information Criterion (BIC) using jModelTest2 (Darriba et al. 2012, Guindon andGascuel 2003). This model allows for variable nucleotide base frequencies and has one transition rate and one transversion rate. The best ML tree was created with Paup* version 4.0b10 (Swofford 2003) and its reliability was tested using 100 bootstrap replicates. An ML tree was also constructed using the same methods for a 1552bp partial 18S gene with the same sequence accession numbers as the reference database. The best fit nucleotide substitution model chosen with BIC for this locus was the TIM1+I+G model, which has a variable nucleotide base frequency, variable transition rates and two transversion rates.

Mock Community Design
To test and validate the community high throughput sequencing approach used for this study, four mock communities were constructed to provide insight into the effects of 18S gene copy number variation, the amount of polymerase chain reaction (PCR) bias, and the amount of sequencing bias inherent to this method. The mock communities were comprised of varying amounts of four Thalassiosira species, three other diatom species, a dinoflagellate and a raphidophyte (Table 3). To obtain mock communities, monocultures of different species were maintained in exponential growth in F/2 media. Concentrations were assessed by counting 1 mL samples of each strain using a Sedgwick-Rafter counting chamber (Graticules S52) under 100-200x magnification, depending on species cell size. Cell concentrations were used to calculate the volume of each culture need to achieve the target species proportions.
The calculated volumes of each strain were filtered simultaneously on 0.2μm pore size, 25μm diameter Millipore ExpressPlus filters and stored at -20°C for up to 1 month, until DNA extraction. Reactions were amplified using a multi-step thermocycler protocol, which consisted of a two minute denaturing step at 94°C, followed by 20 cycles of 30 seconds at 94°C, 30

DNA Extraction and Amplification Protocols
seconds at 49°C and 30 seconds at 72°C followed by 15 cycles of 30 seconds at 94°C, 30 seconds at 67°C and 30 seconds at 72°C, followed by 10 minutes at 72°C. Trimmed and merged sequences were first filtered to a minimum length of 380 basepairs and a maximum expected error (based on phred quality scores) of 1 and then concatenated and dereplicated in USEARCH (Edgar 2010). The resulting sequences formed the final dataset for the community genetic analysis.
To identify taxa in the samples, filtered sequences were decomposed into minimum entropy decomposition (MED) nodes (Eren et al. 2015) using the default settings. Chimeric sequence nodes were identified and removed from results using the UCHIME denovo program (Edgar 2011) with all default settings. Taxonomy was assigned to remaining MED nodes using the blast method within the assign_taxonomy.py script from QIIME (Caporaso et al. 2010) and a reference database of Thalassiosira sequences curated from GenBank (Table 2). MED nodes are equivalent to operational taxonomic units (OTUs), which were used here to assign sequences to one species or to a group of species for cases in which a sequence was indistinguishable between two species. UCHIME denovo was run separately with only mock community samples and then separately with only field samples as expected chimeric sequences are higher for a lower diversity community, as in the mock communities. Nodes that contained fewer than 0.04% of total sequence reads for that sample were removed as possible sequencing error (Quail et al. 2012).

Statistical and Environmental Analyses
Thalassiosira species presence-absence data were obtained from the taxonomy assigned to MED nodes per the 0.04% cutoff noted above and used to assess species composition of the water samples and mock communities. Variation within triplicate extraction and triplicate PCR reactions was assessed using ANOVA. Observed mock community compositions were compared with the expected compositions through χ square goodness-of-fit-tests.
Bray-Curtis similarity was computed between all field samples and between all species identified in the field samples in PRIMER v6.1.6 (PRIMER-E Ltd.). To examine temporal variation of species composition on yearly and monthly time scales, likelihood of species occurrence was calculated by summing species data by year or month, respectively, and normalizing by sampling effort. Bray-Curtis similarity was then calculated for the likelihood of both yearly and monthly species presence. Cluster dendrograms and multidimensional scaling plots (MDS) based on Bray-Curtis similarity were created in PRIMER v6.1.6 (PRIMER-E Ltd.).
Correlations of community composition with environmental conditions were analyzed using environmental data obtained from the Narragansett Bay Long-Term

Thalassiosira in the Narragansett Bay Long-Term Phytoplankton Time Series, December 2008 to December 2014
Based on microscopy records in the long-term dataset, Thalassiosira species were present in 85% of the weekly cell counts of Station II surface water collected between December 2008 and December 2014 ( Figure 2). The cell abundance of the genus ranged from zero observed cells to 1,683,000 cells/liter, with a median abundance of 14,000 cells/liter (Appendix 1). During the same time period, four Thalassiosira species or groups of species were recorded in the long-term data set.
Two were identified to the species level (T. nordenskioeldii, T. punctigera) and two represented multiple species (T. rotula/T. gravida and Thalassiosira spp. . Despite a few samples with very low sequence survival rates, a majority of OTU rarefaction curves and species rarefaction curves terminated with an asymptote, indicating that sampling depth was appropriate for this study (Figure 3).

Reference Database
The ML tree for the 1552 bp 18S gene in Thalassiosira species showed moderate species resolution and bootstrap support ( Figure 4). In contrast, the ML tree for the 420 bp V4 region of the 18S showed weaker support and lower species resolution ( Figure 5). Some species were clearly resolved in both the V4 region and the full length 18S. For example, both the 18S and 18S V4 analyses supported a distinct clade comprised of T. guillardii, T. weissflogii and T. pseudonana. These three species are highly resolved and distinguishable at the 18S and 18S V4 fragments.
Other species were not resolved as clearly in the 18S V4 tree, either due to variable sequences for one species not clustering together or due to multiple species sharing one common sequence. The first pattern is seen for T. anguste-lineata, T. minima, T.
profunda and T. tenera, each of which has sequences which appear on more than one branch in the 18S V4 ML tree. The T. tenera sequences cluster together at the whole 18S, indicating that this is one species. However, for T. anguste-lineata and T.
profunda, the sequences also appear on different branches of the whole 18S tree, meaning that the taxonomies for these species can't be clarified. The second pattern appears when there are no differences at the 18S V4 locus between or among species, meaning that the two species are unable to be resolved at this locus. These unresolved species pairs include T. nordenskioeldii and T. aestivalis, T. pacifica and T. aestivalis, and T. rotula and T. gravida.

Mock Community Results
Sequencing of the mock communities recovered all 9 species that had been added, including all four Thalassiosira species, as well as the other diatom species (D. Among the DNA extraction triplicates for each of the four mock communities, composition did not vary significantly (ANOVA, p>0.05; Figure 6). Composition did not vary significantly among the three amplification triplicates performed on either the mock community sample (mock community 1) or the field sample (12/30/2014) (ANOVA, p>0.05). In contrast, observed mock community percent OTU composition was significantly different than the expected percent species composition used to create the mock communities (χ square goodness-of-fit-test, P<0.001; Figure 6). Since OTU composition did not adequately represent the actual species composition of the mock communities, sequence abundance of each OTU could not be used as an estimate of species abundance. As a result, species composition of the field samples was analyzed using presence/absence data.

Field Sample Sequencing Results
Fifteen Thalassiosira species were identified in the field samples (  (Table 5), making them difficult to identify using light microscopy at 100x.  pseudonana, 2009; T. tenera, 2010; T. guillardii, 2012). Some species were more likely to co-occur than others but these patterns were not especially strong (Figure 8).
At a 45% Bray Curtis similarity, one group of four species emerged: T. guillardii, T. minima, T. profunda and T. pseudonana. These species were all frequent and have small cell sizes (Table 5). T. guillardii and T. minima were the most likely species to cooccur, with a 70% Bray Curtis similarity.
The majority of species were not very frequent ( Figure 8); only two species (T.
guillardii, T. pseudonana) had frequencies above 40%. Thalassiosira guillardii was the most frequent, appearing in 78.75% of field samples, followed by T. pseudonana (. T. minima and T. pseudonana were also quite frequent (61.25% and 47.5% respectively). All other species had frequencies under 30%. The least frequent species were T. tumida and T. eccentrica.

Seasonality of Thalassiosira community composition
To from all twelve months, although its probability of occurrence was slightly higher in the winter. Likewise, T. minima was also observed in samples from all twelve months, with a higher occurrence probability in the winter (100% for December and January during the study period).

Species Specific Environmental Ranges
In addition to variation in seasonal occurrence patterns, species also had variable observed temperature and salinity ranges ( Figure 13) A principal components analysis (PCA) of the environmental factors separated the sample dates roughly by season ( Figure 14). The first PC axis explained 40.1% of the variation and separated samples primarily by DIP, but also by DIN, Si, Chl a, and surface salinity. Higher nutrients and lower chl a were associated with summer and fall samples. The second PC axis explained 25.4% of the variation and differentiated samples by surface temperature and average daily PAR. Higher temperatures and average daily PAR were associated with summer samples, while lower temperatures and PAR were associated with winter samples. Surface temperature was the most highly correlated environmental factor and explained 24.6% of the Thalassiosira community variation through BEST analysis (Table 6).

Morphological Variation of Thalassiosira Cells
A preserved sample was examined to verify species presence when unresolved species OTUs were present in the HTS results. One sample from January 11, 2011 was

chosen when both the T. nordenskioeldii/T. aestivalis and the T. pacifica/T. aestivalis
OTUs were present in the HTS sequence results (Figure 15). Hexagonal shaped cells with depressed valve faces that appear to be T. nordenskioeldii are visible (Figures   14a, 14b). Cells with several closely spaced marginal processes are also visible, which is indicative of T. aestivalis or T. pacifica (Figures 14c, 14d). Identifications are based on Hasle (1978).

Use of the V4 region as a barcoding gene for the genus Thalassiosira
The HTS results uncovered greater diversity than found through traditional light microscopy methods and this research supports using the 18S V4 locus as a barcode gene for the Thalassiosira genus of diatoms. weissflogii and T. guillardii. This is in accordance with a phylogeny constructed from both SSU and partial LSU nuclear DNA by Alverson et al. (2007). Finally, the larger clade found in the ML tree contains all other Thalassiosira species, in accordance with Kaczmarska et al. (2006), Hoppenrath et al. (2007), and Alverson et al. (2007).
Overall, the ML tree constructed using the reference database sequences used here appears to accurately represent the paraphyly of this genus. is not statistically significant, it is greater than triplicate variation seen in mock communities 1 and 2, which is minimal. This variation is too large to be easily explained by SSU rDNA copy number, and a satisfactory alternate explanation is not readily apparent. Other factors such as PCR bias and sequencing bias, or methodological error may be at play. As a result, care is necessary in interpreting SSU rDNA sequence abundance for diatoms, as sequence abundance is not likely to reflect species abundance.
Chimeric sequences were found in the mock community sequences results.
When species were identified which had not been added, it was very easy to pinpoint the issue. If a mock community control did not exist, this problem may have been overlooked in field samples. Chimeric sequences can arise during PCR (Smyth et al.

2010) if the elongation step is aborted while a sequence has only partially been
extended. If this partial sequence misprimes another sequence during a following PCR cycle, the resulting sequence can be partially from one species and partially from another. Therefore, it is valuable to maintain a control when sequencing field samples to help screen for issues such as chimeric sequences and running final OTUs through a chimera filter such as UCHIME is recommended.  (2007) and Harris et al. (1995) but were obtained without using scanning electron microscopy. The 18S V4 barcode can assess diversity within a closely related diatom genus similarly to SEM methods, but less time and expertise is required.
Several of the newly recorded species in Narragansett Bay are small (<15μm) (T. guillardii, T. minima, T. oceanica, T. profunda; Table 5) which may explain why they were not identified using LM at 100x. Hasle (1978) states that T. profunda is not identifiable through LM. However, their presence in an estuary is not unprecedented.
T. guillardii, the most frequent species seen in these results, is very frequent and abundant in a Spanish estuary (Trigueros et al. 2001). T. guillardii is also recorded in several British estuaries (Belcher and Swale 1986). T. oceanica has been identified in Loch Creran in Scotland (Harris et al. 1995). Some of the larger newly recorded species for Narragansett Bay are also observed in other estuaries. T. profunda, T.
minima, and T. tenera are recorded in British estuaries (Belcher and Swale 1986) and T. angulata, and T. concaviuscula have been collected in Loch Creran, Scotland (Harris et al. 1995).
The presence of T. tumida in Narragansett Bay is surprising. It was only present in 1 out of 80 field samples and was the least frequent species in the HTS results. This species is known to only exist in the Southern Ocean (Tomas 1997).
While this could suggest that the range of this species is greater than previously thought, the sample with T. tumida was collected in the summer (6/19/2013) when water temperatures was 20.0°C, well above the recorded tolerance of this species. It is more likely that another Thalassiosira species shares the same 18S V4 sequence as T.
tumida at this locus, similar to the very closely related 18S V4 sequences for T. rotula and T. gravida also in these results. Since 18S V4 sequences are lacking for many Thalassiosira species, it is likely that this OTU was incorrectly assigned here.

Historical data comparisons
Since this study builds upon a long running time series, it is valuable to compare these most recent HTS data with the historical record. Our recent data both include species that were important in the past and exclude some historically recorded species. Three species (T. hyalina, T. kushirensis and T. decipiens) recorded historically in the Narragansett Bay Long-Term Time Series were not detectable using our 18S HTS methods ( Table 4). Two of these species (T. kushirensis and T. decipiens) do not have reference sequences at the 18s V4 locus and were therefore not included in the reference database used in this study. A 333bp partial 18S V4 sequence does exist in GenBank for T. hyalina (JX437383.1), which is shorter than the amplicon used here. If reference sequences for these species are added in GenBank, this data can be tested for their presence. Our results did include sequences which were not assigned taxonomy from the reference database. These unidentified sequences may include additional Thalassiosira species, but also contain other phytoplankton species. The primers used here amplify other diatoms and raphidophytes (as in the mock community results presented here), and Narragansett Bay is home to numerous diatom genera, which were also amplified in the field samples but not analyzed for this study.
Of the historically recorded species that were found in the recent HTS results, T. nordenskioeldii is perhaps the most notable. As the fifth most abundant diatom species from 1959-1980, Karentz and Smayda (1984) 1959-1980 and 1999-2008. It is possible that this trend continued through 2014, and is reflected by the decreased frequency of T. nordenskioeldii.
Other species with historic records in Narragansett Bay are T. rotula, T.
pseudonana, and Thalassiosira sp. T. rotula was the 18 th most abundant phytoplankton species and 3rd most abundant Thalassiosira species from 1959-1980 (Karentz and Smayda 1984). From 2008-2014, it was still very common and the T. rotula/T. gravida OTU appeared annually in the HTS results and was more likely to occur in the spring and summer. T. rotula historically displayed a yearly maximum in the fall. T. gravida was less abundant than T. rotula in the 1959-1980 time series records (36 th most abundant phytoplankton; 5 th most abundant Thalassiosira) but no seasonality information is given in Karentz and Smayda (1984). The LM counts from and a dominant spring taxon in Chesapeake Bay (Marshall et al. 2006). Our results combined with these more recent observations suggest that this species could be an important or at least a more frequent contributor to estuarine phytoplankton communities than currently recognized.
The final historic Thalassiosira record of note is Thalassiosira sp., the sixth most abundant phytoplankton species (2 nd most abundant Thalassiosira) from 1959-1980 (Karentz and Smayda 1984). This species was characterized as a small (~10μm in diameter), unidentified Thalassiosira species that appeared in Narragansett Bay beginning in 1967 and became both frequent and common. Our HTS results identified several small, very frequent Thalassiosira species that had not been previously identified in the Phytoplankton Time Series, including Thalassiosira guillardii (4-14μm) and Thalassiosira minima (5-15μm), either of which may be the species discussed historically.

Factors affecting species composition
Seasonal patterns are apparent when looking at individual species (as in the discussions of T. nordenskioeldii and T. rotula/T. gravida above) but as whole, seasonality does not fully explain Thalassiosira community composition ( Figure 10). Skeletonema blooms (Borkman 2009, Smayda andBorkman 2009). These regional ocean-atmospheric patterns may also impact other diatoms, including Thalassiosira species. On a much smaller scale, microzooplankton grazers consume an average of 96% of primary production in Narragansett Bay (Lawrence and Menden-Deuer 2012).
Grazers also preferentially select particular prey which in turn can structure the phytoplankton community. Microzooplankton in Chesapeake Bay selectively graze on picophytoplankton (<2 μm) (Sun et al. 2007). The species in this study range in size from 1.8μm -186μm, and may be under differential grazing pressure. Another noteworthy annual grazer in Narragansett Bay is Mnemiopsis leidyi. M. leidyi exerts significant control on the plankton community in Narragansett Bay. In four years out of a six year study (1972)(1973)(1974)(1975)(1976)(1977), a summer M. leidyi bloom was correlated with rapid declines in both the zooplankton and phytoplankton communities and variations in the summer abundance of diatom Skeletonema sp. correlated with the size of the M. leidyi pulse (Deason and Smayda 1981). M. leidyi blooms may be occurring earlier in the spring and increasing in abundance as water temperatures rise with climate change (Sullivan et al. 2001), which may impact Thalassiosira species.
Other loss factors affecting diatom communities are cell lysis, sinking and advection (as reviewed in Sarthou et al. 2004). These three factors may be structuring the Thalassiosira population in Narragansett Bay as these three processes can differentially affect species based on factors including size, density, and morphology (Smayda 1969, Veldhuis et al. 2001, Miklasz and Denny 2010. Finally, only one diatom genus was examined here while at least 30 diatom genera have been observed in the Long-Term Phytoplankton Time Series, along with at least 10 dinoflagellate genera and 10 flagellate genera (Windecker 2010). The additional species that make up the entire phytoplankton community could affect the Thalassiosira community structure through resource competition. Alternately, diatoms may co-occur through resource partitioning. For example, metatranscriptome data from diatoms in Narragansett Bay revealed that Skeletonema sp., the most abundant diatom in the bay, and T. rotula utilize different nitrogen and phosphate metabolic pathways, perhaps allowing these species to partition resources (Alexander et al. 2015). These complex environment-species-species interactions were not examined in this study, but clearly are a part of the Narragansett Bay ecology. Future work on community wide processes in Narragansett Bay may provide insight into additional controls on Thalassiosira populations.

Conclusions
The high throughput sequencing approach using the 18s V4 locus is a powerful tool for exploring diversity and seasonal occurrence patterns for diatom genera such as Thalassiosira that harbor great species diversity and are morphologically similar.
Thalassiosira species diversity in Narragansett Bay is greater than previously recognized. Much of this newly found diversity can be attributed to several smaller sized species that are very frequent and have been overlooked using light microscopy methods. Thalassiosira species have a rough seasonal pattern grouped by winterspring, summer and fall, and only 25% of the Thalassiosira species composition is explained by the physical and chemical environmental factors in this study. Other factors, such as large scale climactic processes, predation effects, sinking rates, autolysis and community competition, may be contributing to Thalassiosira species composition in Narragansett Bay. In comparison to historical data, the frequency of important bloom species such as T. nordenskioeldii may be in decline, while T. rotula timing may have shifted.      through LM counts and recently (2008-2014) using light microscopy (LM) and high throughput sequencing (HTS). X = identified; * = indistinguishable from other species by this method; -= unable to be detected (sequence not available in reference database); blank = not detected by these methods. Note: T. tumida was recovered in sequence results but this record is questionable and not included in the total species count for Narragansett Bay. See discussion on page 27. Cold to temperate waters (Hoppenrath et al. 2007) T. anguste-lineata 14-78 Cosmopolitan T. concaviuscula 14-56 (Hoppenrath et al. 2007) Neritic, cold to temperate waters (Hoppenrath et al. 2007) T. eccentrica 15-110 Cosmopolitan (exclusive polar zones)

TABLES
T. guillardii 4-14 (Hasle 1978) Inland and marine/brackish coastal waters (Hasle 1978) T. minima 5-15 Cosmopolitan (exclusive polar zones) T. nordenskioeldii 10-50 Cold to temperate, northern hemisphere T. oceanica 3-12 Mainly warm water region T. pacifica 7-46 Cosmopolitan (exclusive polar zones) T. profunda 1.8-5 (Belcher and Swale 1986) Cosmopolitan (Hasle 1978) Figure A gives all OTUs generated in this study whereas Figure B is only showing only Thalassiosira species richness. The difference in figures is due to multiple OTUs corresponding to one species and the fact that the primers used in the study amplified all diatoms and raphidophytes in the samples, not just Thalassiosira species.  Figure 6: Expected and high-throughput sequence results for all four mock community experiments are shown as % composition. Expected community composition from cell counts of each species is given in the leftmost column of each graph. Triplicate extraction sequence results for each mock community are shown as A, B and C in each graph. Observed mock community sequence data is significantly different than the expected composition (χ square goodness-of-fit test, p<0.001). Within all four of the triplicate mock communities (A, B, C), composition did not vary significantly (ANOVA, p>0.05).    Figure 12: Occurrence frequency by month is presented for eight most frequent taxa in the HTS results as labeled in each panel. Number of years each individual taxa was present in each month was summed and divided by number of years sampled over the 6 year study period. All months were sampled 7 times each, with the exception of February, March and July, which were each sample 6 times each. Figure 13: Observed temperature ranges (top panel) and salinity ranges (bottom panel) for fourteen of the Thalassiosira species recovered from the HTS data. The range for T. tenera is omitted since that species was only observed in one sample and therefore, does not have enough data for a range. Species names are abbreviated as follows: T. nord./aest. = T. nordenskioeldii/T. aestivalis OTU; T. pac./aest. = T. pacifica/T. aestivalis OTU; T. rot/T. grav. = T. rotula/T. gravida OTU. Figure 14: Principal components analysis of environmental data associated with each sample date for this study. Environmental variables included are surface temperature (°C), surface salinity (ppt), surface chlorophyll a (μ/L), average daily PAR (mmol/m2), DIN (μM), DIP (μM), and Si (μM). Values for salinity, chlorophyll a, DIN, DIP, and Si were log transformed. Colors indicate the season of each sample (winter is January through April, spring is May through June, summer is July through September, fall is October through December). Surface temperature and average daily PAR contribute to PC2, separating winter samples from summer samples. PC1 explains 40.1% of the variance in these environmental parameters while PC1 and PC2 together explain 65.4% of the variance in the environmental parameters plotted here. Valve faces in girdle view cells may be concave, as in T. nordenskioeldii. C and D. A single central process is visible in these valve view cells, which appears in T. aestivalis, T. nordenskioeldii and T. pacifica. Several marginal processes are seen around the diameter of all cells. Closer marginal process spacing, as in the right cell in image C, is indicative of T. aestivalis or T. pacifica.

APPENDICES
Appendix 1: Dates for 80 field sampling dates in Narragansett Bay from December 2008 through December 2014 are given with the total phytoplankton abundance and Thalassiosira genus abundance for that day as quantified by light microscopy cell counts. Blanks in abundance columns indicate dates that were sequenced but for which no abundance data is available. Monthly samples were chosen for 2008-2013, with the exceptions of July 2011 and February-April 2012, when samples were not collected. Bi-weekly samples were sequenced for 2014.

Date
Total Phytoplankton Abundance (cells/liter) Thalassiosira Abundance (cells/liter) Appendix 2: Dates for 80 field sampling dates in Narragansett Bay from December 2008 through December 2014 are given with the Thalassiosira species abundance for that day quantified by light microscopy cell counts. Blanks in abundance columns indicate dates that were sequenced but for which no abundance data is available. Monthly samples were chosen for 2008-2013, with the exceptions of July 2011 and February-April 2012, when samples were not collected. Bi-weekly samples were sequenced for 2014.