Adaptive divergence, neutral panmixia, and algal symbiont population structure in the temperate coral Astrangia poculata along the Mid-Atlantic United States

Astrangia poculata is a temperate scleractinian coral that exists in facultative symbiosis with the dinoflagellate alga Breviolum psygmophilum across a range spanning the Gulf of Mexico to Cape Cod, Massachusetts. Our previous work on metabolic thermal performance of Virginia (VA) and Rhode Island (RI) populations of A. poculata revealed physiological signatures of cold (RI) and warm (VA) adaptation of these populations to their respective local thermal environments. Here, we used whole-transcriptome sequencing (mRNA-Seq) to evaluate genetic differences and identify potential loci involved in the adaptive signature of VA and RI populations. Sequencing data from 40 A. poculata individuals, including 10 colonies from each population and symbiotic state (VA-white, VA-brown, RI-white, and RI-brown), yielded a total of 1,808 host-associated and 59 algal symbiont-associated single nucleotide polymorphisms (SNPs) post filtration. Fst outlier analysis identified 66 putative high outlier SNPs in the coral host and 4 in the algal symbiont. Differentiation of VA and RI populations in the coral host was driven by putatively adaptive loci, not neutral divergence (Fst = 0.16, p = 0.001 and Fst = 0.002, p = 0.269 for outlier and neutral SNPs respectively). In contrast, we found evidence of neutral population differentiation in B. psygmophilum (Fst = 0.093, p = 0.001). Several putatively adaptive host loci occur on genes previously associated with the coral stress response. In the symbiont, three of four putatively adaptive loci are associated with photosystem proteins. The opposing pattern of neutral differentiation in B. psygmophilum, but not the A. poculata host, reflects the contrasting dynamics of coral host and algal symbiont population connectivity, dispersal, and gene by environment interactions.

(i.e. putative genets) were fragmented into three pieces, so that a ramet from each genet was represented in each of the three temperature treatments (cold = 14°C, control = 18°C, and heat = 22°C). Following fragmentation, all corals were allowed to recover at the holding conditions (18°C and 35 ppt) for 20 days before the experiment began. While this experiment was designed to consider differential gene expression in A. poculata across temperatures and populations, here we present sequencing data from this same experiment and the resulting single nucleotide polymorphisms (SNPs) to consider population structure.

2.2 Temperature environment of coral populations
To compare long-term temperature trends across the two collection sites, sea surface temperature (SST) data was downloaded from the NOAA 1/4° daily Optimum Interpolation Sea

Detailed descriptions for all data analyses can be found on the electronic notebook associated with this publication (github.com/hannahaichelman/Astrangia_PopGen). Raw sequences were processed using the adapter trimming/quality filtering functions of the Fastx-Toolkit to remove adapter sequence contamination and reads with a quality score less than 33. All sequences were then used as input for de novo transcriptome assembly using Trinity (version

and small subunit (SSU) databases (http://www.arb-silva.de/). "Good hits" to these rRNA databases were defined as matching at least 78% of the read over at least 100 bp, and once identified were removed from the reference assembly. A total of 1,273 matches to the LSU database and 688 matches to the SSU database were removed from the reference assembly.
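
The "good hit" rule can be sketched as a filter over alignment results. This is a minimal illustration under stated assumptions, not the authors' script: it assumes each hit carries a percent identity and an alignment length (as in BLAST-style tabular output), and it reads the 78% threshold as percent identity. The `Hit` record, the function name `rrna_contigs`, and the example values are all hypothetical.

```python
from typing import Iterable, NamedTuple, Set


class Hit(NamedTuple):
    """One alignment of an assembled contig against an rRNA database (hypothetical record)."""
    contig_id: str
    percent_identity: float  # assumption: the 78% threshold refers to this value
    alignment_length: int    # aligned length in bp


def rrna_contigs(hits: Iterable[Hit],
                 min_identity: float = 78.0,
                 min_length: int = 100) -> Set[str]:
    """Return IDs of contigs with at least one hit that passes both thresholds."""
    return {
        h.contig_id
        for h in hits
        if h.percent_identity >= min_identity and h.alignment_length >= min_length
    }


# Made-up hits, for illustration only.
example_hits = [
    Hit("contig_1", 95.2, 350),  # passes both thresholds
    Hit("contig_2", 77.9, 500),  # identity too low
    Hit("contig_3", 88.0, 99),   # alignment too short
    Hit("contig_4", 78.0, 100),  # exactly at both thresholds, passes
]
flagged = rrna_contigs(example_hits)
```

Contigs returned by such a filter would then be dropped from the reference assembly before the length filter is applied.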

Once rRNA contamination was removed, the reference assembly was filtered to include only sequences greater than 500 bp in length. Host and symbiont contigs in the reference assembly were differentiated and assigned as described previously ( Table S1). A contig was considered a host contig if it had a length overlap greater than 100 bp with a 60% identity cutoff to any cnidarian. If the same contig was also assigned to a cultured (clean) symbiont read with the same length and cutoff identity, it was removed from the host contig list. Similarly, a contig was considered a symbiont contig if it had a length overlap greater than 100 bp with a 60% identity cutoff to any symbiont, and was removed if it was also assigned to a clean coral reference. Contigs identified as both coral and symbiont were also removed from the reference.
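
The assignment rules above amount to a small decision procedure, sketched below. This is an illustrative reading of the text, not the authors' pipeline: the `Match` record, the function names, and the labels "ambiguous" and "unassigned" are hypothetical, and the sketch assumes the identity cutoff is inclusive (at least 60%) while the overlap cutoff is strict (greater than 100 bp).

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    """Best alignment of a contig to one reference set (hypothetical record)."""
    overlap_bp: int
    percent_identity: float


def passes(match: Optional[Match],
           min_overlap: int = 100,
           min_identity: float = 60.0) -> bool:
    """True if a match exists, overlaps by more than min_overlap bp, and meets the identity cutoff."""
    return (match is not None
            and match.overlap_bp > min_overlap
            and match.percent_identity >= min_identity)


def classify_contig(cnidarian: Optional[Match],
                    symbiont: Optional[Match],
                    clean_coral: Optional[Match],
                    clean_symbiont: Optional[Match]) -> str:
    """Label a contig as 'host', 'symbiont', 'ambiguous', or 'unassigned'."""
    # Host: matches any cnidarian, and is not also assigned to a clean symbiont read.
    is_host = passes(cnidarian) and not passes(clean_symbiont)
    # Symbiont: matches any symbiont, and is not also assigned to a clean coral reference.
    is_symbiont = passes(symbiont) and not passes(clean_coral)
    if is_host and is_symbiont:
        return "ambiguous"  # identified as both coral and symbiont, so removed
    if is_host:
        return "host"
    if is_symbiont:
        return "symbiont"
    return "unassigned"


# Made-up matches, for illustration only.
label_host = classify_contig(Match(250, 85.0), None, None, None)
label_symbiont = classify_contig(None, Match(300, 90.0), None, None)
label_both = classify_contig(Match(250, 85.0), Match(300, 90.0), None, None)
label_dropped = classify_contig(Match(250, 85.0), None, None, Match(200, 70.0))
```

Only contigs labeled "host" or "symbiont" would be carried forward under this reading.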

Once contigs were designated as host or symbiont, the resulting Trinity-assembled  (Table 1).

To maximize the total reads per genotype for the population genetic analyses presented here, fastq files were combined by genotype. This reduced 84 libraries (7 genotypes x 4 populations x 3 temperature treatments) to 28 fastq files. These concatenated files, along with the 12 libraries originally intended for population genetic analyses, yielded a total of 40 libraries that were used for all downstream analyses. It should be noted that the dataset presented here had relatively low read counts, resulting from an error in the library preparation that led to sequencing of rRNA in addition to mRNA. The primary result of this error was a larger than usual percentage of sequence yield going to rRNA, which resulted in a lower read count on an individual basis. To compensate, we used conservative cut-off values at every step of the data analysis pipeline to account for missing data.
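
The library bookkeeping above can be checked with a short sketch that groups per-treatment files by genotype. The group labels follow the four population and symbiotic state combinations named in the abstract. The file names and genotype identifiers are hypothetical placeholders, not the authors' naming scheme.

```python
from collections import defaultdict
from itertools import product

groups = ["VA-white", "VA-brown", "RI-white", "RI-brown"]
treatments = ["cold", "control", "heat"]
genotypes_per_group = 7

# One library per genotype per temperature treatment (hypothetical identifiers).
libraries = [
    (f"{group}_g{g}", treatment)
    for group, g, treatment in product(
        groups, range(1, genotypes_per_group + 1), treatments
    )
]

# Group the per-treatment fastq files that would be concatenated for each genotype.
files_by_genotype = defaultdict(list)
for genotype, treatment in libraries:
    files_by_genotype[genotype].append(f"{genotype}_{treatment}.fastq")

n_expression_libraries = len(libraries)       # 7 x 4 x 3 = 84
n_combined = len(files_by_genotype)           # 28 concatenated fastq files
n_additional = 12                             # libraries intended for population genetics
n_total = n_combined + n_additional           # 40 libraries used downstream
```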

The quality filtered reads were mapped to the de novo holobiont transcriptome using the

2014b) to create a rigorously filtered set of variant sites. The filtering parameters were the same for the host and symbiont SNPs, and each file was filtered in four steps. First, vcftools was used to filter files to exclude individuals with more than 50% missing data, include only sites with a minor allele count (mac) greater than or equal to 3, only include sites with a quality score above 30, exclude genotypes with fewer than 5 reads, include only bi-allelic sites, and remove indels.
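
The site-level criteria in this first step can be sketched in plain Python. This is a simplified illustration rather than the vcftools implementation or the authors' commands: the `Site` record and function names are hypothetical, diploid genotypes are assumed, and low-depth genotypes are assumed to be masked before the minor allele count is computed. The sketch keeps sites whose minor allele count is at least 3, which is how the vcftools `--mac` option behaves.

```python
from typing import List, NamedTuple, Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # allele indices (0 = ref, 1 = alt), or None if missing


class Site(NamedTuple):
    """One variant site (hypothetical, simplified stand-in for a VCF record)."""
    ref: str
    alts: List[str]
    qual: float
    genotypes: List[Genotype]
    depths: List[int]  # read depth per individual


def mask_low_depth(site: Site, min_depth: int = 5) -> Site:
    """Set genotypes supported by fewer than min_depth reads to missing."""
    masked = [gt if dp >= min_depth else None
              for gt, dp in zip(site.genotypes, site.depths)]
    return site._replace(genotypes=masked)


def minor_allele_count(site: Site) -> int:
    """Count copies of the rarer allele among non-missing genotypes."""
    alleles = [a for gt in site.genotypes if gt is not None for a in gt]
    alt_count = sum(1 for a in alleles if a == 1)
    return min(alt_count, len(alleles) - alt_count)


def keep_site(site: Site, min_mac: int = 3, min_qual: float = 30.0) -> bool:
    """Keep bi-allelic SNPs with quality above min_qual and mac of at least min_mac."""
    is_biallelic = len(site.alts) == 1
    is_snp = is_biallelic and len(site.ref) == 1 and len(site.alts[0]) == 1
    return is_snp and site.qual > min_qual and minor_allele_count(site) >= min_mac


# Made-up site: the fourth genotype has only 3 reads and is masked.
example = Site("A", ["G"], 45.0,
               [(0, 0), (0, 1), (1, 1), (0, 1), (0, 0)],
               [10, 8, 6, 3, 12])
masked_example = mask_low_depth(example)
example_mac = minor_allele_count(masked_example)
example_kept = keep_site(masked_example)
```

Per-individual missing-data filters, such as those in the remaining steps, would be applied across all sites and are not shown here.
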
Second, the filter_missing_ind.sh script from dDocent was used to remove individuals with more than 85% missing data. Third, another round of vcftools filtering was conducted to exclude sites if they had more than 75% missing data, include sites with a minor allele frequency (maf) greater

Table 2.

The total counts of A. poculata mapped reads/genet ranged from 9,340,966 to 167,484,549, and mapping efficiencies ranged from 45.96% to 54.10% (Table 3)

separating VA and RI individuals at high outlier loci ( Figure S4). The optimal K value for both neutral and outlier loci in the coral host was two.

Coral host contigs that contained more than one high outlier SNP (n = 13) were manually inspected to look for agreement with previous studies on coral adaptation to distinct temperature environments (Table S3)

There were four high outlier SNPs in symbiont reads, three of which were annotated. These SNPs are on genes annotated as photosystem II (PSII) CP43 reaction center protein, photosystem I (PSI) P700 chlorophyll a apoprotein A2, and PSII protein D1. A full summary of these contigs with multiple high outlier SNPs can be found in Table S3.

When the VA and RI populations were analyzed separately to look for consistent high outlier SNPs that could be driving differentiation between brown and white morphs, 11 high outlier SNPs were identified as shared between VA and RI (Table S3)