Development of microsatellite markers for the Japanese endemic conifer Thuja standishii and transfer to other East Asian species

To design polymorphic microsatellite loci that will be useful for studies of genetic diversity, gene flow and reproduction in the Japanese endemic conifer Thuja standishii, and to test the transferability of these loci to the two other East Asian species, T. sutchuenensis and T. koraiensis. Fifteen loci were developed, which displayed 3 to 21 alleles per locus (average = 9.2) among 97 samples from three populations of T. standishii. Across all samples, observed heterozygosity per locus ranged from 0.33 to 0.75 (average = 0.54), while expected heterozygosity was higher, averaging 0.62 (range 0.37–0.91) over the 15 loci. Low multi-locus probability of identity values (< 0.00002) indicate that these markers will be effective for identifying individuals derived from clonal reproduction. All 15 loci amplified in 13 samples of T. sutchuenensis, the sister species of T. standishii, with 1 to 11 alleles per locus (average = 4.33), while 13 loci amplified in four samples of the more distantly related T. koraiensis, with 1 to 5 alleles per locus (average = 2.15).


Introduction
Thuja is a small genus of Cupressaceae consisting of five extant species, three in East Asia and two in North America [6]. Thuja standishii (Gordon) Carrière (kurobe or nezuko in Japanese, Japanese arbor-vitae in English) is endemic to Japan, where it has a scattered distribution from 40.67°N to 33.49°N on the islands of Honshu, Shikoku and Dogo (Shimane Prefecture). The species is most commonly a component of subalpine forests but also occurs in a variety of other habitats, including cool temperate forest, moorland, sites near the alpine zone and, rarely, warm temperate forest. It can grow as a single-stemmed tree up to 35 m tall but occurs as a multi-stemmed shrub under 1 m tall at the upper limit of its elevational range [15]. It can purportedly reach a great age, with individuals over 1000 years old and trunks up to 3.5 m in diameter (Giant Tree Database, Biodiversity Center of Japan (http://kyoju.biodic.go.jp)).
Unlike most other Cupressaceae conifers of Japan, T. standishii has received little research attention, and basic information on its conservation status (including the impact of past logging), reproductive biology and genetic diversity is lacking. The rarity of forests dominated by T. standishii and its current insignificant role in forestry probably underlie this lack of research. However, the species is undoubtedly an important part of Japan's biodiversity and cultural heritage. For example, it is one of Japan's five precious timber trees (Kiso go-boku) that, from the early 18th century, were strictly protected from cutting in the Kiso region of central Honshu [8], and in some parts of Japan, forests containing T. standishii are considered to be the most untouched forests remaining in the landscape (e.g. [13]). Small population size, geographic isolation and vulnerability to ring barking by deer [15] have made some populations of conservation concern, especially in western Japan, where the species is very rare [15]. One key aspect of the species' biology that is poorly understood is the role of asexual reproduction in its regeneration. Like two other Japanese Cupressaceae conifers, Thujopsis dolabrata and Chamaecyparis pisifera, which have been shown to regenerate clonally [3,4], T. standishii forms dense understory banks of juveniles (Worth, personal observation), which may be clonally derived.
In the current age of genomics, microsatellites retain a vital role in biology because of their high information content, their utility in a wide range of genetic applications and their cost-efficiency [5]. This study describes the development of expressed sequence tag (EST) nuclear microsatellite markers for T. standishii using next-generation sequencing. These markers will be useful molecular tools for informing the conservation of this species through studies of its range-wide genetic diversity, gene flow and reproductive biology. In addition, the transferability of the markers was tested in the two other East Asian species, for which no microsatellite markers have yet been developed: the Chinese endemic Thuja sutchuenensis, the sister species of T. standishii [7,10], and T. koraiensis.

Materials and methods
Total RNA was extracted from an individual of T. standishii collected from the Forestry and Forest Products Research Institute Arboretum using a plant RNA isolation mini kit (Agilent Technologies, USA). An RNA-seq data set was generated by the Beijing Genomics Institute on an Illumina HiSeq 4000 platform. The T. standishii RNA-seq data consisted of 38,076,160 paired-end reads of 100 bp. De novo assembly was performed in CLC Genomics Workbench 8.5.1, and the resulting 53,614 contigs (N50 = 1503 bp) were mined for microsatellite regions. Primers flanking these regions were designed with default settings in PrimerPro (http://webdocs.cs.ualberta.ca/~yifeng/primerpro/). Microsatellites were selected if the number of tandem repeat units was greater than eight and if the microsatellite was located more than 25 bp from either end of the contig. These criteria yielded 64 microsatellite primer pairs, which were tested for amplification in four samples. Of these, 36 primer pairs amplified successfully and were subsequently screened for size polymorphism in eight samples representative of the species range.
For all loci, the forward primer was synthesized with one of three different M13 tail sequences (5′-GCC TCC CTC GCG CCA-3′, 5′-GCC TTG CCA GCC CGC-3′ and 5′-CAG GAC CAG GCT ACC GTG-3′), and the reverse primer was tagged with a pig-tail (5′-GTT TCT T-3′; [2]). PCR was performed following the standard protocol of the Qiagen Multiplex PCR Kit (Qiagen, Hilden, Germany) in a 10 µL reaction volume containing approximately 5 ng of DNA, 5 µL of 2× Multiplex PCR Master Mix, 0.06 µM forward primer, 0.1 µM reverse primer and 0.08 µM fluorescently labelled M13 primer. The thermal cycling profile consisted of an initial denaturation at 95 °C for 3 min; 35 cycles of 95 °C for 30 s, 60 °C for 3 min and 68 °C for 1 min; and a final extension at 68 °C for 20 min.
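The marker-selection criteria can be illustrated with a minimal Python sketch. It assumes perfect (uninterrupted) repeats with motif lengths of 2 to 6 bp, reads "greater than eight" as at least nine repeat units, and reads the position criterion as requiring more than 25 bp of contig sequence on each side of the repeat, so that primers can be placed. The names `find_ssrs`, `is_primitive` and `SSR` are hypothetical; this is not PrimerPro's algorithm or the pipeline actually used here.

```python
import re
from dataclasses import dataclass


@dataclass
class SSR:
    contig: str
    motif: str
    units: int
    start: int  # 0-based start of the repeat within the contig
    end: int    # 0-based, exclusive end of the repeat


def is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(contig_id, seq, motif_lengths=(2, 3, 4, 5, 6), min_units=9, flank_bp=25):
    """Find perfect SSRs with >= min_units repeats and > flank_bp bp on each side."""
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # A k-mer immediately followed by at least (min_units - 1) copies of itself
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # homopolymers, or runs better described by a shorter motif
            start, end = m.start(), m.end()
            if start <= flank_bp or len(seq) - end <= flank_bp:
                continue  # too close to a contig end to place a primer
            hits.append(SSR(contig_id, motif, (end - start) // k, start, end))
    return hits


flank = "TTGCGATGGT" * 3  # 30 bp of arbitrary, non-repetitive flanking sequence
print(find_ssrs("contig_1", flank + "AC" * 10 + flank))
```

Discarding non-primitive motifs keeps a pure dinucleotide run from being reported a second time as a tetra- or hexanucleotide repeat in this simple sketch; compound or imperfect repeats would need a dedicated SSR finder.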
The PCR products were separated by capillary electrophoresis on an ABI 3130 Genetic Analyzer (Life Technologies, Waltham, MA, USA) with the GeneScan 600 LIZ Size Standard (Life Technologies), and genotypes were scored in GeneMarker (SoftGenetics, LLC, PA, USA). Overall, 15 loci amplified reliably, were polymorphic and were readily scorable. These loci were genotyped in 97 samples from three populations of T. standishii, and in 13 samples of T. sutchuenensis and four samples of T. koraiensis (Additional file 1: Table S1). Genetic analyses were performed in GenAlEx 6.5 [9] and Genepop 4.2 [11]. In addition, a similarity search of the contigs containing the 15 loci was conducted with the BLASTX algorithm [1] against the National Center for Biotechnology Information (NCBI) non-redundant protein sequence (nr) database.
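Per-locus summary statistics of the kind reported here (number of alleles, observed and expected heterozygosity) can be sketched in a few lines of Python. This is a minimal illustration, assuming diploid genotypes stored as pairs of allele sizes with `None` for missing data, and using the common expected-heterozygosity formula He = 1 − Σp²; the function `locus_stats` is hypothetical and is not GenAlEx's code, and software packages may apply sample-size corrections not shown here.

```python
from collections import Counter


def locus_stats(genotypes):
    """Summarize one diploid locus.

    genotypes: list of (allele1, allele2) tuples (e.g. allele sizes in bp),
    with None for individuals that failed to amplify at this locus.
    Returns the number of alleles (Na), observed (Ho) and expected (He) heterozygosity.
    """
    typed = [g for g in genotypes if g is not None]
    n = len(typed)
    if n == 0:
        raise ValueError("no genotyped individuals at this locus")
    counts = Counter(allele for g in typed for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    ho = sum(1 for a, b in typed if a != b) / n
    he = 1 - sum(p ** 2 for p in freqs)  # expected heterozygosity under Hardy-Weinberg
    return {"Na": len(counts), "Ho": ho, "He": he}


# Usage with five hypothetical individuals at one locus (one failed to amplify)
print(locus_stats([(150, 152), (150, 150), (152, 154), (154, 154), None]))
```

Multi-locus averages, such as the mean expected heterozygosity over 15 loci, would then be the arithmetic mean of these per-locus values.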
The multi-locus probability of identity (PID) for the 15 markers, that is, the probability that two individuals drawn at random from a population will have the same genotype [14], was calculated in Gimlet version 1.3.3 [12] using all 97 samples from the three populations of T. standishii. Three PID estimates outlined by Waits et al. [14] were calculated: biased PID, which assumes that individuals mate randomly; unbiased PID, which corrects for the sampling of a small number of individuals; and sibs PID, which assumes that the population is composed of siblings.
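The biased and sibs PID estimates can be written as simple functions of per-locus allele frequencies. The Python sketch below uses the expressions we understand to be summarized by Waits et al. [14]: per locus, biased PID = 2(Σp²)² − Σp⁴ (equivalently Σp⁴ + Σ(2pᵢpⱼ)² over allele pairs) and sibs PID = 0.25 + 0.5Σp² + 0.5(Σp²)² − 0.25Σp⁴, with the multi-locus value taken as the product over loci, which assumes independence among loci. The unbiased estimator computed by Gimlet is omitted, the function names are hypothetical, and the frequencies in the usage example are made up.

```python
from math import prod


def _moments(freqs):
    """Return (sum of p^2, sum of p^4) for one locus's allele frequencies."""
    s2 = sum(p ** 2 for p in freqs)
    s4 = sum(p ** 4 for p in freqs)
    return s2, s4


def pid_biased(freqs):
    # Random-mating expectation: sum p_i^4 + sum_{i<j} (2 p_i p_j)^2 = 2(sum p^2)^2 - sum p^4
    s2, s4 = _moments(freqs)
    return 2 * s2 ** 2 - s4


def pid_sibs(freqs):
    # Expected PID when the two individuals compared are full siblings
    s2, s4 = _moments(freqs)
    return 0.25 + 0.5 * s2 + 0.5 * s2 ** 2 - 0.25 * s4


def multilocus_pid(loci, estimator):
    """Product of per-locus PID values; assumes the loci are independent."""
    return prod(estimator(freqs) for freqs in loci)


# Usage with made-up allele frequencies for two loci
loci = [[0.5, 0.5], [0.25, 0.25, 0.5]]
print(multilocus_pid(loci, pid_biased), multilocus_pid(loci, pid_sibs))
```

Because the sibs estimate is always the larger of the two for a given locus, requiring it to fall below a threshold such as 0.01 is the more conservative test.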

Results
In T. standishii, the 15 loci (Table 1) displayed 3 to 21 alleles per locus (average = 9.2) among the 97 samples, with observed heterozygosity ranging from 0.33 to 0.75 (average = 0.54) and expected heterozygosity from 0.37 to 0.91 (average = 0.62). Multi-locus probability of identity values were below the threshold of 0.01 that Waits et al. [14] considered necessary to reliably distinguish individual genotypes, even for the sibs PID (Additional file 1: Table S2). This indicates that our markers will be effective both for identifying individuals derived from clonal reproduction and for distinguishing sexually derived individuals, even in populations where inbreeding is prevalent.
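As a hypothetical illustration of how such markers are used to detect clonal reproduction, samples that share an identical multi-locus genotype can be grouped as putative ramets of the same genet. The Python sketch below assumes genotypes stored as allele-size pairs with `None` for a failed locus and simply skips incompletely genotyped samples; the function `find_putative_clones` and the sample data are invented for illustration. In practice, genotyping error and the small chance of identical genotypes arising through sexual reproduction (the quantity the PID estimates) would also need to be considered.

```python
from collections import defaultdict


def find_putative_clones(samples):
    """Group sample IDs that share an identical multi-locus genotype.

    samples: dict mapping sample ID -> list of per-locus genotypes,
    each an (allele1, allele2) tuple, or None if the locus failed.
    Returns a list of groups (sorted ID lists) with two or more members.
    """
    groups = defaultdict(list)
    for sample_id, genotype in samples.items():
        if any(locus is None for locus in genotype):
            continue  # incomplete genotypes are skipped in this simple sketch
        # Sort alleles within each locus so (152, 150) matches (150, 152)
        key = tuple(tuple(sorted(locus)) for locus in genotype)
        groups[key].append(sample_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Usage with hypothetical genotypes at two loci
samples = {
    "S1": [(150, 152), (201, 201)],
    "S2": [(152, 150), (201, 201)],
    "S3": [(150, 150), (201, 205)],
    "S4": [(150, 152), None],
}
print(find_putative_clones(samples))
```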
All 15 loci amplified in T. sutchuenensis (two of them monomorphic), with 1 to 11 alleles per locus (average = 4.33); average observed and expected heterozygosity were 0.43 and 0.48, respectively (Table 3). In contrast, only 13 loci amplified in T. koraiensis, three of which were monomorphic. In this species, 1 to 5 alleles per locus (average = 2.15) were found, with average observed and expected heterozygosity of 0.44 and 0.31, respectively (Table 3).

Discussion
The development of EST microsatellites for Thuja standishii will enable new genetic research on this important Japanese endemic conifer, including studies of range-wide genetic diversity and gene flow as well as stand-level processes such as inbreeding and clonality. The availability of molecular markers may also help to foster research into this species, which, because of its wide ecological range from warm temperate forests to near the alpine zone, is ideal for investigating ecological and genetic processes under strongly contrasting climates. The transferability of the 15 loci was consistent with the phylogenetic relationships of the East Asian Thuja [7,10]. All 15 loci amplified in T. sutchuenensis, the sister species of T. standishii, and displayed considerable allelic diversity, with up to 11 alleles per locus. These loci may therefore be particularly useful for genetic studies of this geographically restricted, endangered species [16]. In contrast, two of the 15 loci did not amplify in the more distantly related T. koraiensis, and the number of alleles per locus (1 to 5) was low, although this low allelic diversity probably also reflects the small number of samples tested.

Limitations
• The number of published microsatellite markers may be too low for optimal performance of some genetic analyses.
• These microsatellite loci have not been tested in the two North American species, T. plicata and T. occidentalis.
• We spent little time optimizing loci; therefore, some polymorphic loci that might have worked with further optimization may have been excluded.
Additional file 1: Table S1. Sampling information for T. sutchuenensis and T. koraiensis. Apart from two samples from Halla Arboretum on Jeju Island, all samples were derived from natural populations. Table S2. Probability of identity (PID) values for each of the 15 polymorphic loci and the multi-locus values.