Characterisation of a novel panel of polymorphic microsatellite loci for the liver fluke, Fasciola hepatica, using a next generation sequencing approach

The liver ﬂuke, Fasciola hepatica is an economically important pathogen of sheep and cattle and has been described by the WHO as a re-emerging zoonosis. Control is heavily reliant on the use of drugs, particularly triclabendazole and as a result resistance has now emerged. The population structure of F . hepatica is not well known, yet it can impact on host–parasite interactions and parasite control with drugs, particularly regarding the spread of triclabendazole resistance. We have identiﬁed 2448 potential microsatellites from 83 Mb of F . hepatica genome sequence using msatﬁnder. Thirty-ﬁve loci were developed and optimised for microsatellite PCR, resulting in a panel of 15 polymorphic loci, with a range of three to 15 alleles. This panel was validated on genomic DNA from 46 adult F . hepatica ; 38 liver ﬂukes sourced from a Northwest abattoir, UK and 8 liver ﬂukes from an established isolate (Shrewsbury; Ridgeway Research). Evidence for null alleles was found at four loci (Fh_1, Fh_8, Fh_13 and Fh_14), which showed markedly higher levels of homozygosity than the remaining 11 loci. Of the 38 liver ﬂukes isolated from cattle livers ( n = 10) at the abattoir, 37 genotypes were identiﬁed. Using a multiplex approach all 15 loci could be ampliﬁed from several life cycle stages that typically yield low amounts of DNA, including metacercariae, the infective life cycle stage present on pasture, highlighting the utility of this multiplex microsatellite panel. This study reports the largest panel of microsatellite markers available to date for population studies of F . hepatica and the ﬁrst multiplex panel of microsatellite markers that can be used for several life cycle stages. (cid:2) 2015 The Authors. Published by Elsevier B.V. ThisisanopenaccessarticleundertheCCBYlicense(http:// creativecommons.org/licenses/by/4.0/).


Introduction
The liver fluke, Fasciola hepatica, is a zoonotic trematode parasite of grazing animals and humans. As the causative agent of fasciolosis, it results in significant global economic losses to the livestock industry and is considered a re-emerging human disease, currently included within the WHO top 17 listed neglected tropical diseases (Mas-Coma et al., 2009;WHO, 2006WHO, , 2007WHO, , 2012. Over the past decade the prevalence of F. hepatica infection has risen, in part due to climate change, increased animal movement and changing farming practices (van Dijk et al., 2010). Effective control and treatment of fasciolosis in both humans and livestock is reliant on anthelminitics, particularly the benzimidazole, triclabendazole (TCBZ); the drug of choice, given its superior efficacy against both the immature and mature parasites. However, reliance on TCBZ has led to widespread resistance (Overend and Bowen, 1995;Fairweather, 2005;Brennan et al., 2007).
Fasciola has a complex life cycle that involves an intermediate host, the mud snail Galba truncatula and multiple definitive hosts ( Supplementary Fig. 1). The ability of F. hepatica to infect a number of different mammalian hosts is considered to be one of the factors responsible for its extensive geographical reach Mas-Coma et al., 2005;Robinson and Dalton, 2009;Furst et al., 2012). Eggs are passed onto the pasture in faeces from the definitive host. the metacercariae, encyst on herbage prior to being ingested by the definitive host. Juvenile flukes migrate to the bile ducts, where they reach sexual maturity. In the mammalian host, the adult parasites undergo a complex reproductive strategy, with the potential for self and cross fertilisation and parthenogenesis.
The asexual multiplication of F. hepatica within the snail host and complex reproductive biology in the mammalian host facilitates high gene flow and has the potential to promote high levels of genetic variability within F. hepatica populations. Alternatively, multiple permissive hosts may lead to host selection of F. hepatica populations resulting in population sub-structuring and/or differential genetic diversity in different hosts and geographical locations. How population structure may affect the spread of TCBZ resistance throughout a population has yet to be investigated. To fully understand the impact and spread of TCBZ resistance or any genetic component that provides a survival advantage, the population structure of F. hepatica needs to be investigated, using a large panel of polymorphic markers that are neutrally inherited. This information is needed to determine how population dynamics impact on host-parasite interactions and on parasite control interventions.
Studies of the genetic structure of closely related trematodes, including Clonorchis sinensis, Opisthorchis viverrini and Schistosoma spp. have relied on a variety of molecular tools, in particular panels of microsatellite markers, to elucidate population genetic structure (Gower et al., 2007(Gower et al., , 2011Agola et al., 2009;Valentim et al., 2009;Laoprom et al., 2010;Xiao et al., 2011;Glenn et al., 2013;Steinauer et al., 2010). The majority of studies investigating the genetic structure of F. hepatica populations involve the application of mitochondrial markers and/or small panels of microsatellites (reviewed by Hodgkinson et al., 2013). To date only five polymorphic microsatellites have been developed and applied to Fasciola populations in Spain and Eqypt (Hurtrez-Boussès et al., 2004;Dar et al., 2011Dar et al., , 2013Vilas et al., 2012). With the advancement of next generation technologies, trematode genome datasets are now being mined for microsatellite markers to develop panels of polymorphic markers; for example Schistosoma japonicum (Xiao et al., 2011) and more recently Fascioloides magna (Minarik et al., 2014). Here, we demonstrate how 83 Mb of 454 FLX F. hepatica genome sequence data was used to generate a resource of >2000 potential microsatellites, with suitable flanking regions for primer design. Thirty-five loci were selected for preliminary screening, resulting in the development and characterisation of a panel of 15 polymorphic markers. Genomic organisation of the panel of 15 microsatellite loci was investigated by mining our recently generated 1.3 Gb F. hepatica draft genome. Importantly, following an initial validation of these markers their utility was demonstrated for multiple F. hepatica life cycle stages that typically yield low amounts of DNA, particularly eggs, miracidia and metacercariae stages; greatly facilitating population-level analyses across multiple geographical locations in future.

F. hepatica isolates
For development and validation of the microsatellite panel, 46 adult F. hepatica parasites were obtained from two sources: (1) 38 adult liver flukes were recovered from livers of 10 naturally infected cattle at an abattoir, in Northwest England, ranging between one and seven adult liver flukes from each liver; (2) eight adult liver flukes were tested from two cattle experimentally infected with the Shrewsbury isolate (Ridgeway Research), with one and seven adult liver flukes from each animal, respectively.
Each adult liver fluke was genotyped individually. Eggs were harvested from adult F. hepatica parasites obtained from experimentally infected animals, following incubation in 1-2 ml of Dulbecco's Modified Eagle's Media (DMEM; Sigma Aldrich) for a minimum of 2 h at 37°C. A subset of eggs was embryonated by incubation at 26°C for 12 days. Following embryonation the eggs were hatched to release miracidia using standard F. hepatica protocols. Metacercariae were shed from experimentally infected colonies of G. truncatula maintained at the University of Liverpool.

DNA extraction and quantification
Genomic DNA extraction was carried out using the DNeasy Blood and Tissue Kit (Qiagen) for eggs, miracidia, metacercariae and adults or Genomic tip 100G kit (Qiagen; adult parasites only), according to the manufacturer's instructions. Fifty eggs/miracidia/ metacerariae were used for each extraction and DNA was eluted into 60 ll of AE buffer (DNeasy Blood and Tissue Kit; Qiagen). Whole adult liver flukes or a 20 mg section of adult liver fluke near the anterior sucker was used for extraction, followed by an ethanol precipitation and re-suspension of the DNA in 60 ll TE (pH 8.0).
Chelex DNA extractions on single miracidia were carried out according to Valentim et al. (2009), followed by whole genome amplification using the illustra GenomiPhi V2 DNA Amplification Kit in a reaction volume of 50 ll, according to the manufacturer's instructions (GE Healthcare Life Sciences). All DNA concentrations were calculated using the Quant-iT™ PicoGreen Ò dsDNA assay kit (Life Technologies).

Genome sequencing and microsatellite identification
F. hepatica genome sequence data was generated from genomic DNA extracted from a single adult parasite (Sligo isolate; DNeasy Blood and Tissue Kit, Qiagen), using 454 FLX sequencing technology (Roche; Centre for Genomic Research, University of Liverpool). The resulting 450 Mb of raw data was assembled into 190,089 contigs, representing 83 Mb of sequence data (ENA accession number PRJEB8349), which was mined for microsatellites using msatfinder (www.genomics.ceh.ac.uk/msatfinder; default thresholds; Thurston and Field, 2005). More than 2000 potential microsatellites were identified comprising of di-to hexa-nucleotide repeats with suitable flanking regions for primer design (Supplementary Table 1). The tri-and tetra-nucleotide motifs were selected for further analysis, as they are known to be less prone to enzyme slippage during PCR, which can make allele designation difficult (Edwards et al., 1991;Guichoux et al., 2011). Compound microsatellites were excluded and the remaining microsatellites were sorted according to the number of tandem repeats as per Xiao et al. (2011). Thirty-five loci were selected for preliminary PCR screening, which resulted in 15 loci being taken forward for PCR amplification using a modified semi-nested PCR protocol based on Schuelke's method (Schuelke, 2000). Genomic organisation of the panel of 15 microsatellite loci was investigated by mining the recently generated 1.3 Gb F. hepatica draft genome sequenced from the Shrewsbury isolate (ENA accession numbers LN627018-LN647175: assembly data and PRJEB6687: genomic read data; Cwiklinski et al., in press), using BLAST (NCBI) and manual curation.

Microsatellite PCR and capillary sequencing
Primary and secondary amplification from adult genomic DNA was carried out in a 10 ll reaction volume, containing 5 ll of 2Â BioMix Red master mix (Bioline). The primary PCR contained 1 ll of adult DNA template ($10 ng) and 0.5 lM of both the forward M13-tagged primer and the reverse primer (Table 1; Eurofins MWG Operon). The cycling conditions were an initial 10 min at 94°C, followed by 25 cycles (see Table 1) at 94°C for 30 s, annealing temperature (see Table 1) for 45 s and 72°C for 45 s, with a final extension at 72°C for 10 min. The secondary PCR contained 1 ll of the primary PCR, 1 lM of the fluorescent dye-M13 tagged primer (5 0 6-FAM/NED/PET/VIC -TGTAAAACGACGGCCAGT 3 0 ; Life Technologies) and 0.5 lM of the reverse primer (the same used in the primary PCR). The cycling conditions were as above for 10 cycles with an annealing temperature of 53°C. To take advantage of multiplex PCR protocols that offer distinct advantages when low quantities of DNA are available, the 15 microsatellite markers were subsequently validated using a multiplex PCR approach. Four multiplex reactions were optimised using the Type-it Microsatellite PCR kit (Qiagen) according to the manufacturer's instructions (30 cycles; Supplementary Table 2), using 1 ll of adult genome DNA template. Microsatellite amplification for the egg, miracidia and metacercariae life cycle stages was carried out using 3 ll of the DNA template; with an increased number of PCR cycles for both the semi-nested PCR (see Table 1) and multiplex PCR approaches (40 cycles; Supplementary Table 2). PCR products were analysed by agarose gel electrophoresis (2% gel) using SYBR Ò Safe DNA stain (Life Technologies). If a positive result was obtained by gel electrophoresis, the PCR products were diluted 50-fold and 1 ll of this dilution multiplexed in Hi-Di Formamide (8.8 ll; Life Technologies) with GeneScan LIZ500 size standards (0.2 ll; Life Technologies), prior to sequencing using a 3100 Genetic Analyzer capillary electrophoresis system (Life Technologies).

Data analysis
Sequence size was determined with an ABI PRISM 3100 DNA analyzer and Peak Scanner v1.0 software. Analysis was carried out using GENEPOP version 4.0.10 (Raymond and Rousset, 1995;Rousset, 2008), Genetix (Belkhir et al., 2004) and Micro-Checker (van Oosterhout et al., 2004). Each of the markers produced clear, unambiguous traces for individual parasites as expected for single locus markers in a diploid organism.

Validation of F. hepatica microsatellite panel
The 83 Mb F. hepatica genome sequence data, comprised of 190,089 contigs was mined for microsatellite markers resulting in 5222 potential microsatellite loci; 2448 of which were identified with suitable primer flanking regions. To investigate the utility of these microsatellites as a tool for population level studies a subset of 15 loci were validated using 46 individual adult parasites from cattle; 38 liver flukes sourced from naturally infected cattle and eight liver flukes sourced from experimentally infected cattle. All 15 loci studied were found to be polymorphic, with allele numbers ranging from three-15 (Table 2; Supplementary Table 3). For the 38 liver flukes isolated from naturally infected cattle, 37 multi-locus genotypes (MLG) were reported. Hardy-Weinberg equilibrium (HWE) and the heterozygote deficiency for each locus were examined using the exact test, using GENEPOP version 4.0.10 (Table 2). Three loci (Fh_1, Fh_8 and Fh_14) were found to significantly Table 1 Primer sequences and properties of F. hepatica (Fh) microsatellite alleles for the modified Schuelke's method.

Locus
Primer sequence ( Primer sequence: sequence underlined represents the M13 sequence tag. T a : annealing temperature for PCR. * Loci that can be consistently amplified from egg, miracida and metacercariae DNA. DNA extracted from egg, miracida and metacercariae life cycle stages. deviate from HWE, following Bonferroni's adjustment (P < 0.00005; Rice, 1989). All were significant for heterozygote deficiency (P < 0.00005). In addition four loci (Fh_1, Fh_8, Fh_13 and Fh_14) were found to have potential null alleles (as observed by plots of F IS vs number of alleles and F IS vs heterozygosity ( Supplementary Fig. 2). These loci together within an additional two loci (Fh_4 and Fh_11) were also identified as potential null alleles (calculated using Micro-Checker). All 15 loci were consistently amplified, produced unambiguous traces and were sufficiently polymorphic to provide a robust panel of microsatellites for accurately genotyping F. hepatica parasites.

Genotyping multiple F. hepatica life cycle stages
To fully exploit their potential the microsatellite markers were validated as a means of genotyping several life cycle stages of F. hepatica; namely eggs, miracidia and metacercariae. A subset of eight loci consistently and accurately genotyped pools of 50 metacercariae (Fh_1, Fh_2, Fh_3, Fh_4, Fh_7, Fh_12, Fh_13, Fh_14 and Fh_15; Table 1) and three loci in particular (Fh_1, Fh_2, Fh_3) could reliably genotype fewer than 50 metacercariae. Following this initial validation the panel was tested on duplicate samples of eggs, miracidia and metacercariae (50 of each) from four different isolates, whose genetic profile was known. Microsatellite analysis of these samples showed that the resulting genotype was consistent with each isolate profile. Multiple alleles for each locus were represented within each population of eggs indicating the markers were capable of defining populations of parasites. To improve microsatellite amplification from these life cycle stages that typically yield low levels of DNA, a multiplex protocol was used, which resulted in consistent amplification for the whole panel of 15 loci. Finally DNA extraction and subsequent amplification of microsatellites was optimised for a single miracidium, using Chelex DNA extraction with or without subsequent whole genome amplification. DNA was extracted from 10 individual replicate miracidia of known genotype and the expected profile was consistently amplified either directly or following whole genome amplification, suggesting that individual miracidia can be successfully genotyped using this approach.

Genomic location of F. hepatica microsatellites
The microsatellites were originally identified in our preliminary 83 Mb F. hepatica genome dataset and their presence and location was subsequently confirmed by mining our recently generated 1.3 Gb F. hepatica draft genome (Cwiklinski et al., in press). All loci were detected and found to reside on discrete scaffolds that ranged in size from 33 to 830 Mbp indicating that these loci are dispersed throughout the genome (Table 3). Linkage disequilibrium (LD) analysis carried out on the subset of nine loci where potential null alleles were not observed according to the more conservative Micro-Checker analysis (Fh_2,Fh_3,Fh_5,Fh_6,Fh_7,Fh_9,Fh_10,Fh_12,and Fh_15), showed significant results for several microsatellite pairs (Table 4).

Discussion
This study reports the characterisation of a panel of 15 novel polymorphic microsatellites, constituting the largest panel of microsatellite markers available for F. hepatica. These loci were identified from within a database of >2000 microsatellite loci, which was compiled from 83 Mb F. hepatica genome sequence data. Following the sequencing of the complete F. hepatica genome (Cwiklinski et al., in press), we now know that 83 Mb of sequence data represents only $6.5% of the 1.3 Gb draft F. hepatica genome. The identification of this large number of microsatellite loci from such a small proportion of genomic sequence shows the potential  Weir and Cockerham (1984) and HW: Fisher's exact test for Hardy-Weinberg equilibrium, P < 0.00005 after Bonferroni's adjustment; calculated using GENEPOP. In bold are the three loci that significantly deviate from HWE following Bonferroni's adjustment (P < 0.00005). * Not significant after Bonferroni's adjustment (0.0002). of next generation sequencing resources to facilitate population genetic studies in the future. The liver fluke samples chosen for the validation of these loci were sourced from 12 different animals, 10 of which were naturally infected with F. hepatica. All of the parasites obtained from each of the 10 naturally infected animals were used for this study, a total of 38 liver flukes, to allow an initial observation of the genetic profile of liver flukes found in an animal. Eight liver flukes from two experimentally infected animals were also tested, which was consistent with the number of adult liver flukes observed in a naturally infected cow.
Statistical analysis showed that these loci are polymorphic with a wide range of alleles, highlighting the suitability of these loci to further understand the population genetics of F. hepatica. Null alleles were potentially identified for at least four loci (Fh_1, Fh_8, Fh_13 and Fh_14), which exhibited notably higher levels of homozygosity (measured by F IS ) than the remaining 11 loci. Whereas parthenogenesis in Fasciola is known and could lead to higher inbreeding and F IS across the genome, the higher levels of F IS observed during this study at the subset of four loci suggests locus-specific effects, for which the presence of null alleles is a parsimonious explanation. A further two loci were potentially identified using the Micro-Checker algorithm (Fh_4 and Fh_11). Given the complex nature of the Fasciola life cycle the loci that have been identified as null alleles would need to be verified in a larger study and any future statistical analysis interpreted with caution, especially if the method of reproduction (asexual vs sexual) is not known for the specific isolate. It should be noted that analysis of the UK isolates used for this particular study have been shown to reproduce sexually and are diploid (Beesley et al., unpublished).
Several loci pairs were found to have significant LD values. As these values could be influenced by the population structure within this panel, specifically by those loci that were not in HWE, those loci showing potential null alleles were omitted from this analysis. The fact that the samples used for this analysis came from several animals and therefore from several potential populations of F. hepatica, could also explain the LD values. LD was not calculated per animal due to the small number of liver flukes obtained. Analysis of a larger population would give a more accurate representation of the LD. Analysis of the location of the 15 loci within the draft F. hepatica genome has shown that they are present on discrete scaffolds. Although the loci are thought not to be closely located, those loci with significant LD values could be present in areas of the genome that are co-inherited. Also as the genome is still in a draft format and with a wide range of scaffold sizes (33-830 Mbp), these loci may be more closely located than presently thought. Further annotation of the scaffolds where the loci are located is required to investigate the genes closely located to the microsatellite loci on the scaffolds and how this may affect their heritability. With the availability of single molecule sequencing, the assembly of the F. hepatica genome can also be improved, providing further information regarding the location of these loci.
An important aspect of this study involved the optimisation of the microsatellite panel for three different life cycle stages; eggs, miracidia and metacercariae. Typically these stages yield low amounts of DNA and are a significant challenge to investigate, yet are important as they represent the more readily accessible stages of the parasite on pasture, avoiding the need to passage cercariae through a mammalian host (Supplementary Fig. 1). Existing studies have concentrated on the adult stage of the F. hepatica life cycle but the ability of our microsatellite panel to robustly genotype multiple life cycle stages, potentially at the single miracidium level, offers significant promise for population genetic, molecular epidemiology and studies of the transmission dynamics of F. hepatica in future. Eight loci were shown to be robust, being consistently amplified from samples of 50 eggs/miracidia/metacercariae. In fact prior to multiplexing the markers, three of the loci could be amplified from fewer than 50 metacercariae (Fh_1, Fh_2 and Fh_3). It should be noted that although using pools of parasite material (specifically eggs and miracidia from populations of F. hepatica) allows the extraction of sufficient DNA for microsatellite analysis, this method can only be used to give an overall representation of the potential alleles present within those populations. Capillary sequencing of microsatellite PCRs can often result in stutter peaks, n-1 peaks or large allelic drop-out, which could result in a greater number of alleles for pooled samples from a given population, though the multiplex protocol was optimised to minimise these effects. Microsatellite analysis of individual miracidia or metacercariae resulting from single snail: miracidia infections, although more time consuming, would give a more accurate representation of the number of alleles present within a population. With this in mind we adopted the protocol of Valentim et al. (2009), that involves extracting DNA using Chelex followed by whole genome amplification, to successfully genotype DNA from individual F. hepatica miracidia using our panel of 15 markers. These results show the potential power of this method for F. hepatica populations, although further analysis is required using greater numbers of miracidia from different populations to accurately determine the error rate using this method for natural populations.
Levels of genetic variability within populations of the Fasciola genus from different hosts and geographical locations are currently not well understood, in particular, the impact of population dynamics on the development of anthelmintic resistance. To date the majority of studies investigating the genetic structure of Fasciola populations have focussed on available mitochondrial markers. Sequencing of the complete F. hepatica mitochondrial genome of two geographically distinct isolates (Australia and Salt Lake City) showed that these genomes demonstrate levels of intra-specific variation <1% of the 14 kb mitochondrial genome (Le et al., 2001). Mitochondrial markers have been applied to investigate TCBZ resistance in cattle in the Netherlands  and Teofanova and colleagues have shown that partial mitochondrial DNA sequences are also useful for population genetics analysis (Teofanova et al., 2011). However, it should be noted that mitochondrial markers have been shown to have potential limitations, including the fact that direct and indirect selection can influence mitochondria resulting in mitochondrial DNA not always being a strict neutral marker (Ballard and Whitlock, 2004;Galtier et al., 2009). Coupled with the fact that F. hepatica has a complex life cycle, including an asexual component all Fasciola spp. population genetics data should be interpreted with caution. Prior to this study only a small panel of microsatellites was available (Hurtrez-Boussès et al., 2004). Several of these loci were tested as part of this study, with inconsistent results (data not shown). This original panel was developed using F. hepatica parasites from the Bolivian Altiplano. These results could indicate that there is potential polymorphism at the microsatellite primer sites between geographically distinct Fasciola spp. isolates, and so the panel outlined within this study should be tested on samples from different geographical areas from large populations to give an accurate representation of the utility of this panel and the population genetic structure of F. hepatica isolates.
Studies have shown that microsatellites developed for one species can often be used in related species (Jarne and Lagoda, 1996;Gemmell et al., 1997;Dar et al., 2011). In particular one study showed that the F. hepatica loci identified by Hurtrez-Boussès et al. (2004) could be used for the related species Fasciola gigantica (Dar et al., 2011), indicating the potential of the microsatellite panel described within this study to be used for other Fasciola species. Nevertheless, with the expansion of next generation sequencing technologies, large genomic datasets are more readily available for mining, particularly for genetic markers, providing potential for a F. gigantica-specific microsatellite panel.
The availability of this polymorphic panel will allow extensive study of the population structure of F. hepatica. In particular these markers will provide insight into geographically distinct isolates, advancing our understanding of a parasite which has a complex life cycle and the ability to infect a number of mammalian hosts. The availability of this panel of microsatellite loci also allows the population structure of F. hepatica, found in areas of differential exposure to TCBZ, to be explored, which will be invaluable for the continued study of TCBZ resistance.