The impact of sequencing depth and relatedness of the reference genome in population genomic studies: A case study with two caddisfly species (Trichoptera, Rhyacophilidae, Himalopsyche)

Abstract Whole genome sequencing for generating SNP data is increasingly used in population genetic studies. However, obtaining genomes for massive numbers of samples is still not within the budgets of many researchers. It is thus imperative to select an appropriate reference genome and sequencing depth to ensure the accuracy of the results for a specific research question, while balancing cost and feasibility. To evaluate the effect of the choice of the reference genome and sequencing depth on downstream analyses, we used five confamilial reference genomes of variable relatedness and three levels of sequencing depth (3.5×, 7.5× and 12×) in a population genomic study on two caddisfly species: Himalopsyche digitata and H. tibetana. Using these 30 datasets (five reference genomes × three depths × two target species), we estimated population genetic indices (inbreeding coefficient, nucleotide diversity, pairwise F ST, and genome‐wide distribution of F ST) based on variants and population structure (PCA and admixture) based on genotype likelihood estimates. The results showed that both distantly related reference genomes and lower sequencing depth lead to degradation of resolution. In addition, choosing a more closely related reference genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high‐quality reference genome continue to decrease. However, to determine a cost‐efficient strategy for a specific population genomic study, a trade‐off between reference genome relatedness and sequencing depth can be considered.


| INTRODUC TI ON
As high-throughput sequencing (HTS) technologies and bioinformatic tools are rapidly becoming more accurate and increasingly affordable, it is possible to generate whole genome resequencing (WGR) data for almost any species. High-quality WGR data provide a remarkable amount of information, including a vast number of loci as well as a large number of genetic variants, thus enabling powerful population genomics analyses (Goodwin et al., 2016).
Today, whole genome resequencing with low read depth, which indicates a low average number of reads that are aligned to a base in the reference genome, is widely applied in population studies (Lou et al., 2021;Nielsen et al., 2011;Sims et al., 2014). When initiating a project on population genetics using WGR, there are two prerequisites: (i) the availability of a reference genome representing the focal species (Ellegren, 2014) and (ii) estimating the necessary sequencing depth to support accurate results (Meisner & Albrechtsen, 2018).
The selection of reference genome and sequencing depth are therefore two important features in a population genetic study. However, with a fixed budget, it is important to find a balance between data quality and sequencing costs by compromising on the reference genome or sequencing depth. For every study, the reference genome needs to be carefully chosen to avoid bias in mapping and variant calling, considering the amount of sequence identity between the reference genome and the data resulting from resequencing (Nielsen et al., 2011). Ideally, such studies should include a high-quality species-specific reference genome, which is often not available for nonmodel organisms. Given the costs and time associated with generating a de novo reference genome, it can be more realistic to use an existing one, yet more distantly related, as is done most often in population genomic studies (Duchen & Salamin, 2021). Currently, several empirical studies have examined the impact of nonconspecific reference genomes in population genomics. For example, Gopalakrishnan et al. (2017) compared the use of dog and wolf genomes as reference genomes for either domestic dog or wolf populations. Their results showed that the selection of the reference genome only had a minor influence on downstream analyses, probably because of the close relatedness of these two taxa, which diverged approx. 30,000 years ago (Skoglund et al., 2015;Wang et al., 2013). By contrast, other studies have found that the use of distantly related reference genomes biases the results of resequencing analyses in bacteria (Valiente-Mullor et al., 2021), fungi (Garcia-Rubio et al., 2018), and mammals (Yang et al., 2019), but no studies are yet available for insects.
In comparison to choosing the reference genome based upon relatedness, sequencing depth needs more preliminary knowledge to determine. The sequencing depth needs to be tailored to each particular study based on many aspects, for instance genome size of the target species, the availability of funding, and of course the research question. Using the strategy of low sequencing depth in a population genetic study may result in data loss, thus causing statistical uncertainty during genotype and variant calling (Crawford & Lazzaro, 2012;Meisner & Albrechtsen, 2018;Nielsen et al., 2012). This statistical uncertainty is mainly due to the limited information provided by the low amount of reads, leading to poor discrimination between sequencing error and real variation, that is, SNPs (Meisner & Albrechtsen, 2018). To improve the accuracy of estimates in a cost-limited population genetic study, especially using low sequencing depth, some researchers therefore demonstrated that employing a large sample size provides more accurate results, for instance, more than 50 individuals from a population with a sequencing depth of two (Buerkle & Gompert, 2013;Fumagalli, 2013;Han et al., 2014;Sims et al., 2014). However, unlike these studies that are based on simulated data, it is challenging to obtain a large number of samples for taxa such as the aquatic insect genus Himalopsyche. These species are apex predators in the benthic invertebrate community and naturally have small population sizes. Many other taxa are similarly rare. Thus, it is pragmatic to consider how we might improve population-based inference through the optimization of reference genome choice or sequencing depth for these sample-limited studies. Therefore, comprehensively exploring the influence of reference choice and genome may significantly remedy the defects caused by low depth. Therefore, we conclude that population genetic studies would benefit from closely related reference genomes, especially as the costs of obtaining a high-quality reference genome continue to decrease. However, to determine a cost-efficient strategy for a specific population genomic study, a trade-off between reference genome relatedness and sequencing depth can be considered.

K E Y W O R D S
aquatic insects, de novo genomes, population genomics, reference genomes, sequencing depth, whole genome resequencing

T A X O N O M Y C L A S S I F I C A T I O N
Entomology, Evolutionary ecology, Genetics, Genomics, Population genetics sequencing depth on downstream analyses is urgently needed, particularly for insects. In this study, we will evaluate the effects of the reference genome and sequencing depth on a genus of caddisflies, for which we already have significant molecular and taxonomic data (Deng et al., 2021;Heckenhauer et al., 2022).
Himalopsyche is a genus of caddisflies (Insecta: Trichoptera) that is primarily distributed in the mountainous areas of central and east Asia (Hjalmarsson et al., 2018). Their larvae live as freeroaming predators in cool fast-flowing rivers and streams and are regarded as bioindicators of water quality (Hjalmarsson, 2019;Morse et al., 2019;Tsuruishi et al., 2006). There are currently 56 named species of Himalopsyche (Hjalmarsson, 2019), which are divided into five species groups: the tibetana group, the lepcha group, the kuldschensis group, the phryganea group, and the navasi group . The species H. digitata and H. tibetana both belong to the tibetana group, which is distributed in the Himalayas. This area is characterized by a number of parallel north-south running river systems (such as the Karnali, the Narayani, and the Koshi) with sharp elevational gradients.
Currently, climate change is causing a number of cascading effects on river flow via rapidly receding glaciers greatly affecting aquatic biodiversity (Xu et al., 2009). It is therefore crucial to investigate the patterns of genetic diversity of aquatic insects in the region to understand the current and past ecological processes, in order to promote the conservation of freshwater biodiversity (Geist, 2011). Genome-wide analysis is an important conservation tool that can provide novel insights essential for identifying, for example, hotspots or reservoirs of genetic diversity, dispersal routes, ecological corridors, and stepping stone habitats (Barbosa et al., 2018;Brandies et al., 2019;Hohenlohe et al., 2021;Jasper et al., 2019). However, the quality and quantity of aquatic insect genomes are still relatively low compared with terrestrial insects (Hotaling et al., 2020(Hotaling et al., , 2021. Despite Trichoptera covering ~275 million years of evolution (Thomas et al., 2020) and comprising ~16,300 known species (Morse et al., 2019), only 29 Trichoptera genome assemblies (26 species) have been published to date (Heckenhauer et al., 2019Luo et al., 2018;Olsen et al., 2021;Ríos-Touma et al., 2022). This represents <0.15% of all known species, which limits progress in genomics-based research of this ecologically relevant group.
The number of studies using population genomics is rapidly increasing. With it, the need to test the effect of reference genome's selection and sequencing depth on the results. Indeed, such studies will allow to find a balance between data quality and sequencing costs by compromising on the reference genome or sequencing depth. Therefore, we conducted a case study using an empirical dataset to evaluate how reference genome selection, that is, degree of relatedness, and sequencing depth affect downstream population genetic analyses of the species H. digitata and H. tibetana. In addition, we tried to reveal the correlation between the inferences from population genetics and drainage network and provided insights for local biodiversity conservation of these enigmatic species.

| Study design
As illustrated in Figure 1, et al., 2022). Since the specimens of Himalopsyche sp. was collected as a larva that cannot be identified to species level by morphologic diagnosis, and the CO1 sequence of this specimen differed slightly from all hitherto known sequences of named species, we can only classify this specimen as a species belonging to the kuldschensis group. These five genomes represent a gradient of genetic relatedness with respect to our target species, which were used as reference genomes. We used populations of two Himalopsyche species (H. digitata and H. tibetana), each species including four populations with each population containing six individuals except for one population from H. digitata and H. tibetana, respectively, which contained only five individuals. The reads of the populations were subsampled into three separate datasets with an average depth of 12.5×, 7.5× and 3.5×, respectively. Reads from each dataset were mapped to the five different reference genomes separately. Afterward, variants were called from all datasets using two strategies: Genotype calling with GATK and direct estimation of the genotype likelihood using ANGSD. Variants identified with the first strategy were used to calculate the population genetic indices including inbreeding coefficient (F), nucleotide diversity (π), and pairwise fixation index (F ST ); variants estimated from the second strategy were used in the principal component analysis (PCA) and admixture analysis. Finally, we compared population genetic indices and population structure with different references and sequencing depths.

| De novo genomes of three reference species
We used the genome assemblies of four species of Himalopsyche (H. tibetana, H. sp., H. phryganea, and H. japonica) and of one species from the closely related genus Rhyacophila (R. brunneae, both family Rhyacophilidae) as reference genomes. The Himalopsyche species represent the four major taxonomic groups in the genus according to Hjalmarsson et al. (2018): the tibetana group (H. tibetana), the kuldschensis group (H. sp.), the phryganea group (H. phryganea), and the navasi group (H. japonica). Unfortunately, we could not obtain a sample of H. lepcha (the only species in the lepcha group). Rhyacophila is the most closely related genus to Himalopsyche (Thomas et al., 2020).
The genomes of H. phryganea (JAGVSL000000000) and Rhyacophila brunnea (previous version of JAGYXB000000000, available at https://doi.org/10.6084/m9.figsh are.c.60330 11.v1) were previously sequenced and assembled . We generated new de novo assemblies of H. tibetana (collected from the Ê Ghunsa River, Nepal), H. sp. (kuldschensis group, collected from the Ê Ghunsa River, Nepal), and H. japonica (collected from the Nogami River, Kisomachi, Nagano Prefecture, Japan). Tissue of abdomen and thorax segments were used for DNA extraction after removal of the intestinal tract. We extracted high molecular weight genomic DNA using a salting-out protocol adapted from Miller et al. (1988), as described in Heckenhauer et al. (2019). We quantified the DNA using a Qubit 4.0 fluorometer with the dsDNA Broad Range Kit (ThermoFisher Scientific) and checked its purity with a DS11 spectrophotometer (DeNovix). We used a low-cost sequencing strategy that has been shown to produce contiguous genome assemblies, that is, employing a combination of short (Illumina) and long-read (Oxford Nanopore) technologies to sequence the three new reference genomes, as described in Appendix S1 (Section 1.1).
We conducted a long-read assembly of the Oxford Nanopore Technology sequencing reads with wtdbg2 v2.4 (Ruan & Li, 2020), followed by mapping and polishing with long reads with Minimap2 v14 (Li, 2018) and Racon v1.3.1 (Vaser et al., 2017). We then performed another round of long-read polishing with nanopolish 0.11.1 (Loman et al., 2015) by first indexing the signal-level data in the FAST5 files using nanopolish index, realigning the long reads to the Racon-polished assembly with minimap2, and then sorting and indexing the bam file with samtools. We used nanopolish_makerange. py to split our draft genome assembly into 50-kb segments and generated a consensus for each segment in parallel with nanopolish variants (−-consensus --min-candidate-frequency 0.1). We generated the polished genome in FASTA format using nanopolish vcf-2fasta. Because noisy long reads can suffer from indel errors, even with polishing, we further polished the assembly with high-quality short-read data with Pilon v1.22 (Walker et al., 2014). To do this, we mapped quality trimmed Illumina reads to the "nanopolished" assembly with bwa-mem and sorted the read alignments by leftmost coordinates using the sort options in SAMtools v1.9 (Li et al., 2009).
Finally, we used Pilon v1.22 (option --fix indels) to polish the assembly. Following polishing, we used purge_dups 1.2.3 (Roach et al., 2018) to purge haplotigs and overlaps in the assembly based on read depth.
For H. japonica, this pipeline did not meet the expected quality regarding contiguity and BUSCO completeness. Thus, we conducted a de novo hybrid assembly with the raw Illumina data together with the long reads using MaSuRCA v.3.1.1 (Zimin et al., 2013(Zimin et al., , 2017. In the config file, we specified the insert size and a standard deviation (10% of insert size) for the Illumina reads, as well as jellyfish hash size (estimated_genome_size*~long-read coverage (equal to depth in this scenario)). All other parameters were left as defaults. We used purge_dups 1.2.3 to purge haplotigs and overlaps in the assembly based on read depth.
A summary of the assembly statistics and BUSCO completeness is given in Table 1. The final genome assemblies were screened for potential contaminations with taxon-annotated GC-coverage plots (TAGC plots) using BlobTools v1.0 (Laetsch & Blaxter, 2017). For this F I G U R E 1 Workflow of data processing in this study showing the treatment of short-read resequencing data from the two target species and subsequent mapping and variant calling with different reference genomes for assessing genetic diversity and population genetic structure (see details in Section 2.1). purpose, all preprocessed Illumina reads of the respective species were mapped against the final genome assemblies using BWA-MEM v0.7.17-r1188 (Li, 2013) and taxonomic assignment for BlobTools was done with blastn using -task megablast and -e-value 1 e-25.
Details are given in Appendix S1 (Section 1.2).
For functional annotation of predicted genes, we first split each amino acid FASTA file into multiple files with 50 sequences each using awk and then used blastp to search against the ncbi-blast2.9.0+ nr database with an e-value cutoff of 10-4, −max_target_seqs set to 10 and -out format 6. The resulting xml files were merged using cat and then functionally annotated using the command-line version of Blast2GO (Götz et al., 2008).

| Species tree reconstruction
To determine the phylogenetic relatedness among the five species, we estimated a species tree using single-copy orthologs resulting from the BUSCO analyses of the five genome assemblies with an additional species, Glossosoma conforme, as an outgroup . For each single-copy ortholog, we generated an unaligned FASTA file with sequences from each species. We then aligned each ortholog with the MAFFT L-INS-i algorithm (Katoh & Standley, 2013). We selected the best-fit substitution model for each alignment using ModelFinder (option -m mfp, (Kalyaanamoorthy et al., 2017)) in IQtree v.2.0.6 (Minh et al., 2020) and estimated a maximum-likelihood tree with 1000 ultrafast bootstrap replicates (Hoang et al., 2018) with the BNNI correction (options -bb 1000 -bnni). We then generated a multispecies coalescent species tree in ASTRAL-III (Zhang et al., 2018) using the best maximum-likelihood TA B L E 1 Assembly statistics of reference genomes used in this study.
We used HALtools (Hickey et al., 2013) to obtain global alignment information with the halStats option. We then used "halSum- 2.6.2 | DNA extraction, library preparation, and sequencing Genomic DNA was extracted following a modified salting-out protocol adapted from Miller et al. (1988), as described in Heckenhauer et al. (2019). In case of low purity DNA (A260/A230 purity ratio below 1.4 and DNA concentration higher than 100 μg/μl that was measured by DeNovix DS11 spectrophotometer), the template DNA was subject to an additional cleanup using magnetic beads as described in Appendix S1 (Section 2.1). Afterward, all the samples were sent to Novogene Co., Ltd. for DNA library preparation and sequencing. 150-bp paired-end reads were generated on an Illumina HiSeq 2000 platform.
2.6.3 | Quality control, data processing of population sequence data, and variant calling We assessed the quality of Illumina reads before and after each step using FastQC v0.11.8 (https://www.bioin forma tics.babra ham. ac.uk/proje cts/fastq c/) and MultiQC v1.7 (Ewels et al., 2016), details are described in Appendix S1 (Section 2.2). We trimmed overrepresented k-mers using autotrim.pl v0.6.1 (Waldvogel et al., 2018) with Trimmomatic v0.38 (Bolger et al., 2014), and we removed adapters with Cutadapt v2.23 (Martin, 2011). After quality filtering, the average sequence depth was 18× and the minimal depth was 12.5× for all the samples. To maintain even depth of all the individuals and to simulate subsets with varying depth levels, we chose 12.5×, 7.5× and 3.5× as our test depth levels. We randomly subsampled the sequencing reads to a specified depth using rasusa 0.3.0. (Hall, 2022) with a random seed of 1 (−s 1), the target depth (−-coverage 12.5, −-coverage 7.5, −-coverage 3.5) and the estimated genome size For population genomic analyses, FASTA files of the reference genomes were indexed with the function bowtie2-build of bowtie2 2.3.5 (Langmead & Salzberg, 2012). We then mapped the reads of the three different depth datasets (12.5×, 7.5× and 3.5×) of the H.
digitata and H. tibetana populations to each of the five different reference genomes separately using Bowtie2, which resulted in 30 datasets (2 species × 3 depth × 5 reference genome, Figure 1).
After marking duplicate reads for each dataset using Picard v2.20.8 (Picard Tools -By Broad Institute), we conducted variant calling for downstream population genetic analysis using two strategies.

| Genetic diversity and population structure analysis
The variants called by GATK with genotype calling were used to estimate π, individual F, and pairwise F ST (Weir & Cockerham, 1984).
These are commonly used measurements for population genetics.
F is used to directly quantify the alleles inherited from common ancestors in an individual's lineage, π is an index for population-level genetic diversity quantification, and F ST provides a primary descrip- We used the genotype likelihoods estimated by ANGSD to generate a PCA with PCAngsd (Meisner & Albrechtsen, 2018) and individual admixture proportions estimating using NgsAdmix (Skotte et al., 2013) after Linkage pruning with ngsLD (Fox et al., 2019).

| New genomic resources
We combined long-and short-read sequencing technologies to generate three new de novo genome assemblies for the genus Illumina and ∼18-26× Oxford Nanopore sequencing depth. All three assemblies are of high quality with respect to the number of contigs and contig N50 ( Table 1). We identified >96% of the Endopterygota BUSCO gene set in the assemblies. BlobTools detected no contamination (Appendix S1: Figures 1-3). Remapping the Illumina reads back to the assemblies revealed more than 98% could be unambiguously placed (Appendix S1: Figures 1-3).

| Phylogenetic relationships of the reference genomes-resolving the Himalopsyche backbone
We built a species tree for the five reference genomes to estimate their evolutionary history and to verify the phylogenetic relationship between the reference genomes and the two population-level target species, H. digitata and H. tibetana. Phylogenetic relationships were strongly supported ( Figure 2 tibetana to the others ( Figure 3). Consequently, the massive loss of information resulting from choosing a distant reference genome is likely to lead to poor performance in the downstream analyses.
We observed that the choice of reference genome influenced population genetic values such as pairwise F ST (Appendix S3) and nucleotide diversity estimates (Appendix S1: Figure 14), the number of outliers in the inbreeding coefficient estimates (Appendix S1: Figure 13), as well as the resolution of the final outcome in genome-wide F ST estimates (Figure 4), PCA, and admixture ( Figure 5). to fluctuate for both species, which might result in an ineffective or misleading conclusion (Appendix S1: Figure 13). The results of the PCA, admixture analysis, and the genome-wide F ST clearly showed notable decreases in the resolution of the plots when selecting a more distantly related reference genome ( Figure 5). For instance, for populations of H. digitata, regardless of the depth, the cluster of pop 1 was well defined when mapped to H. tibetana in the PCA plot and admixture analysis, but no distinguishable structure was discernible among all four populations when mapped to R. brunnea (Figure 5a).
In addition, using a conspecific reference genome significantly im- with the abnormal value (highly qualified resolution, Figure 5b).
We also observed extensive influence of sequencing depth on the downstream analyses. In addition to decreasing number of variants, low sequencing depth also affected estimates of nucleotide diversity, the inbreeding coefficient, pairwise and global F ST , and population structure. In most cases, low sequencing depth reduced the accuracy of these inferences; however, the influence proved to be negligible or limited in some treatments. For example, the population structure of H. tibetana was highly differentiated regardless of depth when mapped to H. tibetana and was less differentiated when sequencing depth decreased while mapped to H. phryganea, but still sufficient to obtain a reliable result ( Figure 5b). Likewise, the F value of all individuals changed when the sequencing depth decreased from 12.5× to 3.5×, but the differentiation among populations remained comparable (Appendix S1: Figure 13).  Figure 5). Considering the inherent genetic variation of the two species, it is not surprising that more distantly related reference genome or lower sequencing depth were more tolerable for the populations with higher genetic diversity.

| Reference genomes
In terms of BUSCO completeness and contiguity, the three de novo genomes provided in this study as references are of comparable quality than the other Trichoptera genomes published previously (Heckenhauer et al., 2019Luo et al., 2018;Olsen et al., 2021;Ríos-Touma et al., 2022). They contained 96%-97% of an Endopterygota core gene collection indicating an almost complete coverage of known single-copy orthologs in the assembly. The backmapping rate of Illumina reads to the assemblies ranged between 96 and 98%, which also indicates the high quality of the assemblies.
Previous genomic studies have focused on sequencing a wide range of different families of Trichoptera and investigating variations across the order . These three new genome assemblies provide important new genomic resources for the scientific community, especially since Trichoptera and other aquatic insects are in general underrepresented in genomic research (Hotaling et al., 2020(Hotaling et al., , 2021. Together with previously published ones (H. phryganea and R. brunnea), these newly available genomes adequately provide an initial perspective about the phylogenetic relationship of the four main taxonomic groups of Himalopsyche, as well as implying the genetic relatedness between the target species and the five different reference genomes, respectively, for this study.

| Impacts of reference genome and sequencing depth on population genetic inferences
High-throughput sequencing of individual specimens is poised to be-   Figure 3). Moreover, increasing mismatches may also occur due to alternative alleles, thus increasing the so-called "reference bias" (Günther & Nettelblad, 2019). Consequently, reference bias can impact variant calling by missing alternative alleles or by incorrectly calling heterozygous sites and therefore lead to an underestimation of variants, including rare/private variants (Günther & Nettelblad, 2019;Taub et al., 2010). Considering these two aspects, the effects related to the choice of reference genome may propagate to a certain degree to subsequent downstream analyses in a population genetic study, for example when investigating heterozygosity and genetic diversity, gene flow, as well as ancestry proportions (Brandt et al., 2015;Günther & Nettelblad, 2019

F I G U R E 4 Genome-wide distribution of F ST values (weighted) of (a) the H. digitata populations and (b) H. tibetana populations. F ST values
were calculated in 50-kb windows across contigs obtained from each reference genome; thus, the X-axis (position) is not comparable among datasets. The horizontal gray lines indicate a threshold that is used for selecting the abnormal F ST windows for each species. The threshold is the minimum value of the top 1% F ST windows on the whole genome when using H. tibetana as reference genome and 12.5× as sequencing depth (0.16 for H. digitata and 0.28 for H. tibetana). The percentage of abnormal F ST windows is the number of F ST windows above the threshold over the total number of F ST windows. The global F ST is the genome-wide weighted Fst value estimated based on Weir and Cockerham (1984).
(3.5×). In addition, estimates of population structure, including PCA and admixture, are strongly affected, resulting in unreliable, weakly supported findings.
Notably, even though population genetic analyses of both H.
tibetana and H. digitata show the greatest accuracy when using H.
tibetana as the reference genome, we have observed a remarkable improvement in the results when using a conspecific reference genome, for instance, in the high resolution of the population structure, especially with the low sequencing depth (Figure 5b) (Figure 3).
Although H. digitata and H. tibetana are recovered in the same clade in previous phylogenetic analyses , the poor mapping success suggests that genetic distance may still be high.
Unfortunately, there are no established divergence times within the genus of himalopsyche at present. The most recent study (Thomas et al., 2020) shows that the divergence between Rhyacophila and Himalopsyche was approx. 90 Ma. Considering the distinct ecological niches of H. tibetana (inhabits high altitude) and H. digitata (inhabits a lower altitude) in the same distribution range (both endemic to the Himalayas), we hypothesize that the divergence between these two species may be associated with the uplift of the Qinghai-Tibet Plateau, which began ca. 45 Ma ago (Ding et al., 2022). In

(b)
caddisflies species is not entirely unexpected. Therefore, we suggest that when selecting a closely related species as a reference genome in a population genomics study, it is important to consider genetic relatedness.
We also show that sequencing depth must be considered when designing population genomic analyses. Unlike de novo genome assembly which demands high sequencing depth, highly accurate results can be achieved with lower sequencing depth in population genomics by combining information from a large number of individuals either during SNP calling or other processes (Buerkle & Gompert, 2013;Fumagalli, 2013;Han et al., 2014). Even though the accuracy of population genetic inferences can be improved by increasing the number of samples, the bias caused by low sequencing depth cannot be ignored, especially since it can often be difficult to obtain many samples for some populations of rare animals. As revealed by previous studies, low sequencing depth may cause erroneous SNP calls, due to the errors introduced and amplified during PCR during library prep (Sims et al., 2014). Moreover, it may also produce ambiguous reads during mapping to the reference genome (Taub et al., 2010). Such biases may lead to incorrect conclusions in population genetics inferences, including population genetic differentiation, population structure, and demography (Crawford & Lazzaro, 2012;Fumagalli, 2013;Han et al., 2014;Jiang et al., 2019;Korneliussen et al., 2013) (Buerkle & Gompert, 2013;Fumagalli, 2013), we reveal that the roles of reference genome and sequencing depth in a study of population genetics could also be considered as a trade-off. We suggest that population genetic study using genomic data may benefit from applying a more closely related reference genome. Undoubtedly, the optimal option would be a conspecific reference genome even with a low sequencing depth.
Our results showed that a conspecific reference genome can significantly improve the accuracy and reliability of all kinds of analyses, in particular WGR-based SNP imputation which will be promising in other genomic investigations like genome-wide association studies.
However, if resource limitations exist (in terms of funding, available biological material, time (i.e., wet-and dry lab efforts)), or the research is focused on populations with high interpopulation variation, a trade-off between a more distant reference genome and a higher sequencing depth can be considered. In other words, in a population genetic study, the trade-off between reference genome and sequencing depth is dictated by the focus of research. Although our case study is carried out on caddisflies and thus may not be universally applicable to other organisms, we believe that our results do provide a valuable example that enhances developing roadmaps involved in the choice of appropriate reference genome and sequencing depth in population genomic studies.

ACK N OWLED G M ENTS
We are thankful to Raphael Coimbra providing for valuable advice in population genetic analyses. We thank Koji Tojo and Robert Wisseman for providing specimens and numbers of Nepalese students for supporting our collection efforts in the field. We

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
All the COI and CAD data for this research are available in the GenBank