Identification of polymorphic SVA retrotransposons using a mobile element scanning method for SVA (ME-Scan-SVA)

Mobile element insertions are a major source of human genomic variation. SVA (SINE-R/VNTR/Alu) is the youngest retrotransposon family in the human genome and a number of diseases are known to be caused by SVA insertions. However, inter-individual genomic variations generated by SVA insertions and their impacts have not been studied extensively due to the difficulty in identifying polymorphic SVA insertions. To systematically identify SVA insertions at the population level and assess their genomic impact, we developed a mobile element scanning (ME-Scan) protocol we called ME-Scan-SVA. Using a nested SVA-specific PCR enrichment method, ME-Scan-SVA selectively amplify the 5′ end of SVA elements and their flanking genomic regions. To demonstrate the utility of the protocol, we constructed and sequenced a ME-Scan-SVA library of 21 individuals and analyzed the data using a new analysis pipeline designed for the protocol. Overall, the method achieved high SVA-specificity and over >90 % of the sequenced reads are from SVA insertions. The method also had high sensitivity (>90 %) for fixed SVA insertions that contain the SVA-specific primer-binding sites in the reference genome. Using candidate locus selection criteria that are expected to have a 90 % sensitivity, we identified 151 and 29 novel polymorphic SVA candidates under relaxed and stringent cutoffs, respectively (average 12 and 2 per individual). For six polymorphic SVAs that we were able to validate by PCR, the average individual genotype accuracy is 92 %, demonstrating a high accuracy of the computational genotype calling pipeline. The new approach allows identifying novel SVA insertions using high-throughput sequencing. It is cost-effective and can be applied in large-scale population study. It also can be applied for detecting potential active SVA elements, and somatic SVA retrotransposition events in different tissues or developmental stages.


Background
Mobile elements are discrete DNA fragments that can move and integrate into other locations in a genome. More than two-thirds of human genome are occupied by repetitive or repeat-derived sequences, including active mobile elements that are still capable of transposition [1]. Mobile elements can insert and disrupt host genes or participate in genomic rearrangement, resulting in diseases (for review, see [2][3][4]). In humans, some mobile element insertions (MEIs) are polymorphic across individuals [4,5]. Besides their functional and structural genomic impact, these polymorphic MEIs (pMEIs) are also important markers for ascertaining human population relationships and evolutionary history [6][7][8]. Therefore, it is of great interest to identify pMEIs in human populations. In general, there are two high-throughput sequencing based strategies for identifying pMEIs; whole genome and MEI-targeted sequencing. Compared with whole genome sequencing, MEI-targeted high-throughput sequencing methods are more cost-effective [9]. Although a number of targeted high-throughput sequencing methods have been developed for Alu and L1 elements [10][11][12][13][14], to date the only targeted sequencing method for SVA (SINE-R/VNTR/Alu) elements is retrotransposon capture sequencing (RC-seq) [10,[15][16][17].
SVA is a composite element consisting of a (CCCTCT) n hexamer simple repeat region at the 5′ end, an Alu-like region, a variable number of tandem repeats (VNTR) region, a short interspersed element of retroviral origin (SINE-R) region, and a poly-A tail after the putative polyadenylation signal (Fig. 1a). SVA insertions have all the hallmarks of L1-mediated target primed reverse transcription, such as poly(A) tail, target-site duplications (TSDs), 5′ truncation, and have been shown to mobilize by hijacking the L1-encoded protein machinery [18][19][20][21]. SVA elements represent the youngest retrotransposon family in the human genome and many insertions are polymorphic among human populations [5,18,22]. The polymorphism rates of members of the youngest subfamilies SVA_E and SVA_F were estimated as 37.5 and 27.6 %, respectively [18].
Although SVA elements only constitute approximately 0.1 % of the human genome, they have substantial biological impact in human. Insertion of SVA elements can trigger exonization, polyadenylation, enhancer and alternative promoter events, which lead to the formation of various transcript isoforms and evolutionary dynamics that contributes to the differences in gene expression level [19,[23][24][25][26][27][28]. Several human diseases have been attributed to SVA insertions or SVA-associated deletions, including Fukuyama congenital muscular dystrophy,    Fig. 1 Experimental protocol design. a Scheme of SVA element structure. b Sequence alignment of SVA_1 and SVA_2 primer binding sites and SVA, Alu subfamily consensuses. The SVA_1 and SVA_2 primer sequences are shown above of the alignment and the amplification directions are indicated by arrows. Top row of the sequence alignment shows the sequences of the primer binding sites of SVA_1 and SVA_2. SVA_1 binding site includes the SVA characteristic deletion as compared to Alu sequences. Dots in the alignment represent the same nucleotides as the primer binding site sequences. Deletions are shown as dashes and mutations are shown as the correct base for the consensus. c SVA-specific amplifications during ME-Scan-SVA library construction and the final DNA fragment structure. The DNA library after second-round amplification is size-selected at~500 bp (an example electropherogram image is shown). White box: adaptor; grey box: index; dark green box: flanking genomic region; yellow box: TSD; orange box: (CCCTCT) n hexamer simple repeat; light green box: SVA Alu-like region Lynch syndrome, X-linked agammaglobulinemia, autosomal recessive hypercholesterolemia, hemophilia B, and neurofibromatosis type 1 [29][30][31][32][33]. Therefore, it is important to systematically analyze polymorphic SVA insertions in human populations. Mobile element scanning (ME-Scan) is a targeted high-throughput sequencing strategy for MEIs. In previous studies, the technique was applied for identifying AluYb8/9 insertion polymorphisms in human genomes [11,14], and Ves SINE insertions in bat genomes [34]. In this study, we developed a ME-Scan method and an associated data analysis pipeline for SVA elements, which we termed ME-Scan-SVA. We then demonstrated the method by examining SVA insertions in 21 individuals.

ME-Scan-SVA overview Experimental protocol design
We designed a two-round nested PCR amplification protocol for SVA following the existing ME-Scan method [35]. We targeted the 5′ Alu-like region of the SVA elements to selectively enrich for SVA elements. Despite the high similarity between the SVA Alu-like region and Alu subfamily consensus sequences, one insertion and one deletion are shared by all SVA sequences (Fig. 1b). Therefore we designed SVA-specific primers in these regions. A biotinylated primer (SVA_1) was used for the first round PCR reaction and the second-round nested primer (SVA_2) was used to further improve specificity and add Illumina sequencing adaptors. Because typical SVA truncations happen at the 5′ of the insertion, this nested-PCR design at the 5′ end of the SVA element allows us to selectively enrich full-length SVA elements. In addition, 5′ or 3′ truncated SVA elements that contains both SVA_1 and SVA_2 primer binding sites (Fig. 1b, SVA consensus position 78 -137) will also be amplified. Based on the human reference genome (hg19), we estimate that this method can amplify 65 % of SVA_D (828/1274), 27 % of SVA_E (52/192), and 24 % of SVA_F elements (198/821), respectively.
A DNA fragment in the final sequencing library contains a variable-length 5′ flanking genomic sequence, the 5′ terminus of an SVA element ends at the primer binding site of SVA_2, and 132 base pair (bp) of sequencing adapters that flank either end of the fragment (Fig. 1c  bottom). The expected SVA fragment size is the size of the (CCCTCT) n hexamer simple repeats plus 40 bp in the Alu-like region. Because of the variable size of the simple repeat and possible the 5′ truncation, the size of an SVA fragment could vary between 20 bp (SVA_2 primer binding site only) to several hundred bps. We aim to minimize the library size for sequencing efficiency while maintaining sufficient flanking sequence for identifying the genomic location of the SVA insertions. Therefore, we first fragment the genomic DNA to about 1,000 bp in size. After library construction, we select DNA fragments around 500 bp for sequencing (~130 bp adaptor sequence +~370 bp SVA sequences and genomic flanking sequence).

Computational analysis pipeline
We designed a pipeline for ME-Scan-SVA analysis based on the general ME-Scan workflow [35]. Figure 2 shows an outline for the analysis pipeline. Using the Illumina 100 bp pair-end sequencing format, two sequencing reads are generated from each DNA fragment (Fig. 2a). We use the 40 bp Alu-like region in the first read (referred as the SVA Read in the following text) to determine if a read-pair is derived from an SVA locus (Fig. 2a). For each SVA Read, the Alu-like region is compared with the SVA consensus sequence [36] using BLAST [37] and the resulted bit-scores are recorded. The BLAST bit-score is a normalized measurement of the similarity between the SVA Read and the corresponding SVA consensus sequence. To choose a suitable cutoff for the BLAST bit-score, we determined the BLAST score distribution of SVA sequences in the human reference genome (Fig. 3). As expected, almost all SVAs from SVA_F, the youngest SVA subfamily, are present in the highest BLAST bit-score bins (>65). The majority of SVAs in the subfamilies SVA_D, SVA_E, and SVA_F have BLAST bit-scores higher than 48. Because these three subfamilies contain all known polymorphic SVA insertions, we selected BLAST bit-score 48 as a relaxed cutoff and 65 as a stringent cutoff. The relaxed cutoff is expected to capture more candidate loci. The stringent cutoff will enrich for the youngest subfamily SVA_F, which is expected to contain higher proportion of very recent insertions (Fig. 3). We then filter SVA Read based on selected bit-score cutoffs. A typical 100 bp SVA Read contains 40 bp SVA Alu-like region, and the variable (CCCTCT) n hexamer simple repeats region. Because the simple repeat region are often longer than 50 bp in size, most of the SVA Reads are expected to contain little or no flanking genomic sequences. Therefore, we use the second read in the read pair (referred as the Flanking Read in the following text) to identify the genomic location of an SVA insertion. Flanking Read sequences are aligned to the reference genome using the program BWA-MEM (Burrows-Wheeler Alignment Tool-maximal exact matches) [38]. The mapped Flanking Reads are then filtered based on their mapping quality scores to ensure the high-confidence mapping of the read. After mapping, the end positions of the mapped Flanking Reads are sorted, and then clustered within a sliding window of 500 bp in size. Within each cluster, the Flanking Read mapping position that is closest to the SVA insertion site is chosen as the insertion position for that locus (Fig. 2b). Depending on the length of the SVA element in the DNA fragment, the Flanking Reads might not cover the exact SVA insertion site. The candidate SVA insertion loci are then separated into several types (Fig. 2c). Reference SVAs are loci that are annotated by RepeatMasker in the human reference genome and passed the BLAST score cutoff. Fixed SVAs are reference SVA loci that are not known to be polymorphic. Known polymorphic SVAs are loci reported in previous studies [5,22,39,40]. Finally, novel polymorphic SVA insertions are loci that do not overlap reference and known polymorphic SVAs.
Applying ME-Scan-SVA to 21 human samples Data generation To demonstrate the feasibility of our protocol, we constructed a ME-Scan-SVA library using 21 individuals from two HapMap populations, including six parentoffspring trios ( Table 1). All samples were pooled after indexing and the pooled library was used to construct a ME-Scan-SVA sequencing library. The library was sequenced using the Illumina Hiseq 2000 with 100 bp paired-end format. We obtained 152.9 million total read pairs from the library, and the average and median of individual read number is 7.3 and 6.3 million, respectively (Additional file 1: Table S1).

Read filtering and candidate loci identification
As described in the "Computational pipeline" section, we filtered SVA Read based on BLAST bit-score cutoffs.
We used BLAST bit-score 48 as a relaxed cutoff and 65 as a stringent cutoff. Using the relaxed and stringent cutoffs, 93.8 and 17.6 % of the SVA Read passed the cutoff, respectively (Additional file 1: Table S1).
The vast majority (99.2 %) of Flanking Reads was mapped to the reference genome. More than 82 % of the reads in each individual passed a BWA-MEM mapping quality score cutoff of 29. We used this mapping quality cutoff to exclude low-quality reads and reads that mapped to multiple genomic locations. Overall, 78.1 and 14.5 % of the read-pairs passed both SVA Read and   Flanking Read filtering under the relaxed and stringent SVA Read cutoffs, respectively (Additional file 1: Table S1).
To obtain candidate SVA insertion loci, the mapping positions of mapped Flanking Reads were sorted and then clustered within a sliding window of 500 bp in size (Fig. 2b). A total of 28,130 and 7,972 insertion positions were generated from the 21 individuals under relaxed and stringent SVA Read cutoffs, respectively.

Sensitivity analysis
To estimate the sensitivity of ME-Scan-SVA, we first identified presumed fixed SVA insertion loci in the human reference genome. The presumed fixed SVA insertion loci are defined as SVA insertions that are present in the reference genome and are known to be not polymorphic in previous studies [5,22,39,40]. Using the relaxed and stringent SVA Read cutoffs, we identified 1,343 and 200 loci as presumed fixed SVAs, respectively. Using this set of SVA insertion loci, we calculated the depth of coverage and the number of unique reads (URs) for each locus. To account for inter-library variation, we normalized the depth of coverage at each locus by the total number of mapped reads in each individual as TPM (tags per million).
Using the TPM and UR info for each locus, we calculated the sensitivity for identifying fixed loci under different TPM and UR cutoffs (Fig. 4). Overall, we achieve high sensitivity: even at a stringent TPM/UR cutoff 15/15, the pooled data has 89 and 96 % sensitivity, for the relaxed and stringent conditions, respectively (Fig. 4). Among individuals, the sensitivities are similar but lower than pooled data at high cutoffs (Additional file 2: Figure S1).

SVA candidate loci identification and validation
To identify SVA insertion candidates, we started from the list of candidate insertion positions and used TPM/ UR cutoffs that achieve 90 % sensitivity in each individual based on the presumably fixed SVA insertions (Table 1). In each individual,~1,400/~300 SVA insertion loci were selected under the relaxed/stringent conditions. Among them,~200/~100 loci are polymorphic, and~10/2 loci are novel (Table 1). In total, 428 polymorphic SVAs were identified among the 21 individuals under relaxed condition, and 151 of them are novel. As expected, the vast majority of novel insertions are rare, and~80 % of the loci are only present in one sample. In comparison, some of the known polymorphic loci are more common and are present in all individuals in our dataset (Fig. 5a). Candidate loci from the stringent cutoff exhibit similar allele frequency pattern (Fig. 5b). The final relaxed and stringent call sets are available in Additional files 3 and 4.
To validate polymorphic SVA insertions, we performed PCR validation on 11 candidates (Additional files 5 and 6: Figure S2, Table S2). We used a combination of internal and external PCR for validation, similar to the protocol in the 1000 Genomes Project [22]. Out of the 11 loci, six showed clear and distinct bands for SVA insertions. We did not achieve specific amplification for SVA internal products for the remaining loci despite multiple attempts with different PCR conditions (see Method section for detail). This result might partially due to the difficulty in amplifying the complex SVA 5′ region. Although we expect some of these loci are true positives, our current validation results give a minimum true positive rate of 55 % (6/11).
For the six confirmed loci, we then performed individual genotyping to assess the individual genotype calling accuracy (Additional file 5: Figure S2). We consider an individual's genotype call from our computational pipeline correct if: 1) our pipeline called an SVA insertion and the PCR genotyping validated the insertion (either homozygous or heterozygous); or 2) our pipeline did not call an insertion and the genotyping result is no insertion. In general the individual genotypes are in agreement with computational calls: we achieved 93 % accuracy for individual genotype calls under the relaxed condition for the six loci (Additional file 7: Table S3). For the five loci that are also called under the stringent condition, one locus (Loc 5) has an accuracy of 17 %, primarily due to the under-calling of individuals with the SVA insertion (i.e., false-negative). The remaining four loci have an average accuracy of 96 % (Additional file 6: Table S2).
Next we compared our results with the 1000 Genomes Project phase 3 dataset [22], where 12 samples in our dataset are included. For these 12 overlapping samples, we called 363 SVA insertions and the 1000 Genomes Project called 223 insertions. Based on the primerbinding site position (78-137 in the SVA consensus sequence), 67 SVA insertions in the 1000 Genomes dataset are expected to be amplified by ME-Scan-SVA. Among these 67, 39 loci (58.2 %) were called in our data set. The individual genotype concordance rate for the 39 loci is 78 % (366/468 genotypes). The high genotype concordance rate suggests both datasets have high quality genotype calls for the shared loci.
Because our DNA samples include six parent-offspring trios, we can investigate the inheritance pattern and identify potential de novo SVA insertions in the offspring of each trio. To identify de novo SVA insertions, SVA insertions in each offspring that are found in parents or shared with unrelated individuals in the dataset (background) were removed. In total, 10 and 3 de novo insertion candidates were identified in the six offspring under the relaxed and stringent cutoffs, respectively. A close inspection showed that all candidate insertion loci are within old retrotransposons or simple repeats in the reference genome. The supporting flanking reads have low mapping quality in general because of the repetitive nature of these regions. Therefore these loci are unlikely to be authentic insertions. Consistent with this observation, two de novo insertion candidates failed validation (Additional file 6: Table S2). Given the SVA retrotransposition rate is estimated to be one in 916 births [39], in six trios the expected chance of identifying a de novo SVA insertion is < 0.01. Therefore, it is not surprising that we did not identify de novo SVA insertion in our dataset.

Potential functional impact of SVA insertions
Next we assessed the potential biological impact of SVA insertions. The insertion loci were intersected with gene annotations from the GENCODE project (Fig. 6). Given less than 5 % of the human genome are annotated as coding sequences (CDS, GENCODE v19), we expect the vast majority of insertions are located in intergenic or intronic regions, assuming a random insertion pattern. As expected, more than 93 % of SVA insertions are located in intergenic or intronic regions and only a small number of insertions overlap exonic regions: polymorphic SVA insertions identified under the relaxed condition intersected with four CDSs, six UTRs (untranslated regions), and one undefined exonic region (Fig. 6a,  left). Three of the four CDS insertions were also found in the novel polymorphic dataset, suggesting most exonic insertions identified in this study are novel (Fig. 6b, left). Stringent conditions produced similar results, with only one insertion intersected the CDS region (Fig. 6, right). SVA insertions overlapping CDSs are listed in Additional file 8: Table S5.
Given most polymorphic SVA insertions are in noncoding regions, we investigated the relationship between SVA insertions and epigenetic modifications. Using the 15 chromatin state profile from nine cell lines as defined by ChromHMM [41], we calculated the normalized number of SVA insertions in each state. The majority of polymorphic SVA insertions are enriched in non-or less-functional genomic regions, especially state 13 (heterochromatin, low signal), suggesting most of these insertions will not affect gene expression (Fig. 7).

Discussion
As the youngest retrotransposon family in the human genome, SVA insertions are highly polymorphic among human populations and play an important role in gene regulation and contribute to human diseases [19,[23][24][25][26][27][28]. However, the composite and complex structure of the SVA element has made it difficult to study the insertions using high-throughput sequencing. Here we described ME-Scan-SVA, a protocol for identifying polymorphic SVA insertions in a large number of samples.
Compare to RC-seq [10,[15][16][17], which uses a probe-based enrichment protocol to selectively enrich for SVAs, ME-Scan-SVA uses a two-round, nested SVA-specific PCR enrichment method. Unlike RC-seq which enriches for both ends of SVA insertions, ME-Scan-SVA only identify the flanking genomic region on the 5′ end of an SVA insertion. This design prevents us from identifying the TSDs of an SVA insertion without follow-up locus-specific sequencing. In addition, because ME-Scan-SVA is designed to preferentially amplify full-length insertions, we will not identify 5′ truncated SVAs that do not have the primer binding sites. Despite of these limitations, this PCR enrichment method has a high specificity:~94 % of the DNA fragments in the sequencing library passed the SVA Read filtering and are derived from SVA loci. An average 78 % of the total read-pairs passed both SVA Read and Flanking Read filters and we can determine the genomic locations of these potential SVA insertions (Additional file 1: Table S1). This high-specificity for SVA insertions allows us to pool a large number of individuals (e.g., 48) in one sequencing library to save the sequencing cost. Therefore, ME-Scan-SVA is particularly useful in projects that require cost-effective discovery of SVA insertions in a large number of samples.
Another potential future application of the ME-Scan-SVA method is to identify active SVA elements. SVA insertions can carry both 5′ and 3′ flanking sequences during their retrotransposition, in a process known as transduction [18,26]. The unique genomic sequence carried by the transduction event can be used to trace a new SVA insertion to the active SVA element where the insertion was generated [26]. With the current sequencing length (100 bps), we do not have sufficient flanking sequence to identify most transduction events. In the future, with long read sequencing technology we will be able to identify the transduction events using the ME-Scan-SVA protocol. Conclusions ME-Scan-SVA allows accurate and cost-effective SVA insertions discovery and genotyping. It can be applied in large-scale population studies. It also can be used to study endogenous somatic SVA retrotransposition events in different tissues or developmental stages. (CEU), three parent-offspring trios from Yoruba in Ibadan, Nigeria (YRI), and three additional YRI individuals. Information including population, family and individual relationships is shown in Table 1.

Library construction and sequencing
The ME-Scan-SVA libraries were prepared following the ME-Scan protocol described previously [35] with SVAspecific modifications. All the adaptor and primer sequences used in this study were synthesized by Integrated DNA Technologies (Coralville, IA, USA) and are shown in Additional file 9: Table S4. For each sample, 5 μg genomic DNA was randomly fragmented to about 1 kb in size using Covaris system (Covaris, Woburn, MA, USA) and concentrated using AMPure XP beads (cat. no. A63881, Beckman Coulter, Brea, CA, USA), following the manufacturer's protocol. The concentrated DNA fragments were then used to construct the sequencing library using KAPA Library Preparation Kits with SPRI solution for Illumina (KAPA Biosystems, Wilmington, MA, USA, cat. no KK8201).
DNA fragments were end-repaired, A-tailed on both ends following the kit protocol. The concentration of the A-tailed DNA was determined using a Nanodrop (Thermo Fisher Scientific, Wilmington, DE, USA). Atailed DNA fragments were then ligated with adaptors following the protocol of adaptor ligation of KAPA Library Preparation Kit. Each individual was characterized by a unique 6 bp index for downstream identification. The concentration of ligated DNA from each sample was quantified using Nanodrop and the 21 libraries were pooled into one single library with equal concentration. All of the following steps were performed using the pooled library.
SVA-specific first amplification was conducted for 10 cycles with 200 ng of template DNA and 2.5 μl of primer, following the library prep kit amplification protocol (initial denaturation at 98°C for 45 s, followed by the thermocycling conditions of 98°C for 15 s, 65°C for 30 s, and 72°C for 30 s, and a final extension at 72°C for 1 min). Size selection was performed on the amplified PCR product using 0.5X of PEG/NaCl SPRI Solution. After size selection, biotinylated SVA-enriched DNA fragments were magnetically separated from other genomic DNA fragments using 5 μl Dynabeads R M-270 Streptavidin (cat. no. 65305, Invitrogen, Life Technologies, Oslo, Norway) following the manufacturer's protocol. Second amplification was conducted for 12 cycles under the same condition as first amplification, with 24 μl of biotinylated SVA-enriched DNA as template in a 75 μl reaction. The amplified PCR product was electrophoresed at 120 volts for 90 min on a 2 % NuSieve R GTG R Agarose gel (cat. no. 50080, Lonza, Rockland, Maine, USA). Fragments around 500 bp were size selected and purified using Wizard SV Gel and PCR Clean-up system (cat. no. A9281, Promega, Madison, WI, USA).
Before the library was sequenced, its fragment size and concentration was determined using Bioanalyzer and quantitative PCR by the RUCDR Infinite Biologics (Piscataway, NJ, USA). The library was sequenced using the Illumina Hiseq 2000 with 100PE format at RUCDR Infinite Biologics.

Computational analysis
The computational analysis pipeline was constructed using a combination of bash and python codes. The codes are available at https://github.com/JXing-Lab/ME-SCAN-SVA/.
Briefly, ncbi-blast-2.2.28+ [37] was used to compare SVA sequence in the SVA Read to the SVA consensus sequence to generate BLAST bit-scores. BWA-MEM (ver. 0.7.5a) [38] was used to map Flanking Read against the human reference genome (hg19). Samtools-1.1 [42] were used to count the number of Flanking Read mapped to the human reference genome in each individual for TPM calculation. BEDTools (Ver. 2.16.2) [43] was used to cluster all mapped reads in a region and generate a list of candidate insertion loci for downstream analyses. Using customized python and bash codes, results from all applications were integrated into the current pipeline.
Known polymorphic loci were obtained from the Database of Retrotransposon Insertion Polymorphisms (dbRIP, [40]), HuRef genome [39], and the 1000 Genomes data [5,22]. Gene annotation was obtained from GENCODE (Release v19). Chromatin state profiles from nine cell lines were obtained from ChromHMM [44]. For each chromatin state, the normalized number of SVA insertions (number of insertions divided by total number of locations in each state) was calculated.

Genotyping PCR for validation
Three separate PCR reactions were performed for each of the 13 loci (11 polymorphic and 2 de novo candidates): one outside primer with two different internal primers (SVA_1 internal, and SVA_2 internal, Additional file 9: Table S4) in two reactions and external primer pair in one reaction (Additional file 5: Figure S2B). Because the 5′ end of an SVA element contains a (CTCCCT) n simple repeat region and an Alu region that shares homology with Alu elements, non-specific amplifications occurred at many loci. In these cases different DNA polymerases, annealing temperatures, PCR buffers (standard and high GC buffer), PCR additive betaine, and primer locations were attempted. However, for 7 loci (5 polymorphic, 2 de novo) no specific internal/external amplification was achieved. The PCRs were