Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing

Peng, Dan; Zhang, Yinming; Ren, Han; Li, Haixia; Li, Ran; Shen, Xuefeng; Wang, Nana; Huang, Erwen; Wu, Riga; Sun, Hongyu

doi:10.1038/s41598-020-69137-1

Open access
Published: 22 July 2020


Scientific Reports volume 10, Article number: 12225 (2020)



Abstract

Massively parallel sequencing (MPS) has rapidly become a promising method for forensic DNA typing because it can detect a large number of markers and samples simultaneously in a single reaction while yielding sequence information directly. In the present study, two kinds of forensic genetic markers, short tandem repeats (STRs) and identity-informative single nucleotide polymorphisms (iiSNPs), were analyzed simultaneously using the ForenSeq DNA Signature Prep Kit, a commercially available kit for an MPS platform. A total of 152 DNA markers, including 27 autosomal STR (A-STR) loci, 24 Y chromosomal STR (Y-STR) loci, 7 X chromosomal STR (X-STR) loci and 94 iiSNP loci, were genotyped in 107 Tibetan individuals (53 males and 54 females). Compared with length-based STR typing, sequence-based analysis revealed 112 more A-STR alleles, 41 more Y-STR alleles and 24 more X-STR alleles at 17 A-STRs, 11 Y-STRs and 5 X-STRs, respectively. Thirty-nine novel sequence variants were observed at 20 STR loci. When the flanking regions of the 94 iiSNPs were analyzed in addition to the target SNPs, 38 more alleles were identified. Our study provides genotype and allele frequency data for both types of genetic markers for forensic practice. Moreover, we showed that this panel is highly polymorphic and informative in the Tibetan population and should be efficient in forensic kinship testing and personal identification cases.


Introduction

Several genetic markers have been introduced into forensic genetics to address problems in kinship analysis and personal identification. Short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) are commonly used genetic markers in current forensic casework^1,2.

STRs, whose repeat units are usually 2–6 bp long, are commonly typed with the amplified fragment length polymorphism (Amp-FLP) strategy, which combines fluorescently labelled multiplex PCR with capillary electrophoresis (CE)³. Alleles are therefore called from fragment length by comparison with a locus-specific allelic ladder whose previously sequenced alleles differ in their number of repeat units². Each allele obtained with this approach is thus a length-based (LB) allele. With the advancement of sequencing technologies over the last decade, sequence variation among alleles of the same length has been uncovered⁴.

SNPs, which can be amplified in shorter amplicons, are bi-allelic genetic markers with lower mutation rates than STRs⁵. Several autosomal SNP marker sets and detection methods, such as single-base extension, chip-based microarrays and allele-specific hybridization arrays, have been developed to compensate for the relatively low discrimination power of individual SNP loci that results from their bi-allelic nature^5,6,7. However, these methods are not widely used in forensic practice because they require higher DNA inputs or can detect only a limited number of SNP loci in a single reaction⁸.

Unlike the detection methods mentioned above, massively parallel sequencing (MPS), also known as next-generation sequencing (NGS), offers a new technology for typing forensic genetic markers. Numerous markers and samples can be investigated simultaneously with MPS and, unlike CE, there is no need to avoid overlapping amplicon sizes or to work within the limited number of available fluorescent labels. For STRs, both length and sequence data are obtained, so allele calls are more informative and reveal the sequence characteristics of each allele, yielding sequence-based (SB) alleles. For SNPs, variations in the flanking regions can be identified together with the target SNPs, forming potential microhaplotypes^9,10. Analysing the full sequences of SNP amplicons therefore reveals more alleles.

This new technology also poses new challenges. First, the immense, variable and complex data produced by MPS platforms are hard to analyse manually, and software packages developed for LB datasets are no longer adequate. New bioinformatic methods are required to process and interpret these extensive data; an optimal package for MPS data analysis should be accurate, time-saving and easy to operate. Several packages have been published to make this process convenient for forensic use, such as TSSV¹¹, STRait Razor^12,13, STRinNGS¹⁴, SEQ Mapper¹⁵ and FDStools¹⁶. Sequencer manufacturers have also released analysis packages tailored to the data produced by their instruments, such as the ForenSeq Universal Analysis Software¹⁷ (UAS, Illumina, San Diego, CA) and Ion Torrent Suite Software Plugins¹⁸ (Thermo Fisher Scientific, South San Francisco, CA).

Moreover, the LB nomenclature used for STRs typed by CE is not suitable for the complex sequence variations detected on MPS platforms. It is urgent to establish how MPS data should be analysed and reported, how these data relate to LB alleles, and how such datasets should be recorded and searched in a database⁴. Some researchers have tried to answer these questions^19,20, but a definitive nomenclature is still under development. In 2016, the International Society for Forensic Genetics (ISFG) recommended a unified minimal nomenclature for the complex sequences obtained by MPS technologies^21,22, to facilitate communication between laboratories and to keep these data backward compatible with LB data produced on CE platforms. In early 2019, the STRAND Working Group was formalized to discuss the expanding and advancing topics of STR sequence nomenclature²³. ISFG has also recommended quality control of string sequences and alleles²⁴.

To facilitate the use of MPS in forensic genetics practice, several commercial and custom STR typing systems have been developed on different MPS platforms for different purposes^25,26,27,28. The ForenSeq DNA Signature Prep Kit (Illumina, San Diego, CA) is a library preparation kit that simultaneously targets Amelogenin, 27 autosomal STRs (A-STRs), 24 Y-STRs, 7 X-STRs and 94 identity-informative SNPs (iiSNPs) in Primer Mix A (DPMA), with the option to include an additional 56 ancestry-informative SNPs (aiSNPs) and 22 phenotype-informative SNPs (piSNPs) in Primer Mix B (DPMB). After library preparation, paired-end sequencing is performed and the raw data are imported into UAS for automated analysis. Validation studies of the ForenSeq Signature system have demonstrated its advantages in forensic practice relative to other library preparation kits^29,30,31,32,33, but knowledge of the allele and genotype frequencies of these 58 STRs is still inadequate for accurate lineage analysis^34,35,36,37,38,39 and insufficient for population genetic studies, which limits the system's utility in forensic casework.

The Tibetan ethnic group is one of the oldest ethnic groups in China and South Asia, and the culture of ancient Tibet thrived from the tenth to the sixteenth century. According to the 2010 Chinese census, the Tibetan ethnic group numbers 6.3 million people, who have resided for hundreds of generations mainly on the highest plateau in the world, the Qinghai-Tibetan Plateau (average elevation 4,000 to 5,000 m). Their language, clothing, customs and religion distinguish them from other Chinese or South Asian ethnic groups⁴⁰, and they can be further classified linguistically into Wei Tibetan, Kangba Tibetan and Amdo Tibetan⁴¹. The diversity of LB STR alleles typed by CE has been reported in some studies^42,43, but the polymorphisms of SB STR and SNP alleles typed with the ForenSeq system on MPS platforms have not been studied in this ethnic group.

In this study, a Tibetan population from Lhasa (Wei Tibetan), the capital of the Tibetan Autonomous Region, was analysed. Sequence variations at 58 STRs and 94 iiSNPs, together with population data for these two kinds of markers, are reported. Forensic parameters for evaluating the weight of evidence were also calculated.

Results

Sequencing quality and concordance analysis

The average cluster density was 740 k/mm², and the average total number of reads per sample was 96,239 and 110,292 for the two runs, respectively (Supplementary Table S1). Concordance analysis of two software packages (UAS and STRait Razor v2s) showed identical length-based allele calls. The LB alleles were concordant with the corresponding CE results for the 23 STR loci shared by the ForenSeq system and the Goldeneye DNA ID System 25A amplification system (Peoplespot SciTech Incorporation, Beijing, China), except at D22S1045. Allele imbalance was observed at D22S1045: all allele coverage ratios (ACRs) at this locus were lower than 0.50 (range 0.0113 to 0.4918) (Fig. 1), which led to some heterozygotes being miscalled as homozygotes. Because the D22S1045 genotypes were questionable, this locus was excluded from subsequent statistical analyses. Sequence-based average ACRs of the other 26 A-STRs ranged from 0.6996 (Penta E) to 0.9572 (TH01) (Fig. 1).
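A concordance check like the one described above amounts to comparing two genotype tables locus by locus. The sketch below assumes each caller's output has already been parsed into a dictionary mapping locus names to tuples of length-based allele designations; the function name `discordant_loci` and the example genotypes are hypothetical and are not output from UAS, STRait Razor or GeneMapper ID-X.

```python
def discordant_loci(calls_a, calls_b):
    """Return the loci typed by both methods whose genotypes differ.

    Each argument maps a locus name to a genotype given as a tuple of
    length-based allele designations; allele order is ignored."""
    shared = calls_a.keys() & calls_b.keys()  # compare only loci typed by both kits
    return sorted(
        locus for locus in shared
        if sorted(calls_a[locus]) != sorted(calls_b[locus])
    )


# Hypothetical genotypes for one sample. The MPS call at D22S1045 appears
# homozygous because the weaker allele fell below the balance threshold.
ce_calls = {"TH01": ("6", "9"), "D22S1045": ("15", "16"), "D3S1358": ("15", "17")}
mps_calls = {
    "TH01": ("9", "6"),
    "D22S1045": ("16", "16"),
    "D3S1358": ("15", "17"),
    "DYS391": ("10",),  # not in the CE kit, so it is ignored
}
print(discordant_loci(ce_calls, mps_calls))  # ['D22S1045']
```

Restricting the comparison to loci present in both tables mirrors comparing only the 23 loci shared by the two kits.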

The A-STR dataset was verified and accepted by STRidER (https://strider.online/) under accession number STR000149²².

Novel alleles and STR allele sequence variations

Thirty-nine novel alleles were detected at 20 loci in the Tibetan population compared with the records in the STR Sequencing Project (STRseq)⁴⁴ (Table 1). As shown in Supplementary Table S2 (sorted in descending order), a total of 353, 166 and 83 alleles were identified by sequence-based approaches, compared with 241, 125 and 59 alleles by length-based approaches, for A-STRs, Y-STRs and X-STRs, respectively. Sequencing increased the allele number at 17 A-STRs, 11 Y-STRs and 5 X-STRs, and 10 of these STRs showed increases of 100% or more (100 to 216.67%). Consistent with Wang et al.'s study of 58 Tibetans⁴⁵, the STRs with increased allele numbers were mainly those with compound and complex repeats. We categorized the sequence variations into three groups: repeat region variants only (RRVO), flanking region variants only (FRVO), and repeat region plus flanking region variants (RRFR) (Fig. 2). RRVO accounted for the largest number of the variations that increased allele numbers (Supplementary Table S2).
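The gap between LB and SB allele counts can be illustrated by counting distinct lengths versus distinct strings among the sequences observed at a locus. This is a minimal sketch with hypothetical repeat-region strings and helper names; real length-based calls are made over the whole amplicon, including flanking regions, so the helper only approximates that idea.

```python
def allele_counts(sequences):
    """Return (number of LB alleles, number of SB alleles) among observed sequences."""
    lb_alleles = {len(seq) for seq in sequences}  # distinguishable by length only
    sb_alleles = set(sequences)                   # distinguishable by full sequence
    return len(lb_alleles), len(sb_alleles)


def percent_increase(lb_count, sb_count):
    """Percentage gain in allele number when sequence is considered."""
    return 100.0 * (sb_count - lb_count) / lb_count


# Hypothetical repeat-region sequences observed at one locus
observed = [
    "TCTA" * 10,
    "TCTA" * 9 + "TCTG",  # same length as the sequence above, different sequence
    "TCTA" * 11,
    "TCTA" * 10,
]
lb, sb = allele_counts(observed)
print(lb, sb, percent_increase(lb, sb))  # 2 3 50.0
```

Applied to the A-STR totals reported above, `percent_increase(241, 353)` gives about 46.5%.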

Table 1 Novel alleles observed in 107 Tibetan samples.


Twenty-five SNPs and two InDels were detected in the flanking regions of 21 STRs (Table 2); four of these SNPs and one deletion, at four STRs (DYS437, Y-GATA-H4, DYS460 and DYS448), had not previously been reported in the 1000 Genomes dataset (https://genome.ucsc.edu/). The largest increase in allele number due to flanking region variation was observed at D7S820, where alleles carrying SNPs accounted for 80% of the distinct SB alleles. Moreover, 94.39% of the 214 alleles observed at D7S820 carried flanking region SNPs. D13S317 showed the second largest increase, with an FRV ratio (the proportion of distinct SB alleles carrying flanking region variants) of 62.50%, and 55.14% of its observed alleles carried flanking SNPs.

Table 2 SNPs and InDels observed in STR flanking regions.


Allele frequencies and population genetic parameters of STRs

LB and SB allele frequencies and other parameters for each STR locus are listed in Supplementary Tables S3–S8. X-STR allele frequencies in females and males were analysed together because no significant differentiation was observed between the sexes (p > 0.05). For autosomal loci and X-chromosome loci (females only), both LB and SB allele data met Hardy–Weinberg equilibrium (HWE) expectations after Bonferroni correction (A-STRs: α′ = 0.05/26; X-STRs: α′ = 0.05/7) (Supplementary Table S4). No significant linkage disequilibrium (LD) was detected among the 26 A-STRs for either LB or SB data after Bonferroni correction (α′ = 0.05/325) (Supplementary Table S5). When X-STRs and A-STRs were tested together in female samples, 20 pairs (SB data) and 23 pairs (LB data) showed significant LD at p < 0.05, but none remained significant after Bonferroni correction (α′ = 0.05/528) (Supplementary Table S5).
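The Bonferroni thresholds quoted above follow from counting tests: HWE is tested once per locus (26 A-STRs, 7 X-STRs), while LD is tested once per pair of loci, giving C(k, 2) pairs for k loci. The sketch below reproduces the LD denominators used in this article (325, 528, 4,371 and 7,140); the helper name `bonferroni` is introduced here for illustration only.

```python
from math import comb

ALPHA = 0.05  # nominal significance level used in the text


def bonferroni(n_tests, alpha=ALPHA):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# LD is tested once per pair of loci: C(k, 2) pairs for k loci.
loci_sets = {
    "26 A-STRs": 26,
    "26 A-STRs + 7 X-STRs (females)": 26 + 7,
    "94 iiSNPs": 94,
    "26 A-STRs + 94 iiSNPs": 26 + 94,
}
for name, k in loci_sets.items():
    pairs = comb(k, 2)
    print(f"{name}: {pairs} LD pairs, alpha' = {bonferroni(pairs):.2e}")

# HWE thresholds use the number of loci instead, e.g. 0.05/26 for the A-STRs.
print(f"A-STR HWE alpha' = {bonferroni(26):.2e}")
```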

Loci with more alleles in SB data than in LB data also showed higher observed heterozygosities (H_obs) (Supplementary Table S6), indicating increased genetic diversity at these loci. The three loci with the largest percentage increases in H_obs were D3S1358 (17.15%), D2S441 (13.90%) and D5S818 (10.39%).
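Why SB data can raise H_obs is easiest to see from individuals whose two alleles share a length but differ in sequence: they count as homozygous under LB typing and heterozygous under SB typing. A minimal sketch with hypothetical genotypes and a hypothetical helper name:

```python
def observed_heterozygosity(genotypes, key):
    """Fraction of individuals whose two alleles differ under `key`.

    key=len compares alleles by length (LB); key=str compares full sequences (SB)."""
    heterozygotes = sum(1 for a, b in genotypes if key(a) != key(b))
    return heterozygotes / len(genotypes)


# Hypothetical repeat-region genotypes for four individuals at one locus
genotypes = [
    ("TCTA" * 10, "TCTA" * 9 + "TCTG"),  # same length, different sequence
    ("TCTA" * 10, "TCTA" * 11),
    ("TCTA" * 11, "TCTA" * 11),
    ("TCTA" * 9 + "TCTG", "TCTA" * 10),  # same length, different sequence
]
print(observed_heterozygosity(genotypes, key=len))  # 0.25
print(observed_heterozygosity(genotypes, key=str))  # 0.75
```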

As shown in Supplementary Table S7, the greater diversity of SB alleles produced a distinct increase in total discrimination power (TDP) and a decrease in cumulative random match probability (CMP). For the 26 A-STRs, the CMP of the SB approach was four orders of magnitude lower than that of the LB approach. The cumulative probabilities of exclusion for duos (CPE_duo) and trios (CPE_trio) were also higher with SB data than with LB data (Supplementary Table S8). This high strength of evidence supports the reliability of the 26 A-STRs for both personal identification and parentage testing of duos and trios (Table 3).
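Assuming independent loci, the combined parameters follow from the per-locus values: CMP is the product of the per-locus RMPs, TDP = 1 − CMP (because DP = 1 − RMP at each locus), and CPE = 1 − ∏(1 − PE_i). The sketch below uses made-up per-locus values and hypothetical function names; it illustrates the combination rule and is not the Cervus implementation.

```python
from math import prod


def combined_match_probability(rmps):
    """CMP: product of per-locus random match probabilities (independent loci)."""
    return prod(rmps)


def total_discrimination_power(rmps):
    """TDP = 1 - prod(1 - DP_i); with DP_i = 1 - RMP_i this equals 1 - CMP."""
    return 1.0 - combined_match_probability(rmps)


def combined_exclusion(pes):
    """CPE = 1 - prod(1 - PE_i), applicable to duo or trio PE values."""
    return 1.0 - prod(1.0 - pe for pe in pes)


# Hypothetical per-locus values for three loci
rmps = [0.1, 0.05, 0.02]
pes = [0.5, 0.6, 0.3]
print(combined_match_probability(rmps))
print(total_discrimination_power(rmps))
print(combined_exclusion(pes))
```

Because CMP multiplies across loci, adding markers lowers it geometrically, which is why combining A-STRs with iiSNPs shifts CMP by tens of orders of magnitude.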

Table 3 Combined forensic parameters of datasets used in this study.


For Y-STR loci, genetic diversity (GD) ranged from 0.2046 (DYS391) to 0.8672 (DYS481) with LB data and from 0.2046 (DYS391) to 0.9296 (DYF387S1) with SB data. The percentage increase in GD from LB to SB data ranged from 0 to 77.08% (DYS437). A total of 50 haplotypes were observed with both LB and SB data, giving a haplotype diversity (HD) of 0.9971; 48 of these haplotypes were unique (48/50 = 0.96) (Supplementary Table S9).

Identity-informative SNPs

Forty-seven alleles with two or more SNPs within the full sequences (target SNP plus flanking regions) were observed at 31 iiSNP loci. Of these 47 alleles, one had four SNPs, six had three SNPs, and the remaining 40 had two SNPs (Supplementary Table S10). A total of 226 distinct sequence strings (i.e., alleles) were observed when the full sequences of the 94 iiSNPs were analysed, 38 more than when only the target SNPs were considered. Allele frequencies for each type of data are detailed in Supplementary Table S11. The HWE test indicated that all 94 iiSNP loci (based on either target SNPs or full sequences) were in Hardy–Weinberg equilibrium after Bonferroni correction (α′ = 0.05/94) (Supplementary Table S12). Five pairs and three pairs of loci showed LD after Bonferroni correction (α′ = 0.05/4,371) for target-SNP data and full-sequence data, respectively (Supplementary Table S13). Because the loci in each of these pairs were located on different chromosomes, the 94 iiSNPs were treated as independent in subsequent statistics.
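The 38 extra alleles come from reading each amplicon as a whole string rather than as a single target base; 94 bi-allelic target SNPs give 188 alleles, and 188 + 38 = 226. A minimal sketch of the two ways of counting, assuming aligned, equal-length amplicon strings without InDels; the sequences and the function name are hypothetical.

```python
def snp_allele_counts(sequences, target_index):
    """Count alleles at one SNP locus from the target base only and from full sequences.

    sequences: amplicon strings observed at the locus (one per allele copy),
               assumed aligned and of equal length.
    target_index: 0-based position of the target SNP in each string."""
    target_alleles = {seq[target_index] for seq in sequences}
    full_alleles = set(sequences)
    return len(target_alleles), len(full_alleles)


# Hypothetical 8-nt amplicons: index 4 is the target SNP, index 1 carries a flanking SNP.
sequences = ["ACGTAGGT", "AGGTAGGT", "ACGTCGGT", "ACGTAGGT"]
print(snp_allele_counts(sequences, target_index=4))  # (2, 3)
```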

Forensic parameters for each SNP are shown in Supplementary Table S14, and the combined parameters for target-SNP and full-sequence data are given in Table 3. The strength of evidence was higher when the flanking region variations adjacent to the target SNPs were taken into account. Observed heterozygosities were also higher for full-sequence data than for target-SNP data (Supplementary Table S15). The effective number of alleles (A_e), an important index for selecting microhaplotypes for mixture detection⁴⁶, likewise increased for full-sequence data compared with target-SNP data (Supplementary Table S15). For full-sequence data, ten loci had an A_e > 2.00, and two of them, rs1109037 and rs10776839, had values > 3.00, a necessary criterion for applying microhaplotypes to mixture detection⁴⁶. For the target-SNP data, in contrast, only five loci showed an A_e > 2.00.
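A_e is the reciprocal of the sum of squared allele frequencies (defined in Materials and methods). For k alleles, A_e cannot exceed k and reaches k only when all alleles are equally frequent, so an A_e above 3.00 requires at least four alleles. The frequencies and function name below are hypothetical.

```python
def effective_number_of_alleles(freqs):
    """A_e = 1 / sum(p_i^2): the reciprocal of expected homozygosity."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 / sum(p * p for p in freqs)


target_snp = [0.5, 0.5]               # hypothetical bi-allelic target SNP
full_sequence = [0.4, 0.3, 0.2, 0.1]  # hypothetical full-sequence alleles at the same locus
for label, freqs in [("target SNP", target_snp), ("full sequence", full_sequence)]:
    ae = effective_number_of_alleles(freqs)
    print(f"{label}: A_e = {ae:.2f}, above 3.00: {ae > 3.0}")
```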

When the length-based data of the 26 A-STRs were combined with the target-SNP data of the 94 iiSNPs, eight of 7,140 pairwise comparisons still showed LD (p < 0.00001) after Bonferroni correction (α′ = 0.05/7,140) (Supplementary Table S16). The same number of pairwise comparisons showed LD (p < 0.00001) after Bonferroni correction (α′ = 0.05/7,140) when the sequence-based data of the 26 A-STRs were combined with the full-sequence data of the 94 iiSNPs (Supplementary Table S16). None of the pairs showing significant LD were syntenic. The related forensic parameters are shown in Table 3. The power of the 94 iiSNPs for personal identification was three to five orders of magnitude higher than that of the 26 A-STRs, whereas for parentage testing of duos and trios the 26 A-STRs outperformed the 94 iiSNPs. Moreover, combining A-STR and SNP markers yielded a much more efficient system (a CMP about 30 to 35 orders of magnitude lower and a CPE four to ten times higher) than using either category of markers alone.

Discussion

The Tibetan population described in this study exhibited many sequence variations in repeat regions and flanking regions based on MPS data. A total of 33 STRs showed higher allele diversity when sequence variations were considered rather than length-based alleles only, while 25 loci showed no increase in allele number with the SB method. Thirty-nine novel alleles were detected, although only 107 samples were studied. Twenty-five SNPs and two InDels were detected in the flanking regions of 21 STRs. InDels in flanking regions may affect the length-based calls of alleles. Variants whose frequency distributions differ substantially between populations are indicators of the ancestry-informative value of a locus⁴⁷. For iiSNPs, 47 alleles with two or more SNPs within the full sequences (target SNP plus flanking regions) were observed at 31 iiSNP loci. Heterozygote imbalance at D22S1045 has also been reported by Novroski³⁴, Churchill⁴⁸, Just³¹ and Hussing³⁸, and Hussing et al. likewise chose not to analyse this locus further in Danes³⁸. Whereas Novroski and Churchill reported an increased number of SB alleles at D22S1045 owing to flanking region variation, no sequence variation (in either the repeat or the flanking region) was observed at this locus in Chinese populations^45,49 or in the present study. Overall, the sequence variations observed here were consistent with those reported in the previous literature^34,37,39,45,50.

Five pairs and three pairs of loci showed LD after Bonferroni correction for target-SNP and full-sequence iiSNP data, respectively. In light of previous similar studies^10,38,51,52, LD between iiSNP pairs was not surprising. These loci were treated as independent when calculating forensic parameters because the iiSNPs in each pair were located on different chromosomes^38,52. We suggest that the particular structure of the studied population may have caused this disequilibrium. In addition, given the small sample size of this study, the significant LD results may also reflect chance.

To interpret the complex data produced by MPS platforms correctly, a convenient and sophisticated data analysis package would promote the use of MPS for this type of forensic genetic study. The two software packages used here for concordance analysis each have their own characteristics. As offline software with a customized web-browser interface for forensic use, UAS v1.2.16173 reports LB allele calls for STRs and genotypes of target SNPs, and can calculate RMP and TDP for specific populations. Although this version of UAS does not support flanking region analysis, this capability was added in later versions (UAS v1.3 or later). STRait Razor v2s can analyse more than 300 loci (including STRs, SNPs and InDels), covers both the repeat regions and the flanking regions of the investigated loci, and can report SB alleles in accordance with the minimal nomenclature recommended by ISFG. Moreover, an MPS platform yields a much more informative form of SNP data (alleles based on full sequences) than alleles based on target SNPs only, which may facilitate mixture deconvolution in the future.

To maintain backward compatibility with the CE typing system, the nomenclature recommended by ISFG²¹ contains repeat number information based on allele length. Following the principle of the CE method, the ‘CE calls’ for SB alleles were determined from the length of the sequenced region relative to a reference sequence. It is important to note that a CE call may not represent the actual number of repeat units of an allele, especially for alleles with flanking region variations. Annotating flanking variants in the nomenclature reveals the true structure of a sequence, which helps researchers quickly assess the diversity of a given sequence. An InDel in a flanking region may go unidentified yet still change the length of a fragment. In this study, an InDel at D7S820 illustrates the influence of InDels on nomenclature (Fig. 3). Inspection of the sequence structure revealed a [T/TA] insertion (provisionally rs1463708262 (GRCh38 7:84,160,205–84,160,212)), which made the allele 1 nt longer than allele 10 as measured by STRait Razor v2s; the CE number was therefore 10.1. In addition, a T-A transversion (rs7789995; GRCh38 7:84,160,204) was identified close to the insertion, with allele frequencies of 99%, 91%, 94%, 87% and 92% in African, American, East Asian, European and South Asian populations, respectively (1000 Genomes, https://genome.ucsc.edu/). Hence, the SB allele name of this string sequence was “D7S820 [CE10.1]-Chr7-GRCh38 84,160,226–84,160,277 [TATC]10 84,160,204-A; 84,160,204.1A”, regardless of whether the actual number of repeat units was 10. Moreover, the exact position of the A insertion in this example was ambiguous: it could lie anywhere from 84,160,204 to 84,160,212. Consistent with this observation, such InDels have not yet been defined in the reference template⁵³.
Similar allele calls were observed at 11 loci (D7S820, D13S317, Penta D, DYS385, DYS460, DYS448, DXS10135, DXS10074, HPRTB and DXS7423). This type of inconsistency between allele names and sequences has also been reported by Novroski et al.³⁴
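The CE-style designation described above can be sketched as a length comparison against a reference allele of known repeat number: whole repeat units give the integer part and leftover nucleotides give the decimal part. The 100-nt reference length and the function name are hypothetical; this illustrates the principle and is not STRait Razor's code.

```python
def ce_call(region_length, ref_length, ref_repeats, unit_length):
    """Length-based (CE-style) designation for a sequenced allele.

    region_length: length (nt) of the analysed region for this allele.
    ref_length, ref_repeats: length and repeat number of a reference allele
        analysed over the same region.
    unit_length: repeat unit size in nt (4 for TATC)."""
    total = ref_repeats * unit_length + (region_length - ref_length)
    repeats, remainder = divmod(total, unit_length)
    return f"{repeats}.{remainder}" if remainder else str(repeats)


# D7S820-like case: a flanking insertion makes the allele 1 nt longer than
# allele 10, even though the repeat region is still [TATC]10.
ref_len = 100  # hypothetical region length of allele 10
print(ce_call(ref_len + 1, ref_len, 10, 4))  # 10.1
print(ce_call(ref_len - 4, ref_len, 10, 4))  # 9
```

Because only length enters the calculation, the function cannot tell a flanking insertion from a partial repeat, which is exactly why the annotated flanking variants in the SB name are needed.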

Moreover, discrepancies in SB allele nomenclature can arise when different sequence-guide coordinates and analysis tools are used. In previous studies of SB alleles, some researchers followed the nomenclature recommended by ISFG²¹, while others used custom-defined nomenclature^36,37. Discordant nomenclatures can lead to inconsistent allele calls between laboratories and further complicate precise comparison of alleles and InDels between populations. Harmonising allele nomenclature and reference genome coordinates for SB alleles is therefore urgently needed to provide a convenient method for data communication between laboratories.

Lastly, the selected 58 STRs and 94 iiSNPs showed high system efficiency in this Tibetan population. MPS makes combined detection of STRs and SNPs more convenient, thereby improving system efficiency dramatically, so cases involving personal identification and parentage testing of duos and trios can be solved with reliable results. Combined detection of STRs and SNPs may also help to solve complex kinship analyses more efficiently. An ambitious goal for this field would be to reduce the number of markers (or remove loci with low diversity) for duo and trio testing while maintaining an acceptable level of performance. Exploring customized marker subsets for different identification purposes is also an area of future interest.

Conclusions

This study investigated sequence polymorphisms of 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing and provided an accurate sequence-based allele frequency dataset for these loci. Considerable sequence variation was observed in both the repeat and flanking regions of these loci, indicating that the ForenSeq DNA Signature system is highly polymorphic and informative in the studied population. Our study also demonstrated the potential of this system for kinship testing and personal identification cases.

Materials and methods

Sample collection

Peripheral blood samples were collected on FTA cards from 107 unrelated individuals (53 males and 54 females) of the Wei Tibetan population in the Tibetan Autonomous Region of western China.

Library preparation and sequencing

DNA libraries were constructed using the ForenSeq DNA Signature Prep Kit according to the manufacturer's recommendations⁵⁴. Briefly, 1.2-mm-diameter punches of FTA cards were used directly as input template without DNA extraction. Target amplification and tagging were performed under the recommended thermal cycling parameters. Index 1 (i7) and index 2 (i5) adapters were added during target enrichment. The libraries were then purified using Sample Purification Beads (SPB) and normalized using Library Normalization Beads 1 (LNB1). Finally, 5 μL of the normalized library from each sample was pooled into a single microcentrifuge tube. Seven microliters of the pooled libraries were added to 591 μL of Hybridization Buffer (HT1) and mixed with 2 μL of diluted Human Sequencing Control (HSC). Sequencing was performed on a MiSeq FGx instrument (Illumina, San Diego, CA) using the MiSeq FGx Reagent Kit (Illumina, San Diego, CA) according to the manufacturer's instructions. Two runs were performed to cover all samples in this study.

Data analysis, allele nomenclature, and sequence variant identification

The raw STR sequencing data were first analysed using ForenSeq UAS (v1.2.16173) with the default analytical and interpretation thresholds (generally AT = 1.5% and IT = 4.5%, respectively) for allele calling¹⁷. Intralocus balance was calculated as the read count of the typed allele with the fewest reads divided by that of the typed allele with the most reads; its threshold was set at 0.60 (default) for all loci except D22S1045 (threshold = 0.10), which the manufacturer recommends analysing with caution because of extreme heterozygote imbalance. Alleles were reported as both length calls and sequence calls; the sequence calls cover the repeat region of the locus.
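Intralocus balance, like the ACR defined in the next paragraph, is a simple ratio of read counts, which the sketch below applies to a hypothetical heterozygote. How UAS applies AT and IT internally is not described here; the helper `above_analytical_threshold` assumes they are fractions of the total reads at the locus, which should be checked against the UAS documentation. All function names are hypothetical.

```python
def allele_coverage_ratio(reads_1, reads_2):
    """ACR / intralocus balance: lower read count divided by higher read count."""
    low, high = sorted((reads_1, reads_2))
    if high == 0:
        raise ValueError("no reads at this locus")
    return low / high


def above_analytical_threshold(allele_reads, locus_reads, at=0.015):
    """Assumption: AT is expressed as a fraction of all reads at the locus."""
    return allele_reads >= at * locus_reads


def balanced(reads_1, reads_2, threshold=0.60):
    """True if the heterozygote passes the intralocus balance threshold."""
    return allele_coverage_ratio(reads_1, reads_2) >= threshold


# Hypothetical heterozygote with 800 and 400 reads
print(allele_coverage_ratio(800, 400))     # 0.5
print(balanced(800, 400))                  # False under the default 0.60
print(balanced(800, 400, threshold=0.10))  # True under the D22S1045 setting
```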

The allele coverage ratio (ACR) of each heterozygous locus was calculated by dividing the lower read count by the higher read count. The FASTQ files were then re-analysed with STRait Razor v2s¹² using the default analytical threshold (AT = 2 reads) and heterozygote threshold (HT = 0.40, equivalent to the intralocus balance threshold). In-house workbooks (written in VBA in Microsoft Excel) were developed to reformat the nomenclature produced by STRait Razor v2s so that it fully conformed to the ISFG recommendations²¹ and the revised Forensic STR Sequence Guide_v3⁵³. The nomenclature of SB alleles identified as ‘novel’ by STRait Razor v2s was also checked and corrected manually. The 94 iiSNPs were genotyped using ForenSeq UAS with default settings (AT = 1.5%, IT = 4.5%), which provided alleles and genotypes based on the target SNPs. Comprehensive nomenclature following Parson et al.²¹ was obtained using STRait Razor v2s¹², which provided alleles and genotypes based on the full sequences of the SNP amplicons.

The allele sequence variants of STRs were classified into three categories: repeat region variants only (RRVO), flanking region variants only (FRVO), and repeat region plus flanking region variants (RRFR) (Fig. 4). Reference alleles were defined using the STRBase database (https://strbase.nist.gov/str). Alleles were considered novel if they had not been previously reported in the STR Sequencing Project (STRseq, https://www.ncbi.nlm.nih.gov/bioproject/PRJNA380127, accessed 18 October 2018)⁴⁴. SNPs and InDels in flanking regions were compared against the UCSC Genome Browser (1000 Genomes, https://genome.ucsc.edu/) and verified in the NCBI dbSNP database (build 152, https://www.ncbi.nlm.nih.gov/snp/).
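The three-way classification can be expressed as two comparisons against a reference sequence for the same length-based allele. This sketch assumes each allele has already been split into left flank, repeat region and right flank, and that a reference sequence is available for each LB allele (in this study, reference alleles came from STRBase); the tuple layout, function name and sequences are hypothetical.

```python
def classify_variant(allele, reference):
    """Classify an SB allele against the reference sequence of the same LB allele.

    Both arguments are (left_flank, repeat_region, right_flank) tuples of strings."""
    repeat_differs = allele[1] != reference[1]
    flank_differs = allele[0] != reference[0] or allele[2] != reference[2]
    if repeat_differs and flank_differs:
        return "RRFR"
    if repeat_differs:
        return "RRVO"
    if flank_differs:
        return "FRVO"
    return "reference"


ref = ("GACT", "TATC" * 10, "AGTC")  # hypothetical reference for allele 10
print(classify_variant(("GACT", "TATC" * 9 + "TGTC", "AGTC"), ref))  # RRVO
print(classify_variant(("GAAT", "TATC" * 10, "AGTC"), ref))          # FRVO
print(classify_variant(("GAAT", "TATC" * 9 + "TGTC", "AGTC"), ref))  # RRFR
```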

The genotyping data of A-STRs and corresponding string sequences were submitted to STRidER (https://strider.online/) for quality control²².

Capillary electrophoresis and concordance analysis

All samples were also genotyped using the Goldeneye DNA ID System 25A amplification system, which contains the 20 expanded Combined DNA Index System (CODIS) core loci plus Penta E, Penta D, D6S1043, a Y indel and Amelogenin for sex identification. DNA amplification was performed according to the manufacturer's instructions. PCR products were detected by CE on an ABI 3500xL Genetic Analyzer (Applied Biosystems, USA), and the results were analysed with GeneMapper ID-X Analysis Software (Applied Biosystems, USA). Concordance analysis was performed between the LB alleles produced on the MiSeq FGx and the corresponding CE results. Genotyping results from UAS and STRait Razor v2s were also compared.

Allele frequencies and forensic parameters

LB and SB allele frequencies were estimated by direct counting. For A-STRs, HWE tests and pairwise LD analysis were performed using Arlequin 3.5.2.2⁵⁵. The expected and observed heterozygosity (H_exp, H_obs), polymorphism information content (PIC), discrimination power (DP), random match probability (RMP) and power of exclusion (PE) for both duos and trios of the 27 A-STRs were calculated with Cervus 3.0.7⁵⁶. For X-STR loci, differentiation between the X-STR allele frequency distributions of females and males was tested using Arlequin 3.5.2.2⁵⁵, and HWE and LD tests were performed on the X-STR data of females. The mean exclusion chances for father/daughter duos (MEC_duo) and father/mother/daughter trios (MEC_trio) were calculated on the ChrX-STR.org 2.0 website (https://chrx-str.org/)⁵⁷. Finally, an LD test of the 26 retained A-STRs combined with the 7 X-STRs of females was performed using Arlequin 3.5.2.2⁵⁵. Relevant Y-STR parameters, including genetic diversity (GD), haplotype diversity (HD), allele frequencies and haplotype frequencies, were calculated using an in-house workbook (written in VBA in Microsoft Excel) with the following formulas:

$$\mathrm{GD} = \frac{n\left(1 - \sum_{i} p_{i}^{2}\right)}{n - 1}, \tag{1}$$

$$\mathrm{HD} = \frac{N\left(1 - \sum_{j} p_{j}^{2}\right)}{N - 1}, \tag{2}$$

where n is the number of alleles sampled at the locus (i.e., the number of males typed), p_i is the frequency of the ith allele, N is the total number of haplotypes sampled (one per male), and p_j is the frequency of the jth haplotype.
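Equations (1) and (2) share one form (Nei's unbiased diversity) and differ only in whether alleles or haplotypes are counted. The sketch below applies it to the only haplotype configuration consistent with the Y-STR results reported above (53 males, 50 distinct haplotypes, 48 of them seen once), which forces the remaining two haplotypes to occur twice and three times; the haplotype labels and the function name are hypothetical.

```python
from collections import Counter


def nei_diversity(observations):
    """Unbiased diversity n * (1 - sum p^2) / (n - 1).

    For GD, pass one allele per male typed at a locus; for HD, pass one
    haplotype per male. n is the number of observations."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    counts = Counter(observations)
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n * (1.0 - sum_p2) / (n - 1)


# 48 singleton haplotypes, one seen twice and one seen three times
haplotypes = [f"H{i}" for i in range(48)] + ["H48"] * 2 + ["H49"] * 3
print(len(haplotypes), len(set(haplotypes)), round(nei_diversity(haplotypes), 4))  # 53 50 0.9971
```

The match with the reported HD of 0.9971 confirms that N in Eq. (2) is the number of haplotypes sampled (53), not the number of distinct haplotypes (50).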

HWE and LD tests for both target SNPs and full sequences of iiSNPs were performed using Arlequin 3.5.2.2⁵⁵. Cervus 3.0.7⁵⁶ was used to calculate allele and full-sequence frequencies, H_exp, H_obs, PIC, DP, RMP, PE_duo and PE_trio. The effective number of alleles (A_e) was defined as the reciprocal of the expected homozygosity:

$$A_{e} = \frac{1}{\sum_{i} p_{i}^{2}}, \tag{3}$$

where p_i is the frequency of the ith allele, following Kidd⁴⁶. Using the same methods, we also evaluated the combined performance of LB A-STRs with target iiSNPs, and of SB A-STRs with full-sequence iiSNPs.
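The per-locus statistics, equation (3) and the locus-combination step can be sketched together in Python. This is an illustration under stated assumptions, not a reimplementation of Cervus or Arlequin:

* H_exp uses the unbiased sample-size correction.
* PIC follows Botstein et al.
* RMP is estimated as the sum of squared observed genotype frequencies, with DP = 1 − RMP.
* Loci are combined by multiplying RMPs and by 1 − Π(1 − PE_i), which assumes the loci are independent (no LD).

The estimators in Cervus may differ. All function names and the iiSNP example, where a hypothetical flanking variant splits the A allele, are illustrative only.

```python
"""Sketch: per-locus forensic statistics, A_e (equation 3) and combined values.

Assumptions (not from the article): diploid genotypes as (allele, allele) string
tuples; RMP from observed genotype frequencies; independent loci when combining.
"""
from collections import Counter
from itertools import combinations
from math import prod
from typing import Dict, Sequence, Tuple

Genotype = Tuple[str, str]


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """Counting method: each genotype contributes two allele copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def locus_statistics(genotypes: Sequence[Genotype]) -> Dict[str, float]:
    """H_obs, unbiased H_exp, PIC, RMP, DP and A_e for one locus."""
    n_ind = len(genotypes)
    n_alleles = 2 * n_ind
    p = list(allele_frequencies(genotypes).values())
    homozygosity = sum(x * x for x in p)  # expected homozygosity, sum of p_i^2
    h_obs = sum(a != b for a, b in genotypes) / n_ind
    h_exp = n_alleles / (n_alleles - 1) * (1 - homozygosity)
    # Botstein et al. (1980) polymorphism information content.
    pic = 1 - homozygosity - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(p, 2))
    # RMP as the sum of squared observed genotype frequencies (allele order ignored).
    genotype_counts = Counter(tuple(sorted(g)) for g in genotypes)
    rmp = sum((c / n_ind) ** 2 for c in genotype_counts.values())
    return {
        "H_obs": h_obs,
        "H_exp": h_exp,
        "PIC": pic,
        "RMP": rmp,
        "DP": 1 - rmp,
        "A_e": 1 / homozygosity,  # equation (3)
    }


def combined_rmp(per_locus_rmp: Sequence[float]) -> float:
    """Combined RMP as the product over loci (assumes no LD between loci)."""
    return prod(per_locus_rmp)


def combined_pe(per_locus_pe: Sequence[float]) -> float:
    """Combined power of exclusion: 1 - product of (1 - PE_i)."""
    return 1 - prod(1 - pe for pe in per_locus_pe)


if __name__ == "__main__":
    # Hypothetical iiSNP typed at the target SNP only...
    target = [("A", "G"), ("A", "G"), ("A", "A"), ("G", "G")]
    # ...and for the same individuals with a flanking variant splitting allele A.
    full = [("A|C", "G|C"), ("A|T", "G|C"), ("A|C", "A|T"), ("G|C", "G|C")]
    print(locus_statistics(target)["A_e"], locus_statistics(full)["A_e"])
```

In this toy example, A_e rises from 2 at the biallelic target SNP (its maximum) to 8/3 for the full sequence. This mirrors, qualitatively, why analysing flanking regions can add alleles at iiSNPs.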

Ethics statement

All experimental procedures in this study strictly followed ethical research principles, and all methods were performed in accordance with the relevant guidelines and regulations. All samples were collected anonymously after informed consent was obtained. This study was approved by the Ethics Committee of Sun Yat-sen University (approval No. 11[2012]).

Data availability

The datasets generated and analysed during the current study are available from the corresponding author on reasonable request.

References

Cornelis, S., Gansemans, Y., Deleye, L., Deforce, D. & Van Nieuwerburgh, F. Forensic SNP genotyping using nanopore MinION sequencing. Sci. Rep. 7, 41759. https://doi.org/10.1038/srep41759 (2017).
Butler, J. M. Forensic DNA typing: Biology, technology, and genetics of STR markers. 2nd edn. (London, 2005).
Bär, W. et al. DNA recommendations. Further report of the DNA Commission of the ISFH regarding the use of short tandem repeat systems. International Society for Forensic Haemogenetics. Int. J. Legal Med. 110, 175–176 (1997).
Gettings, K. B., Aponte, R. A., Vallone, P. M. & Butler, J. M. STR allele sequence variation: Current knowledge and future issues. Forensic Sci. Int. Genet. 18, 118–130. https://doi.org/10.1016/j.fsigen.2015.06.005 (2015).
Kidd, K. K. et al. Developing a SNP panel for forensic identification of individuals. Forensic Sci. Int. 164, 20–32. https://doi.org/10.1016/j.forsciint.2005.11.017 (2006).
Sanchez, J. J. et al. A multiplex assay with 52 single nucleotide polymorphisms for human identification. Electrophoresis 27, 1713–1724. https://doi.org/10.1002/elps.200500671 (2006).
Sobrino, B., Brion, M. & Carracedo, A. SNPs in forensic genetics: A review on SNP typing methodologies. Forensic Sci. Int. 154, 181–194. https://doi.org/10.1016/j.forsciint.2004.10.020 (2005).
Seo, S. B. et al. Single nucleotide polymorphism typing with massively parallel sequencing for human identification. Int. J. Legal Med. 127, 1079–1086. https://doi.org/10.1007/s00414-013-0879-7 (2013).
Oldoni, F. et al. Microhaplotype loci are a powerful new type of forensic marker. Forensic Sci. Int. Genet. Suppl. Ser. 4, e123–e124. https://doi.org/10.1016/j.fsigen.2018.09.009 (2013).
King, J. L. et al. Increasing the discrimination power of ancestry- and identity-informative SNP loci within the ForenSeq DNA Signature Prep Kit. Forensic Sci. Int. Genet. 36, 60–76. https://doi.org/10.1016/j.fsigen.2018.06.005 (2018).
Anvar, S. Y. et al. TSSV: A tool for characterization of complex allelic variants in pure and mixed genomes. Bioinformatics 30, 1651–1659. https://doi.org/10.1093/bioinformatics/btu068 (2014).
King, J. L., Wendt, F. R., Sun, J. & Budowle, B. STRait Razor v2s: Advancing sequence-based STR allele reporting and beyond to other marker systems. Forensic Sci. Int. Genet. 29, 21–28. https://doi.org/10.1016/j.fsigen.2017.03.013 (2017).
Woerner, A. E., King, J. L. & Budowle, B. Fast STR allele identification with STRait Razor 3.0. Forensic Sci. Int. Genet. 30, 18–23. https://doi.org/10.1016/j.fsigen.2017.05.008 (2017).
Friis, S. L., Buchard, A., Rockenbauer, E., Borsting, C. & Morling, N. Introduction of the Python script STRinNGS for analysis of STR regions in FASTQ or BAM files and expansion of the Danish STR sequence database to 11 STRs. Forensic Sci. Int. Genet. 21, 68–75. https://doi.org/10.1016/j.fsigen.2015.12.006 (2016).
Lee, J. C., Tseng, B., Chang, L. K. & Linacre, A. SEQ Mapper: A DNA sequence searching tool for massively parallel sequencing data. Forensic Sci. Int. Genet. 26, 66–69. https://doi.org/10.1016/j.fsigen.2016.10.006 (2017).
Hoogenboom, J. et al. FDSTools: A software package for analysis of massively parallel sequencing data with the ability to recognise and correct STR stutter and other PCR or sequencing noise. Forensic Sci. Int. Genet. 27, 27–40. https://doi.org/10.1016/j.fsigen.2016.11.007 (2016).
ForenSeq universal analysis software guide. (2015) Available at: https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/forenseq-universal-analysis-software/forenseq-universal-analysis-software-guide-15053876-c.pdf (Accessed: 3 September 2017).
Ion Torrent Suite Software plugins. (2014) Available at: https://www.thermofisher.com/cn/zh/home/life-science/sequencing/next-generation-sequencing/ion-torrent-next-generation-sequencing-workflow/ion-torrent-next-generation-sequencing-data-analysis-workflow/ion-torrent-suite-software-plugins.html (Accessed: 12 September 2019).
Just, R. S. & Irwin, J. A. Use of the LUS in sequence allele designations to facilitate probabilistic genotyping of NGS-based STR typing results. Forensic Sci. Int. Genet. 34, 197–205. https://doi.org/10.1016/j.fsigen.2018.02.016 (2018).
Young, B., Faris, T. & Armogida, L. A nomenclature for sequence-based forensic DNA analysis. Forensic Sci. Int. Genet. 42, 14–20. https://doi.org/10.1016/j.fsigen.2019.06.001 (2019).
Parson, W. et al. Massively parallel sequencing of forensic STRs: Considerations of the DNA commission of the International Society for Forensic Genetics (ISFG) on minimal nomenclature requirements. Forensic Sci. Int. Genet. 22, 54–63. https://doi.org/10.1016/j.fsigen.2016.01.009 (2016).
Bodner, M. et al. Recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on quality control of autosomal Short Tandem Repeat allele frequency databasing (STRidER). Forensic Sci. Int. Genet. 24, 97–102. https://doi.org/10.1016/j.fsigen.2016.06.008 (2016).
Gettings, K. B. et al. Report from the STRAND Working Group on the 2019 STR sequence nomenclature meeting. Forensic Sci. Int. Genet. 43, 102165. https://doi.org/10.1016/j.fsigen.2019.102165 (2019).
Gusmão, L. et al. Revised guidelines for the publication of genetic population data. Forensic Sci. Int. Genet. 30, 160–163. https://doi.org/10.1016/j.fsigen.2017.06.007 (2017).
Fordyce, S. L. et al. Second-generation sequencing of forensic STRs using the Ion Torrent HID STR 10-plex and the Ion PGM. Forensic Sci. Int. Genet. 14, 132–140. https://doi.org/10.1016/j.fsigen.2014.09.020 (2015).
Wendt, F. R., Zeng, X., Churchill, J. D., King, J. L. & Budowle, B. Analysis of short tandem repeat and single nucleotide polymorphism loci from Single-Source samples using a custom HaloPlex target enrichment system panel. Am. J. Forensic Med. Pathol. 37, 99–107. https://doi.org/10.1097/PAF.0000000000000228 (2016).
Kim, E. H. et al. Massively parallel sequencing of 17 commonly used forensic autosomal STRs and amelogenin with small amplicons. Forensic Sci. Int. Genet. 22, 1–7. https://doi.org/10.1016/j.fsigen.2016.01.001 (2016).
Zhang, S. et al. Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing. Sci. Rep. 8, 6810. https://doi.org/10.1038/s41598-018-24495-9 (2018).
Silvia, A. L., Shugarts, N. & Smith, J. A preliminary assessment of the ForenSeq FGx System: Next generation sequencing of an STR and SNP multiplex. Int. J. Legal Med. 131, 73–86. https://doi.org/10.1007/s00414-016-1457-6 (2017).
Jager, A. C. et al. Developmental validation of the MiSeq FGx forensic genomics system for targeted next generation sequencing in forensic DNA casework and database laboratories. Forensic Sci. Int. Genet. 28, 52–70. https://doi.org/10.1016/j.fsigen.2017.01.011 (2017).
Just, R. S., Moreno, L. I., Smerick, J. B. & Irwin, J. A. Performance and concordance of the ForenSeq system for autosomal and Y chromosome short tandem repeat sequencing of reference-type specimens. Forensic Sci. Int. Genet. 28, 1–9. https://doi.org/10.1016/j.fsigen.2017.01.001 (2017).
Fattorini, P. et al. Performance of the ForenSeq DNA Signature Prep kit on highly degraded samples. Electrophoresis 38, 1163–1174. https://doi.org/10.1002/elps.201600290 (2017).
Kocher, S. et al. Inter-laboratory validation study of the ForenSeq DNA Signature Prep Kit. Forensic Sci. Int. Genet. 36, 77–85. https://doi.org/10.1016/j.fsigen.2018.05.007 (2018).
Novroski, N. M., King, J. L., Churchill, J. D., Seah, L. H. & Budowle, B. Characterization of genetic sequence variation of 58 STR loci in four major population groups. Forensic Sci. Int. Genet. 25, 214–226. https://doi.org/10.1016/j.fsigen.2016.09.007 (2016).
Wendt, F. R. et al. Flanking region variation of ForenSeq DNA Signature Prep Kit STR and SNP loci in Yavapai Native Americans. Forensic Sci. Int. Genet. 28, 146–154. https://doi.org/10.1016/j.fsigen.2017.02.014 (2017).
Casals, F. et al. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations. Forensic Sci. Int. Genet. 30, 66–70. https://doi.org/10.1016/j.fsigen.2017.06.006 (2017).
Devesse, L. et al. Concordance of the ForenSeq system and characterization of sequence-specific autosomal STR alleles across two major population groups. Forensic Sci. Int. Genet. 34, 57–61. https://doi.org/10.1016/j.fsigen.2017.10.012 (2018).
Hussing, C., Bytyci, R., Huber, C., Morling, N. & Borsting, C. The Danish STR sequence database: Duplicate typing of 363 Danes with the ForenSeq DNA Signature Prep Kit. Int. J. Legal Med. 133, 325–334. https://doi.org/10.1007/s00414-018-1854-0 (2019).
Khubrani, Y. M., Hallast, P., Jobling, M. A. & Wetton, J. H. Massively parallel sequencing of autosomal STRs and identity-informative SNPs highlights consanguinity in Saudi Arabia. Forensic Sci. Int. Genet. 43, 102164. https://doi.org/10.1016/j.fsigen.2019.102164 (2019).
Ye, Y., Gao, J., Fan, G., Liao, L. & Hou, Y. Population genetics for 23 Y-STR loci in Tibetan in China and confirmation of DYS448 null allele. Forensic Sci. Int. Genet. 16, e7–e10. https://doi.org/10.1016/j.fsigen.2014.11.018 (2015).
Li, Z., Zhang, J., Zhang, H., Lin, Z. & Ye, J. Genetic polymorphisms in 18 autosomal STR loci in the Tibetan population living in Tibet Chamdo, Southwest China. Int. J. Legal Med. 132, 733–734. https://doi.org/10.1007/s00414-017-1740-1 (2018).
He, G. et al. Genetic structure and forensic characteristics of Tibeto-Burman-speaking U-Tsang and Kham Tibetan Highlanders revealed by 27 Y-chromosomal STRs. Sci. Rep. 9, 7739. https://doi.org/10.1038/s41598-019-44230-2 (2019).
He, G. et al. Genetic diversity and phylogenetic characteristics of Chinese Tibetan and Yi minority ethnic groups revealed by non-CODIS STR markers. Sci. Rep. 8, 5895. https://doi.org/10.1038/s41598-018-24291-5 (2018).
Gettings, K. B. et al. STRSeq: A catalog of sequence diversity at human identification Short Tandem Repeat loci. Forensic Sci. Int. Genet. 31, 111–117. https://doi.org/10.1016/j.fsigen.2017.08.017 (2017).
Wang, Z., Wang, L., Liu, J., Ye, J. & Hou, Y. Characterization of sequence variation at 30 autosomal STRs in Chinese Han and Tibetan populations. Electrophoresis https://doi.org/10.1002/elps.201900278 (2020).
Kidd, K. K. & Speed, W. C. Criteria for selecting microhaplotypes: Mixture detection and deconvolution. Investig. Genet. 6, 1. https://doi.org/10.1186/s13323-014-0018-3 (2015).
Gettings, K. B., Borsuk, L. A., Steffen, C. R., Kiesler, K. M. & Vallone, P. M. Sequence-based US population data for 27 autosomal STR loci. Forensic Sci. Int. Genet. 37, 106–115. https://doi.org/10.1016/j.fsigen.2018.07.013 (2018).
Churchill, J. D., Schmedes, S. E., King, J. L. & Budowle, B. Evaluation of the Illumina Beta Version ForenSeq DNA Signature Prep Kit for use in genetic profiling. Forensic Sci. Int. Genet. 20, 20–29. https://doi.org/10.1016/j.fsigen.2015.09.009 (2016).
Wang, Z. et al. Massively parallel sequencing of 32 forensic markers using the Precision ID GlobalFiler NGS STR Panel and the Ion PGM System. Forensic Sci. Int. Genet. 31, 126–134. https://doi.org/10.1016/j.fsigen.2017.09.004 (2017).
Wu, R. et al. Genetic polymorphism and population structure of Torghut Mongols and comparison with a Mongolian population 3000 kilometers away. Forensic Sci. Int. Genet. 42, 235–243. https://doi.org/10.1016/j.fsigen.2019.07.017 (2019).
Guo, F. et al. Next generation sequencing of SNPs using the HID-Ion AmpliSeq Identity Panel on the Ion Torrent PGM platform. Forensic Sci. Int. Genet. 25, 73–84. https://doi.org/10.1016/j.fsigen.2016.07.021 (2016).
Wendt, F. R. et al. Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx forensic genomics system. Forensic Sci. Int. Genet. 24, 18–23. https://doi.org/10.1016/j.fsigen.2016.05.008 (2016).
Phillips, C. et al. “The devil’s in the detail”: Release of an expanded, enhanced and dynamically revised forensic STR Sequence Guide. Forensic Sci. Int. Genet. 34, 162–169. https://doi.org/10.1016/j.fsigen.2018.02.017 (2018).
ForenSeq DNA signature prep guide. (2015) Available at: https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/forenseq/forenseq-dna-signature-prep-guide-15049528-01.pdf (Accessed: April 29 September 2015).
Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: A new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567. https://doi.org/10.1111/j.1755-0998.2010.02847.x (2010).
Kalinowski, S. T., Taper, M. L. & Marshall, T. C. Revising how the computer program CERVUS accommodates genotyping error increases success in paternity assignment. Mol. Ecol. 16, 1099–1106. https://doi.org/10.1111/j.1365-294X.2007.03089.x (2007).
Desmarais, D., Zhong, Y., Chakraborty, R., Perreault, C. & Busque, L. Development of a highly polymorphic STR marker for identity testing purposes at the human androgen receptor gene (HUMARA). J. Forensic Sci. 43, 1046–1049 (1998).


Acknowledgements

This study was funded by the National Key Research and Development Program of China (2017YFC0803502) and National Natural Science Foundation of China (81671873, 81801878).

Author information

These authors contributed equally: Dan Peng, Yinming Zhang and Han Ren.

Authors and Affiliations

Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-Sen University, No. 74 Zhongshan Road II, Guangzhou, 510080, Guangdong, People’s Republic of China
Dan Peng, Yinming Zhang, Han Ren, Haixia Li, Ran Li, Xuefeng Shen, Nana Wang, Erwen Huang, Riga Wu & Hongyu Sun
Guangdong Province Translational Forensic Medicine Engineering Technology Research Center, Sun Yat-Sen University, Guangzhou, 510080, People’s Republic of China
Dan Peng, Yinming Zhang, Han Ren, Haixia Li, Ran Li, Xuefeng Shen, Nana Wang, Erwen Huang, Riga Wu & Hongyu Sun


Contributions

Conceptualization: H.S. and R.W. Sample collection: H.R. and X.S. Experiments: D.P., H.R., R.L., N.W. and R.W. Data analysis: D.P., Y.Z., H.L. and R.L. Figures and tables: D.P. and H.R. Manuscript writing: D.P. Manuscript revision: H.R., Y.Z., R.L., E.H. and H.S. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Erwen Huang, Riga Wu or Hongyu Sun.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Peng, D., Zhang, Y., Ren, H. et al. Identification of sequence polymorphisms at 58 STRs and 94 iiSNPs in a Tibetan population using massively parallel sequencing. Sci Rep 10, 12225 (2020). https://doi.org/10.1038/s41598-020-69137-1


Received: 19 January 2020
Accepted: 16 June 2020
Published: 22 July 2020
DOI: https://doi.org/10.1038/s41598-020-69137-1


