Data of 10 SSR markers for genomes of Homo sapiens and monkeys

In this data article, we present 10 simple sequence repeat (SSR) markers, TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA, which were extracted from the genomes of Homo sapiens and monkeys using a string matching mechanism [1]. All loci showed an allele size of 4 base pairs (bp), and the data indicate polymorphisms between individuals in the number of SSR repeats, which may be useful for detecting similarity among genotypes. Collectively, these data show that SSR extraction is a valuable method for illustrating genetic variation among genomes.


Experimental factors
were targeted. A string matching process is applied to the genomes of Homo sapiens and monkeys. The 10 SSR markers, intended for use in various detection tasks, are extracted with this approach.
Experimental features
Each of the 10 SSR markers is extracted from the genomes of Homo sapiens and monkeys. All 10 SSRs showed an allele size of 4 bp. The differences in repeat counts show that there are polymorphisms among the genomes in the number of SSR repeats. These data suggest that SSR extraction is a useful method for providing information for various detection tasks.
Access to the raw sequencing data allows researchers to perform further bioinformatics analyses with their own computational algorithms.

Data
The 10 SSR markers extracted from the genomes of Homo sapiens and monkeys are shown in Table 1. The data presented here show that SSR extraction with string matching was able to reveal variation within the selected genome collections. These SSR markers can be

SSR extraction
In this paper, all chromosomes of Homo sapiens and of the monkeys (Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii) are considered, together with the ten SSRs (TAGA, AGAA, GATA, TCTA, TCAT, GAAT, AGAT, CTTT, TATC, TCTG). The SSRs are extracted from these genomes using a string matching approach. String matching is a search mechanism that locates the repeats in a given chromosome file.
Search process: The chromosomes and SSRs are passed to the main function, which calls the shift process with the rightmost character of the SSR. The shift process returns the shift position to the main function. The search process then compares character by character from both directions until either a complete match or a mismatch occurs. If a match occurs, successive occurrences of the pattern are searched for. If the number of successive occurrences is greater than 1, the data are stored in the database [1]. This process is repeated for all the SSRs and for the entire data in the chromosomes. A detailed description is given in [1].
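The successive-occurrence logic described above can be illustrated with a minimal Python sketch. It is not the algorithm of [1]: the shift process and the two-directional comparison are replaced here by Python's built-in `str.startswith`, and the function names `find_tandem_repeats` and `extract_ssrs`, the `min_repeats` parameter and the toy sequence are assumptions made only for illustration.

```python
def find_tandem_repeats(sequence, motif, min_repeats=2):
    """Return (start, repeat_count) for each run of back-to-back motif copies.

    A run is kept only when the motif occurs at least `min_repeats` times in
    succession, mirroring the "greater than 1" rule in the text.
    """
    hits = []
    m = len(motif)
    i = 0
    while i <= len(sequence) - m:
        if sequence.startswith(motif, i):
            # Count how many copies of the motif follow each other directly.
            count = 1
            j = i + m
            while sequence.startswith(motif, j):
                count += 1
                j += m
            if count >= min_repeats:
                hits.append((i, count))
                i = j      # continue after the stored run
            else:
                i += 1     # single copy: keep scanning for overlapping runs
        else:
            i += 1
    return hits


def extract_ssrs(sequence, motifs, min_repeats=2):
    """Apply find_tandem_repeats to every motif; returns {motif: [(start, count), ...]}."""
    return {motif: find_tandem_repeats(sequence, motif, min_repeats) for motif in motifs}


# Toy sequence (hypothetical, not taken from the data set).
print(find_tandem_repeats("CCGATAGATAGATACC", "GATA"))  # [(2, 3)]
```

In a full pipeline, each returned pair would be stored together with the sample name and chromosome name; that storage step is omitted from this sketch.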

Paternity identification with similarity measures [2]
In paternity cases, two or more persons might claim that a child is their biological son or daughter. In such cases, the genome sequences of the child and of the claimants can be compared to determine the similarity of the loci that are stored in the Tandem Repeat Database (TandemRepeatDB). The person whose loci are more similar to those of the child's DNA is considered to be the actual biological father or mother.

Procedure:
The genome sequences of the child and of the persons (A and B) are taken. The 10 loci that occur in succession in the child and in persons A and B are extracted and stored in TandemRepeatDB using multiple pattern multiple (2 N) shaft parallel string matching algorithms [3].
The loci are then retrieved from TandemRepeatDB. The correlation coefficient, rank correlation coefficient and cosine similarity measures are applied to quantify the similarity between the loci of the child and those of persons A and B. Each similarity measure returns the percentage of similarity between the loci of the child and those of persons A and B.
The resulting similarity values can be either positive or negative.
An example of the similarity between the child and persons A and B is shown in Table 3.
In Table 3, the correlation coefficient, rank correlation coefficient and cosine similarity all show a positive correlation (1) between the child and person A. Between the child and person B, all three measures are also positive, but much lower than between the child and person A.
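The three similarity measures named in the procedure can be sketched in plain Python as follows. This is a minimal illustration, assuming each person is represented by a vector of repeat counts with one entry per locus; the function names and this vector representation are assumptions and are not taken from [2].

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation coefficient: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def cosine(x, y):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))
```

Multiplying each value by 100 gives a percentage. For vectors of non-negative repeat counts, the cosine similarity cannot be negative, whereas the two correlation coefficients range from -1 to 1.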

DNA fingerprinting
Searching for patterns in the entire genome of an organism with the traditional approach, i.e., laboratory experiments, is very time consuming. Even for a small part of a genome, the process takes several hours, and the related laboratory experiments are quite expensive. Given the latest developments in genome sequencing, in the near future a person may be able to get their entire genome sequenced at a diagnostics centre, just like other medical diagnostics. In this situation, the multiple pattern multiple (2 N) shaft parallel string matching algorithms [3] can play a key role in searching for the loci in the person's genome, returning the occurrence positions, chromosome name, locus name, etc., quickly and at little additional cost.
DNA fingerprinting is a method used to identify an individual from a sample genome sequence by searching for the patterns at their locations on all chromosomes.
DNA fingerprinting procedure: The genome sequences of the members of one family (father, mother, daughters and sons) are considered. The 10 loci are searched for in the genomes of all family members using multiple pattern multiple (2 N) shaft parallel string matching algorithms [3].
If an exact match occurs, the successive-occurrence logic is applied. If successive occurrences of a locus are found, its sample name, position, chromosome name, pattern and number of occurrences are stored in TandemRepeatDB [1] for each family member's genome. The loci of all family members are then retrieved from TandemRepeatDB, and their positions, chromosome names, patterns and numbers of occurrences are compared.
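The final comparison step can be sketched as follows. The record layout mirrors the fields listed above (sample name, position, chromosome name, pattern, number of occurrences), but the `LocusRecord` type, the `shared_loci` function and the example values are hypothetical; the actual TandemRepeatDB schema is not described here.

```python
from collections import namedtuple

# Hypothetical record layout based on the fields named in the text.
LocusRecord = namedtuple("LocusRecord", "sample chromosome position pattern repeats")


def shared_loci(records, sample_a, sample_b):
    """Return the loci at which two samples have the same repeat count.

    A locus is identified by (chromosome, position, pattern).
    """
    def index(sample):
        return {
            (r.chromosome, r.position, r.pattern): r.repeats
            for r in records
            if r.sample == sample
        }

    a, b = index(sample_a), index(sample_b)
    # Keep loci present in both samples with identical repeat counts.
    return sorted(key for key in a.keys() & b.keys() if a[key] == b[key])


# Illustrative records only; these values are not from the data set.
records = [
    LocusRecord("father", "chr1", 100, "GATA", 5),
    LocusRecord("child", "chr1", 100, "GATA", 5),
    LocusRecord("father", "chr2", 50, "TCTA", 4),
    LocusRecord("child", "chr2", 50, "TCTA", 3),
]
print(shared_loci(records, "child", "father"))  # [('chr1', 100, 'GATA')]
```

Matching on exact positions is a simplification of this sketch; a real comparison would need to account for coordinate differences between individual genomes.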

Table A5
The 10 SSR counts for all the chromosomes of homo_sapiens.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A6
The 10 SSR counts for all the chromosomes of macaca_fascicularis.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A7
The 10 SSR counts for all the chromosomes of macaca_mulatta.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A8
The 10 SSR counts for all the chromosomes of nomascus_leucogenys.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A9
The 10 SSR counts for all the chromosomes of pan_troglodytes.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A10
The 10 SSR counts for all the chromosomes of papio_anubis.
Columns: TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG, TCTA

Table A11
The 10 SSR counts for all the chromosomes of pongo_abelli.