National Toxicology Program: landmarks and the road ahead.

The National Toxicology Program (NTP), a cross-agency unit of the Department of Health and Human Services (DHHS), is one of the focal points for government efforts aimed at generating, collecting, and coordinating data used for guiding public health decisions. Now 25 years old, the NTP is in the middle of a strategic planning effort to define how it will integrate new technologies with classical toxicological approaches to continue providing, according to the NTP’s motto, “good science for good decisions.” 
 
In its current form, the NTP integrates activities from the NIEHS, the National Institute for Occupational Safety and Health (NIOSH), and the Food and Drug Administration (FDA) National Center for Toxicological Research (NCTR). Its mission is to evaluate agents of public health concern by developing and applying tools of modern toxicology and molecular biology, a mission it achieves through the use of several strategies. It designs studies on potential toxicants and works with outside groups and government labs to carry them out. It reviews and evaluates what’s missing in understanding environmentally induced diseases. It then seeks to fill those gaps by collaborating and cooperating with federal agencies and with other domestic and foreign toxicology and public health organizations, and by carrying out research itself on human exposure and toxicity at laboratories housed at NIOSH and the NIEHS. It supports grants, contracts, and interagency agreements made through the NIEHS Division of Extramural Research and Training, and it supports the activities of three centers. 
 
These activities have brought immense gains in knowledge for the scientific community. But there is more work still to do. Since the fall of 2003, the NTP has been engaged in developing and seeking public comment on a “roadmap” to guide the program’s route forward over the next decade. The roadmap seeks to build on recent technological advances that will allow toxicology to evolve from a largely observational science to one that is more predictive, and thus in many ways more protective of public health.


Introduction
The recent completion of the human genome sequence provides a starting point for understanding genetic complexity and elucidating genetic variations contributing to diverse traits and diseases. Pigs are even-toed ungulates belonging to the order Artiodactyla, an order phylogenetically closer to primates than Rodentia [1]. A separate suborder, the Suina, includes hippopotamuses, peccaries and pigs. All pigs are members of the family Suidae. The pig is of particular interest in evolutionary studies, not only because existing pig breeds show great phenotypic variety in morphological, physiological and behavioral traits, but also because the wild ancestors of domesticated pigs and a convenient number of outgroup species still exist. The pig (S. scrofa domesticus) was domesticated from the wild boar, S. scrofa, approximately 9,000 years ago in multiple regions of the world [2][3][4]. These domestication events were separated not only by thousands of kilometers but also by thousands of years. During the past decade, there has been increasing interest in detecting genes and genomic regions under selection in humans and other organisms. Domestic animal species have experienced strong selective pressures directed at genes or genomic regions controlling traits of biological, agricultural, or medical importance following their domestication and subsequent episodes of selective breeding. Consequently, these genes or genomic regions are expected to exhibit signatures of selective breeding. Pigs offer a unique opportunity to identify genes or genomic regions encoding quantitative trait loci (QTLs), since they have been through recent and strong selective sweeps targeted at phenotypes to improve agricultural performance and disease resistance.
The pig whole genome sequencing project was launched in early 2006 by the Swine Genome Sequencing Consortium (SGSC) (http://www.piggenome.org/). In addition to providing important evolutionary information, the availability of the pig whole genome sequence will help reveal the molecular mechanisms controlling phenotypes and will play an increasingly significant role in pork production, by integrating 'omics' techniques and bioinformatics tools to reduce the incidence of disease and respond more rapidly to the changing demands of consumers.

Pig genetic resources
S. scrofa is one of the most globally widespread mammalian species. It has long been assumed that the forces driving its evolution were domestication and natural selection. Domestic pigs are found in a wide range of environments worldwide. Several features, including teeth and skull morphology, external proportions, hair and colour patterns, biochemical and molecular polymorphisms, ecology and behaviour, reproductive isolation and natural areas, are used for discriminating the many species in the genus Sus. S. scrofa is divided into a large number of subspecies, but the number is uncertain and depends on the definition of a subspecies. It has been possible to discriminate more than 16 distinct subspecies, each occupying a distinct geographical area [5][6][7][8].

Pig domestication
Domestication is the process of genetically adapting a wild biological organism to better suit the needs of human beings, as a result of living and breeding under careful human control for multiple generations [9]. Pig domestication has been an integral part of the rise of agriculture and the adoption of agricultural practices throughout much of the world. Insights into the evolution and spread of the pig are likely to deepen our understanding of the origins and spread of livestock agriculture and the rise of early human civilization. The earliest remains of domesticated pigs have been excavated at Çayönü in southeast Anatolia and dated to 7,000 BC [10]. According to the most traditional but arguable view, based on an extensive zooarcheological record [6], the domestic pig originated in the Near East and spread west to Europe and east to China. However, recent preliminary research using mitochondrial DNA (mtDNA) sequences from samples of Eurasian wild boars and various breeds of domestic pigs has provided evidence to support a "multiple and independent domestication" hypothesis [2,3]. Additional recent mtDNA data from the analysis of 685 individuals, including wild boars and feral and domestic pigs across Eurasia, also support the hypothesis that pig domestication occurred independently at diverse geographic locations across Eurasia: three from the Far East (two in China, additional ones in Thailand/Burma and northern India), one from Island South-East Asia (Wallacea), and two from Europe [4]. These results also suggest that S. scrofa as a species originated on islands in South-East Asia (Philippines, Indonesia), from where it dispersed across Eurasia, with little or no importation of Near East domestic pigs into Europe by early farmers.
Domestication also drives rapid phenotypic evolution through artificial selection. Pig domestication has resulted in highly modified morphological architectures and has caused several major changes in physical type; for example, one of the earliest results of domestication was a decrease in skeletal size [6]. However, it could be argued that size differences in various areas of the world may have arisen from environmental diversity, such as differences in feed resources. Improvement after domestication has also resulted in striking changes in yield, biochemical composition, and other traits. Most domesticated animals have experienced a "domestication bottleneck" with reduced genetic diversity relative to their wild ancestor(s). This bottleneck affects all genes in the genome and modifies the distribution of genetic variation among loci. The magnitude and variance of the reduction in genetic diversity across loci provide insights into the demographic history of domestication.
The pig represents a domesticated animal that has both a convenient number of outgroup species nicely spaced in evolutionary distance and surviving wild conspecifics (see Figure 1). This makes the pig perhaps one of the most suitable animal species for inferring ancestral mutations as well as determining the fate of derived states and selective processes. Ancestral mutations are important because: (i) the probability that an allele is ancestral is equal to its frequency and (ii) strong positive selection results in regions with reduced heterozygosity and an excess of derived alleles. Since, in the case of the pig, it is still unclear what constitutes the nearest living relative (likely S. barbatus) and what the age of the species S. scrofa is relative to some of its nearest relatives, it is critical to compare S. scrofa with several related species (e.g. S. barbatus, S. celebensis, S. verrucosus, African warthog) that fall within a range of 1 to 6 million years ago (MYA) of inferred evolution [11][12][13][14] (Figure 1).

Natural and artificial selections
Darwin (1859) clearly believed that both natural and artificial selection shaped breeds: "The key (to domestic breeding) is man's power of accumulative selection: nature gives successive variations; man adds them up in certain directions useful to him" [15]. Human and novel environmental pressures during pig domestication have been principally responsible for the generation of inter-breed genetic variation and for the formation of many unique breeds. Domestic pig diversity has evolved over millions of years through the processes of natural and artificial selection, forming and stabilizing each of the species used in food and agriculture. Over the more recent millennia, interactions between environmental and human selection have led to the development of genetically distinct breeds. Artificial selection on a targeted gene acts like a more severe bottleneck that removes most of the genetic variation from the targeted locus.
Over the centuries, global pig farming in different environmental conditions has resulted in breeds with traits such as heat/cold tolerance and disease resistance, which favor their survival under environmental stresses. Farmers have also been breeding for a variety of attributes, with a major focus on productivity traits such as meat yield and fertility. To date, there are likely over 730 pig breeds or lines worldwide, of which two thirds reside in China and Europe and over 270 are considered endangered or critical (Table 1 and Figure 2) [8]. Currently, 58 pig breeds are recorded as "transboundary" (occurring in more than one country), including 25 regional transboundary breeds and 33 international transboundary breeds. The worldwide distribution of pigs is dominated by five international transboundary pig breeds from the United States (US) or Europe, i.e. Large White (117 countries), Duroc (93 countries), Landrace (91 countries), Hampshire (54 countries) and Pietrain (35 countries) [16]. Pig breeds vary greatly in size, color, body shape, ear carriage, behavior, prolificacy, and other traits. In order to meet future challenges in the agricultural and food industries, special efforts are required to conserve genetic resources. Therefore, phylogenetic studies aimed at evaluating the genetic uniqueness and diversity of pig breeds will assist in developing a rational plan for breed conservation programs. A set of criteria for choosing breeds for conservation has been suggested, including two essential criteria: the degree of endangerment and the genetic uniqueness of the breed [17]. In addition, the origin and history of domestic pigs can also be explained by phylogenetic analysis. Independent domestication occurred from wild boar subspecies in Eurasia, followed by the introgression of Asian germplasm into European domestic breeds during the 18th and early 19th centuries [9,18].

Selective sweep detection
When selective pressure is applied to individuals, it ultimately leads to changes in the underlying genetic content of the population [19]. Individuals that carry a more favorable genotype outcompete their peers, resulting in the fixation of beneficial alleles in the population with concomitant removal of inferior alleles. Three approaches have been used to identify and study the genes or gene pathways involved. The first is the conventional candidate gene approach, in which genes are selected on the basis of comparative mapping and gene function. The second is the whole genome scan, which identifies genomic regions under selection through association mapping, i.e. associating phenotypes with genotypes. The third involves identifying genomic patterns left by selective sweeps, using a large-scale, high-density single nucleotide polymorphism (SNP) haplotype map of a specific region, or of a panel of genes that might be associated with traits, from diverse populations together with wild ancestral outgroup species. The identification of the causative mutation for the insulin-like growth factor 2 (IGF2) QTL in pigs is an excellent application of these combined approaches [20]. Furthermore, by using comparative genomic data sets from different breeds and wild ancestral species, several interesting genotype-phenotype relationships in domestic animals have been recently illustrated [21][22][23][24][25][26][27][28].
A selective sweep results in the elimination of surrounding variation in regions linked to a recently fixed beneficial mutation. For instance, the muscle-favoring mutation in the porcine IGF2 gene (intron3-3072G/A) has swept through commercial pig populations, but is not present in the tested Asian or European wild boars [20]. More recently, a naturally occurring G to A transition in the 3' untranslated region of the myostatin gene was shown to create a target site for the mir1 and mir206 microRNAs (miRNAs), affecting muscularity in sheep, and a selective sweep has been detected in the hypermuscled Texel sheep [28]. The identification of selective sweeps is interesting, not only because it addresses important evolutionary questions, but also because of the increasing evidence linking selection and disease genes [29,30]. The beneficial substitution of an allele shapes patterns of genetic variation at linked sites, and studying these patterns may (i) provide important insights into the mechanisms of evolutionary change; (ii) guide the selection of loci for population genetic studies; (iii) facilitate the identification of significant genomic regions; and (iv) help elucidate genotype-phenotype correlations in complex traits [31].
Genome scans for detecting signatures of selective sweeps in natural populations have been proposed as a phenotype-independent approach to identifying adaptive trait loci, even when the gene function or phenotype of interest is unknown [32]. There are many different methods available for detecting selective sweeps from DNA sequence data [29,33-36].
Hitchhiking mapping provides a universal approach for the identification of important mutations and selective sweeps. Hitchhiking is the phenomenon whereby neutral variants linked to a beneficial mutation are also affected by a selective sweep [37]. This approach has been very successful for the identification of selective sweeps at several genes [38,39]. More information about the genes causing a sweep can be obtained if divergent populations are compared, particularly if the populations have been exposed to well-known selection regimes. Similar comparisons could be performed for hitherto uncharacterized, commercially important traits, such as fat content in pigs. The most ambitious goal of hitchhiking mapping is the identification of the quantitative trait nucleotides (QTNs) that confer the selective advantage [32].
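The reduced heterozygosity that a sweep leaves around a selected site can be illustrated with a minimal Python sketch. It is not one of the published methods cited above; it simply averages the expected heterozygosity 2p(1-p) of biallelic SNPs in fixed windows and flags windows below a cutoff. The function names, the window size and the threshold are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Window = Tuple[int, float, int]  # (window start, mean heterozygosity, SNP count)


def expected_heterozygosity(alt_freq: float) -> float:
    """Expected heterozygosity 2p(1-p) of a biallelic SNP with allele frequency p."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def windowed_heterozygosity(positions: Sequence[int],
                            alt_freqs: Sequence[float],
                            window: int,
                            step: int) -> List[Window]:
    """Mean expected heterozygosity in windows of `window` bp, moved by `step` bp.

    Windows that contain no SNP are skipped.
    """
    if not positions:
        return []
    results: List[Window] = []
    last = max(positions)
    start = 0
    while start <= last:
        end = start + window
        values = [expected_heterozygosity(f)
                  for pos, f in zip(positions, alt_freqs)
                  if start <= pos < end]
        if values:
            results.append((start, sum(values) / len(values), len(values)))
        start += step
    return results


def candidate_sweep_windows(windows: Sequence[Window],
                            threshold: float) -> List[Window]:
    """Windows whose mean heterozygosity falls below an arbitrary threshold."""
    return [w for w in windows if w[1] < threshold]


if __name__ == "__main__":
    # Toy data (hypothetical): the middle window is fixed for one allele.
    pos = [100, 200, 1100, 1200, 2100, 2200]
    freq = [0.5, 0.5, 0.0, 1.0, 0.5, 0.4]
    wins = windowed_heterozygosity(pos, freq, window=1000, step=1000)
    print(candidate_sweep_windows(wins, threshold=0.1))
```

A real scan would also have to account for demography, recombination rate and SNP ascertainment, all of which can produce low-diversity windows without selection.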

Integrated global pig biodiversity
Comparative genomic analysis of different domestic breeds can prove an efficient way of exploring the genetic basis of phenotypic variation [40]. Phylogenetic studies can reconstruct the correct genealogical ties between species and estimate the time of divergence between two organisms since they last shared a common ancestor.
To help understand animal evolutionary history and genetic diversity, a variety of genetic markers can be utilized. Genetic markers can generally be grouped into two types based on their association with functionality: type I markers are DNA segments encoding expressed DNA sequences, which possess a relatively low degree of polymorphism but high evolutionary conservation, whereas type II markers usually have no identifiable biological function but are highly polymorphic and not well conserved between species. A comparison of the characteristics of the main classes of genetic markers is shown in Table 2 [41][42][43]. Microsatellites (also called simple sequence repeats, SSRs), one of the most widely used marker types, are characterized by a short repeat motif, generally 1 to 6 bp, and are commonly regarded as "junk DNA"; however, SSRs have served as one of the most important markers for genome mapping as well as phylogenetic studies. SSRs have more recently been proposed to modify the genes with which they are associated. The influence of SSRs on gene regulation, transcription and protein function typically depends on the number of repeats, while mutations that add or subtract repeat units are both frequent and reversible. Over the past decade, it has been demonstrated that SSR variation has been tapped by natural and artificial selection to affect almost every aspect of gene function [44]. In addition, mtDNA is a widely used molecular tool in domestication studies, but it is poorly informative about the genome as a whole and, because of its maternal inheritance, cannot capture male-mediated gene flow.
To date, a number of molecular markers have been used for genetic diversity and phylogenetic analysis in pigs, including SSRs [45][46][47][48][49], AFLPs [50,51], SNPs [52,53] and mtDNA genotyping [2][3][4][54][55][56][57][58][59][60][61]. SSR markers have been widely used in phylogenetic studies and to measure differences within breeds; however, due to their neutral properties, they are poorly correlated with phenotypic changes due to selection. Very recently, the use of gene markers has attracted more researchers, as variation in their allele frequencies may provide information related to functional differences between breeds. Phylogenetic studies using gene markers or SNPs associated with traits of interest are relevant for breed conservation and for efficiently using breeds with potential in future production markets. Moreover, maternally inherited mtDNA is useful for tracing the maternal lineages in populations. Alternatively, variable sequences on the Y chromosome are useful for measuring breed history and phylogenetic origins, although the Y chromosome is much less variable within species than most other genomic sequences [62]. The largest ongoing project on biodiversity studies of pig breeds is the European Union (EU) pig biodiversity project II (Pig-BioDiv II), which will evaluate and compare genetic diversity among at least 100 pig breeds originating from China and Europe [49-51, 53, 60, 61]. The project not only determines the relationships between breeds by estimating genetic distances, based on SSR markers and haplotypic relationships from mtDNA and Y chromosome polymorphisms, but also determines functional differences among breeds by characterizing trait gene loci and QTL regions.
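As an illustration of how a genetic distance between two breeds can be estimated from marker allele frequencies, the following Python sketch computes Nei's standard genetic distance, D = -ln(Jxy / sqrt(Jx Jy)). This is one common choice among several, and the text does not state which distance measures the project uses; the data layout (one dictionary of allele frequencies per locus) and the function name are assumptions made for the example.

```python
import math
from typing import Dict, List

Locus = Dict[str, float]  # allele label -> allele frequency at one locus


def nei_standard_distance(pop_x: List[Locus], pop_y: List[Locus]) -> float:
    """Nei's standard genetic distance between two populations.

    Both arguments list the same loci in the same order. Jx and Jy are the
    average within-population homozygosities and Jxy is the average
    probability that alleles drawn from the two populations are identical.
    """
    if not pop_x or len(pop_x) != len(pop_y):
        raise ValueError("populations must share a non-empty list of loci")
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    n_loci = len(pop_x)
    jx, jy, jxy = jx / n_loci, jy / n_loci, jxy / n_loci
    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus
    return -math.log(jxy / math.sqrt(jx * jy))


if __name__ == "__main__":
    # Hypothetical allele frequencies at two microsatellite loci.
    breed_a = [{"120": 0.6, "124": 0.4}, {"200": 1.0}]
    breed_b = [{"120": 0.1, "124": 0.9}, {"200": 0.5, "204": 0.5}]
    print(round(nei_standard_distance(breed_a, breed_b), 4))
```

A matrix of such pairwise distances is the usual input to tree-building methods, although sample-size corrections are normally applied when only a few animals per breed are genotyped.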

Pig genome mapping and sequencing
Over the past years, our understanding of the pig genome has rapidly evolved from the localization of genes on specific chromosomes to high-density marker maps. The whole pig genome is now being sequenced, which represents a key milestone in exploring pig genome evolution and deciphering the molecular basis of various phenotypic traits.

Genome positioning system (GPS)
The availability of large-insert libraries [63][64][65][66][67][68] allows for a more targeted approach to physical and comparative mapping. Over 620K BAC end-sequences (BES) with an average read length of 635 bp have provided a previously untapped source of both coding and noncoding porcine sequence information [69]. The first high-resolution, physically anchored, contiguous whole genome radiation hybrid (RH) comparative maps of the porcine autosomes were constructed by using physically anchored sequences derived from BACs [70]. Furthermore, a physical map of the pig genome was recently constructed by integrating 265K restriction fingerprints and BES generated from 4 BAC libraries with RH markers and contig alignments to the human genome. It covers the 18 pig autosomes and the X chromosome in 176 contigs with an average length of 15 Mb, and provides localised representation of the gene-rich regions on Y. The map represents an entry point for rapid electronic positional cloning of genes and fine mapping of QTLs, and also provides a platform for the selection of an efficient minimum tiling path (MTP) through the genome to support clone-based sequencing and targeted functional genomics studies (http://www.sanger.ac.uk/Projects/S_scrofa/WebFPC/porcine/large.shtml). Exploitation of this resource, together with the complete human sequence and bioinformatics tools, permits the establishment of an ordered list of unique sequences from which to select evenly spaced markers prior to mapping [69].
With the development of molecular markers, porcine genomic maps have been greatly enriched in the last few years. The pig genome database has entries for over 4,000 loci including more than 1,588 genes and 2,493 markers (http://www.animalgenome.org/pig/). However, while the average distance between markers is about 2-3 cM, some large gaps still exist in the pig genetic linkage map (http://www.marc.usda.gov/genome). The physical map for pigs, as for other farm animals, initially lagged behind. With the use of a somatic cell hybrid panel [71] and a 7,000 rad (IMpRH) or, more recently, a 12,000 rad (IMNpRH2) RH panel [72][73][74], the physical map has been growing rapidly and now contains over 10,000 genes and markers [75]. The publicly available information related to pig genomics and proteomics is shown in Table 3.
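The idea of a minimum tiling path can be sketched as a classic interval-cover problem. The Python example below assumes that each BAC clone has already been placed on a contig with start and end coordinates, and greedily picks the fewest clones that span the contig without gaps. The `Clone` type, the coordinates and the greedy rule are illustrative assumptions; the actual MTP selection for the pig map is based on fingerprint and BES evidence and is not described in the text.

```python
from typing import List, NamedTuple


class Clone(NamedTuple):
    """A BAC clone placed on a contig (hypothetical coordinates in bp)."""
    name: str
    start: int
    end: int


def minimum_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: fewest clones spanning the contig end to end.

    At each step, among the clones that start at or before the position
    covered so far, the one reaching furthest to the right is chosen.
    Raises ValueError if the clones leave a gap.
    """
    if not clones:
        return []
    ordered = sorted(clones, key=lambda c: (c.start, -c.end))
    target_end = max(c.end for c in ordered)
    covered = ordered[0].start
    path: List[Clone] = []
    i = 0
    while covered < target_end:
        best = None
        # Scan every not-yet-considered clone that overlaps the covered region.
        while i < len(ordered) and ordered[i].start <= covered:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best.end
    return path


if __name__ == "__main__":
    contig = [
        Clone("A", 0, 100), Clone("B", 50, 180), Clone("C", 90, 200),
        Clone("D", 150, 300), Clone("E", 190, 310),
    ]
    print([c.name for c in minimum_tiling_path(contig)])
```

In practice a minimum overlap between neighbouring clones would also be required so that adjacent sequences can be joined; that constraint is left out here for brevity.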

The pig genome project
The pig whole genome is currently being sequenced by The Wellcome Trust Sanger Institute through funding provided by the Cooperative State Research, Education and Extension Service at the United States Department of Agriculture (CSREES-USDA) (target of 3X genome coverage sequencing by January 2008) [76]. This project uses a clone-by-clone sequencing strategy, based on the MTP of BAC clones. The planned order of contig selection for sequencing is: (i) SSC7, SSC14 and SSC4, which have the highest priority since additional EU funding targeting these chromosomes started earlier; (ii) SSCX, since it will be more challenging to complete and will require increased sequencing depth; and (iii) SSC1, SSC11, SSC17, SSC5, SSC6, SSC2, SSC3, SSC8, SSC9, SSC10, SSC12, SSC13, SSC15, SSC16, and SSC18. To date, a total of 7,321 CHORI-242 clones have been selected and used to generate initial shotgun sequencing data (> 52% of the swine genome) (Table 4). Since the CHORI-242 library represents a female Duroc pig, 495 additional BACs with at least one BES anchored on chromosome X or Y were selected from the French National Institute for Agricultural Research (INRA) BAC library for sequencing chromosome Y. A total of 1,660 accessioned clones have generated > 287 Mb of sequence. A pre-finishing strategy is being employed for gap closure and ambiguity resolution. Automated annotation will be used after the entire chromosome has been sequenced (http://www.piggenome.org/).
To take advantage of the emerging genome sequence and the characterization of new QTLs, there is an increasing need to improve the process of SNP discovery in order to define haploblocks in unique germplasms. A discovery platform that exploits ancestral chromosomes for unique SNP discovery would expedite SNP discovery for exploitation in breeding. There is also a need for a united, global initiative that captures and utilizes the broadest porcine germplasms. Porcine SNP discovery is ongoing; several large projects have been completed (Sino-Danish) or are currently being initiated by INRA-Genescope in conjunction with the SGSC pig genome sequencing project [76]. Within the Sino-Danish initiative [77], 3.84 million sequences have been generated using 5 different breeds (Duroc, Erhualian, Hampshire, Landrace and Yorkshire), and within the Genescope initiative, 1 million sequences are being generated from 7 different breeds (Iberian, Landrace, Meishan, Minipig, Pietrain, Wild boar and Yorkshire) [77,78]. However, the discovery of SNPs using a limited pool of independent germplasm limits the potential to identify QTLs using genome-wide SNP sweeps, as well as the ability to identify traits that are highly difficult to phenotype (reproduction, disease resistance) and to perform marker-associated introgression of traits from wild-type alleles into commercial breeding populations. This supports the need for an alternative strategy to generate informative SNPs for use in commercial populations. In addition, the EU PigBioDiv II has provided significant insights into the multiple origins of the pig and the phenotypic variation associated with geography, breeding and husbandry practices. Using 1,536 SNPs distributed across the genome to genotype 672 DNA samples, the utility of SNPs for defining haploblock structure and detecting extended linkage disequilibrium (LD) in genomic regions where genes controlling agricultural traits have been selected has been demonstrated [53].

Approaches to understanding genome evolution
The relationship between genome size and organismal complexity remains unresolved. The C-value (genome size) paradox is that genome size does not correlate closely with organismal complexity [79]. However, the genomes of more complex organisms are, on average, larger than the genomes of less complex ones. The C-value of the domestic pig varies from 2.81 to 3.51 when measured using various cell types and by different methods [80][81][82]. The pig genome comprises 18 autosomes and the X/Y sex chromosomes, with a size of 2.7 gigabases (Gb) estimated by integration of BES and fingerprints [69,76]. Comparative genomic analysis indicates that organismal complexity arises from progressively more elaborate regulation of gene expression, and that physiological/behavioral complexity correlates with the likely number of gene expression patterns exhibited during an animal's life cycle [83]. The unexpectedly high frequency of alternative splicing (AS) events has been proposed as an attractive mechanism for increasing the number of gene expression patterns, and consequently organismal complexity, in eukaryotes [84,85]. Ultraconserved regions that are not functionally transcribed in mammalian genomes, one of the most exciting recent discoveries in the field of genomics, have been suggested to play an important role as transcriptional regulatory elements and to account for the complexity of gene regulation [86][87][88][89]. This is particularly evident for some genes involved in embryonic development. Another mechanism suggested for increasing organismal complexity is DNA rearrangement, in which genes themselves are rearranged during cellular differentiation [90].
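For orientation, C-values reported as DNA mass can be put on the same scale as the sequence-based estimate. The short Python sketch below assumes that the C-values quoted above are in picograms (the text gives no unit) and uses the widely used conversion of about 978 Mb per picogram of DNA; the function name is hypothetical.

```python
# Approximate number of megabases per picogram of double-stranded DNA.
MB_PER_PG = 978.0


def picograms_to_gigabases(c_value_pg: float) -> float:
    """Convert a C-value in picograms to an approximate genome size in Gb."""
    return c_value_pg * MB_PER_PG / 1000.0


if __name__ == "__main__":
    # Range quoted in the text, assumed here to be in picograms.
    for c_value in (2.81, 3.51):
        print(f"{c_value} pg is about {picograms_to_gigabases(c_value):.2f} Gb")
```

Under that assumption the quoted range corresponds to roughly 2.75 to 3.43 Gb, so its lower end sits close to the 2.7 Gb estimated from BES and fingerprints.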

Comparative cytogenetics and genomics
Genome organization has traditionally been inferred using two approaches: cytogenetic mapping and genetic-linkage or physical mapping [91]. Comparisons of G-banded chromosome patterns were first used to infer homologies of whole chromosomes or subregions between species and even across mammalian orders. Gene mapping utilizing somatic cell hybrids subsequently confirmed that large tracts of mammalian genomes were remarkably conserved, suggesting that transferring information from species such as human and mouse, which have gene-rich maps, to the gene-poor developing maps of domestic animals is feasible [92]. Chromosome painting [or Zoo-fluorescence in situ hybridization (Zoo-FISH)] permits rapid detection of entire chromosomal homologies across mammalian orders. Genetic linkage maps are best suited to ordering polymorphic SSR markers, but are less efficient for developing comparative maps because of the limited degree of coding locus (type I marker) polymorphism observed within most interspecies crosses. Radiation hybrid (RH) mapping has proven to be an effective approach for the rapid ordering of evolutionarily conserved type I coding gene markers over the whole genome of various species [70,74,92,93]. Genome sequence-based comparative mapping is becoming a powerful approach to reveal the molecular basis for phenotypic variation as well as the evolutionary forces that have contributed to speciation, including underlying mutational processes and selective constraints [94][95][96]. In addition to comparative genome mapping, with the integration of genomics and phylogenetics, phylogenomic studies are progressing to resolve long-standing evolutionary/phylogenetic controversies, to refine dogma on how chromosomes evolve, and to guide annotation of human and other mammalian genomes [97].

Exploiting varieties of genomic architectures
Genome rearrangements: In eukaryotes, genome rearrangements, such as inversions, translocations and duplications, are common and range from gene segments to hundreds of genes. In most eukaryotes, there is a strong association between rearrangement breakpoints and repeat sequences. Rearrangement polymorphisms in eukaryotes are correlated with phenotypic differences, and have been proposed to confer varying fitness in different environments. There is little evidence that chromosomal rearrangements cause speciation, but they probably intensify reproductive isolation between species that have formed by other routes [98]. A relatively large number of chromosomal abnormalities including inversion, translocation, duplication, fission and fusion have been identified in the pig [93,99,100]. These chromosomal abnormalities are often responsible for a considerable decrease in the prolificacy of the carrier animals. Recently, a bioinformatics tool was created to permit multi-species comparisons between the genomes of humans, horses, cats, dogs, pigs, cattle, rats, and mice (http://evolutionhighway.ncsa.uiuc.edu/). This provides a useful resource for evaluating pig evolution. A large set of reused breakpoints was discovered; more than 20% of the discovered breakpoints have been reused during mammalian evolution. The eight-species comparison showed that the historical rate of chromosome evolution in mammals was different from what was previously thought. The study demonstrated that evolutionary change has been moving faster during the last 65 million years than during the prior 35 or so million years [92].
Transposable elements: Evolutionary biologists have hypothesized that the earliest life originated via a system based on a self-replicating RNA genome and RNA catalysts [101]. The advent of polymerases that make DNA copies of RNA templates allowed the conversion of information from unstable ribose-based polymers to more stable deoxyribose-based polymers through the process of reverse transcription. It is now known that only approximately 1-2% of the human genome is comprised of exonic sequences. The remainder, so-called "junk DNA", is composed largely of introns, simple repeat sequences and transposable elements or their remnants. In mammals, transposable elements account for nearly 50% of the genome [102,103]. Transposable elements were historically dismissed as junk or selfish sequences parasitizing the genome of living organisms [104,105]. This view has been challenged by a wave of new information demonstrating that they contribute to the evolution and function of genes and genomes, and have a tremendous impact on an organism's phenotype [106][107][108]. These effects include drug response, disease susceptibility and evolutionary novelties between species. The most common genomic effect of transposable elements is the induction of mutation. Through their mobility and ability to recombine, transposable elements can generate various types of rearrangements and lead to insertions, deletions, duplications and inversions. In mammals, retrotransposons have been proposed to act as general modulators of gene expression and to play a role in X-chromosome inactivation [109,110]. Transposable elements, first recognized as potential causal agents of human disease in 1988 [111], have evolved over millions of years and have achieved a balance between detrimental effects on the individual and long-term beneficial effects on a species through genome modification. It has been suggested that transposable elements play an important role, in diverse ways, in events ranging from the shaping of the genome to speciation [107].
Single nucleotide mutations: SNPs are abundant and widespread throughout the pig genome (coding and non-coding regions), and are rapidly becoming the marker of choice for many applications in population genomics, evolutionary analysis and conservation genetics because of their potential for higher genotyping efficiency, data quality and genome coverage, and because of cost-effective high-throughput genotyping techniques. In most species, SNPs typically occur on average every 200-500 bp [43,112-114]. About 90% of genetic variation has been ascribed to SNP allelic variants that occur at a frequency of > 1%. Within coding regions (~1-2% of the genome), nonsynonymous SNPs can be considered candidates for functional changes. The phenotypic effect of any particular SNP is rarely known and often can only be inferred from the evolutionary dynamics of the variant or from its effect on protein function. The ratio of nonsynonymous (dN) to synonymous (dS) SNPs (dN/dS, also known as Ka/Ks) can then be taken as a measure of the strength of purifying selection on a gene or the entire genome. Even synonymous SNPs in protein-encoding genes can have functional implications. Although multiple codons can encode the same amino acid, some occur more frequently in the genome than expected by chance (i.e., codon usage bias). Therefore, a SNP that causes a change from a more common or preferred codon to a rare or unpreferred codon can affect the efficiency of protein synthesis and expression. Most SNPs occur in the non-coding portion of the genome, but can nevertheless be evaluated with regard to function. For example, the IGF2-intron3-G3072A substitution causes a major QTL effect on muscle growth in the pig [20], and explains a major imprinted QTL effect on backfat thickness in a Meishan × European white pig intercross [115,116].
A substantial fraction of the non-coding genome is conserved between species, suggesting that purifying selection acts on a large portion of the genome. Thus, SNPs can be evaluated based on their location in conserved versus non-conserved non-coding regions. Moreover, the regulatory regions of genes (e.g., promoters, enhancers, silencers, insulators, miRNA binding sites) have been annotated using comparative and predictive algorithms, thereby enabling the assessment of non-coding regulatory SNPs. For instance, SNPs that occur in the transcription factor binding sites of a promoter are more likely to affect function than SNPs that occur outside the regulatory region of a gene [28,117]. Although ascertainment bias can be a problem in some applications, SNPs can generate equivalent statistical power whilst providing broader genome coverage and higher quality data than either SSRs or mtDNA, suggesting that SNPs could become an efficient and cost-effective genetic tool.
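As a concrete illustration of the synonymous versus nonsynonymous distinction discussed in this section, the following minimal Python sketch classifies codon differences between two aligned coding sequences using the standard genetic code. The function names (`classify_codon_change`, `count_changes`) and the toy sequences are hypothetical and introduced only for illustration. The sketch returns raw counts only; a proper dN/dS estimate additionally normalizes by the numbers of synonymous and nonsynonymous sites, which is not attempted here.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return 'synonymous' if both codons encode the same residue,
    otherwise 'nonsynonymous' (stop codons are treated as residue '*')."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


def count_changes(ref_cds, alt_cds):
    """Count codons that differ between two aligned, in-frame coding
    sequences. Returns (synonymous, nonsynonymous) raw counts.
    Assumption: sequences have equal length, a multiple of 3, no gaps."""
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be aligned and in frame")
    syn = nonsyn = 0
    for i in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[i:i + 3].upper()
        alt_codon = alt_cds[i:i + 3].upper()
        if ref_codon == alt_codon:
            continue
        if classify_codon_change(ref_codon, alt_codon) == "synonymous":
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy example: GAA -> GTA changes Glu to Val; CTT -> CTC keeps Leu.
syn, nonsyn = count_changes("ATGGAACTT", "ATGGTACTC")
```

In this toy comparison one codon change is synonymous and one is nonsynonymous. Whether such a ratio indicates purifying selection can only be judged after site normalization and with far more data than a three-codon example.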

Alternative splicing (AS) events and evolutionary impacts
Alternative splicing (AS), one of the most important and nearly ubiquitous mechanisms regulating gene expression in many organisms, occurs in the coding sequence, coordinates physiologically meaningful changes in protein structure and function and is a key mechanism for generating the complex proteome of multicellular organisms. AS alters proteins in two ways: (i) by skipping exons that encode a certain protein feature; and (ii) by introducing a frameshift that changes the downstream protein sequence. Recently, novel types of AS events have been proposed that either join two non-consecutive exons (creating a protein feature) or insert an exon into the protein body (destroying a feature) [118]. The effects of AS range from a complete loss of function or acquisition of a new function to very subtle modulations, which are observed in the majority of reported cases and affect properties such as binding, enzymatic activity, intracellular localization, protein stability, and phosphorylation and glycosylation patterns [119].
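The difference between the two outcomes named above, loss of an exon-encoded feature versus a downstream frameshift, can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes every exon is entirely coding and that the reading frame starts at the first base of the first exon. The function names and the toy exon sequences are hypothetical.

```python
def splice(exons, skipped=frozenset()):
    """Join exon sequences in order, omitting the exon indices in `skipped`."""
    return "".join(seq for i, seq in enumerate(exons) if i not in skipped)


def skipping_consequence(exons, index):
    """Classify the effect of skipping one fully coding exon.
    A length divisible by 3 removes whole codons and keeps the downstream
    frame; any other length shifts the downstream reading frame."""
    if not 0 <= index < len(exons):
        raise IndexError("exon index out of range")
    if len(exons[index]) % 3 == 0:
        return "in-frame deletion"
    return "frameshift"


# Toy transcript with four exons of lengths 6, 6, 4 and 5 nucleotides.
exons = ["ATGGCT", "GAAGAC", "CTGA", "TTTAA"]
full_length = splice(exons)
without_exon_2 = splice(exons, skipped={1})
```

Under these assumptions, skipping the second exon (6 nt) removes two codons and leaves the downstream frame intact, whereas skipping the third exon (4 nt) shifts the frame of everything that follows.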
It has been estimated that 30-70% of mammalian genes are alternatively spliced [120-122], and that mammalian AS events frequently arise from the evolutionarily rapid loss or gain of exons from genomes [121,123-125]. Variant splice patterns are often specific to different stages of development, particular tissues or a disease state [126]. Using a highly predictive computational method, over 11% of human and mouse alternative exons were estimated to represent species-specific AS events [127]. Comparison of the gene structure of orthologous genes in the human and mouse genomes has revealed that the majority (98%) of human constitutive exons and major forms of alternative exons are conserved in the genomic sequences of their mouse and rat orthologues [121]. By contrast, nearly 75% of the minor forms of alternative exons are not conserved, suggesting that AS is associated with a significant increase in the rate of exon creation and deletion in mammals, and plays a role in speciation events.
Splicing mutations have long been proposed to be the basis for a number of human diseases [128]. More recently, based on the disease-gene propensity of human genes in terms of their coding region length and intron number, it was estimated that ~60% of human disease mutations are splicing mutations, which would make them the most frequent cause of hereditary diseases [129]. Although the importance of AS in various biological processes such as sex determination [130] and apoptosis has been known for a long time, genomics, and in particular the shotgun sequencing of expressed sequence tags (ESTs), has revealed its nearly ubiquitous role in gene regulation [85]. Genome sequencing has made it possible to study the evolutionary impact and constraints of AS [131].

Exploring the functional portion of the genome
Recently, it was estimated from sequence conservation patterns that the actual functional portion of the mammalian genome is at least 5% [103].
In mammals, comparative evolutionary approaches indicate that functional elements are clustered mostly within ~2 kb of protein-coding sequence [132,133]. These observations help to paint a general picture of noncoding conservation and structure in the genome and are likely to be extremely helpful in focusing future detailed investigation. Given that the protein-coding fraction is approximately 1.5%, there is significant opportunity for identification of additional functional elements. Sequence conservation does not reveal the total fraction of the functional genome, but simply the fraction of the genome that has remained functional within the group of species compared. An additional fraction that is not conserved across larger evolutionary distances, such as across all vertebrate lineages, represents species-specific or lineage-specific genes. The best known functional fraction is the class of protein-coding genes. Regulatory elements and noncoding RNAs, such as small interfering RNAs (siRNAs) and miRNAs, are considered two other significant functional classes of mammalian genomes. Analysis of the human and mouse genomes has identified an abundance of conserved non-genic sequences (CNGs). The significance and evolutionary depth of their conservation remain unknown. A strikingly high number of such elements is found in vertebrate gene deserts, defined as long regions (> 500 kb) containing no protein-coding sequences and without obvious biological functions [87-89]. It has been suggested that CNGs play a global role in genome function and regulation through long-distance cis or trans chromosomal interactions [134].
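The length criterion in the definition of a gene desert lends itself to a simple computational sketch. The Python function below scans the protein-coding gene intervals on one chromosome and reports every gene-free stretch longer than a threshold. The function name, the half-open coordinate convention and the toy coordinates are assumptions made for illustration. The sketch applies only the > 500 kb criterion and cannot assess the second part of the definition, the absence of obvious biological function.

```python
def find_gene_deserts(genes, chrom_length, min_length=500_000):
    """Return gene-free intervals longer than `min_length`.

    genes: iterable of (start, end) half-open coordinates of
           protein-coding genes on one chromosome (may overlap).
    chrom_length: total chromosome length in bp.
    Note: stretches at the chromosome ends are also reported.
    """
    deserts = []
    prev_end = 0  # right edge of the region covered by genes so far
    for start, end in sorted(genes):
        if start - prev_end > min_length:
            deserts.append((prev_end, start))
        # max() handles overlapping or nested gene intervals
        prev_end = max(prev_end, end)
    if chrom_length - prev_end > min_length:
        deserts.append((prev_end, chrom_length))
    return deserts


# Toy chromosome of 1.6 Mb with three genes, two of which overlap.
toy_genes = [(100_000, 150_000), (120_000, 200_000), (900_000, 950_000)]
toy_deserts = find_gene_deserts(toy_genes, chrom_length=1_600_000)
```

For the toy input, the 700 kb gap between the second and third genes and the 650 kb stretch after the last gene are reported, while the 100 kb stretch before the first gene is not. Intersecting such intervals with annotated conserved elements would be the natural next step for counting CNGs per desert.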

Future expectations of facilitating pig genome navigation
Exploring the complete functional information encoded in a genome is a major challenge in biological research. Comparative genome analysis between the pig and related mammals could provide a powerful and general approach to identifying functional elements without previous knowledge of function and to detecting phylogenetic footprints of pig genome evolution. A principal goal of genetic research is to identify specific genotypes that are associated with phenotypes and to conduct genome-wide genotyping on a massive scale. The advent of complete genome sequencing, along with gene prediction, has resulted in the development of technologies that allow the assignment of genes to particular biological modules. Integration of 'omic' technologies, including genomics, transcriptomics, proteomics and metabolomics, will link genomics and systems biology and accelerate the acquisition of fundamental knowledge about biological systems. The outputs of 'omics' research will change our approach to solving biological problems and result in novel uses of biotechnology to develop and improve products for agriculture. Advances in genome-phenome research will contribute to agriculture and food, bioengineering, biomedicine and health, conservation and the environment. Genome-to-phenome research for the pig is still at a very early stage, and requires an enormous amount of work to understand the genetics and development of shape, specialization and organization at levels from cells to the whole individual.
Since the whole genome sequence of the pig will soon be available, comparative studies with the completed human genome and with other mammalian genomes having moderate to deep genome coverage (i.e., cow, horse, dog, mouse, rat and chimpanzee) will yield new information about pig genome evolution. In the next decade, comparative genomics approaches will make it possible to select animals effectively for agricultural purposes, to create appropriate biodiversity conservation programs and to create pig models for medical research. The pig offers many advantages in biomedical research compared with other animals such as mouse and rat, i.e., (i) its similar size to humans; (ii) its high anatomical and physiological similarity to humans; and (iii) the ability to perform targeted gene manipulation and cloning by nuclear transfer.

Figure 2. Global status of pig breeds. Source: The state of the world's animal genetic resources for food and agriculture (1st), 2006 [8].