Localization of 5,105 Hanwoo (Korean Cattle) BAC Clones on Bovine Chromosomes by the Analysis of BAC End Sequences (BESs) Involving 21,024 Clones

As an initial step toward a better understanding of the genome structure of Korean cattle (Hanwoo breed) and toward establishing a framework for genomic research in this breed, bacterial artificial chromosome (BAC) end sequencing of 21,024 clones was recently completed. After vector sequences were removed, the BAC end sequences (BESs) of 20,158 clones with high-quality sequence (Phred score ≥ 20; average BES length 620 bp; total 23,585,814 bp) were compared with the known bovine chromosomal DNA sequence by BLASTN analysis. BLAST analysis of the BESs against the NCBI Bos taurus genome database (Build 2.1) showed that the BESs of 13,201 clones matched bovine contig sequences with significant hits (E < 1e-40), comprising 7,075 single-end hits and 6,126 paired-end hits. In total, 5,105 Korean cattle BAC clones with paired-end hits, 4,053 from the primary analysis and 1,052 from the secondary analysis, were mapped to bovine chromosomes with very high accuracy. (Key Words: Korean Cattle, Sequence, Chromosomal Map)


INTRODUCTION
Developing a genetic map for cattle is an essential initial step toward identifying economic trait loci (ETL). Bovine linkage maps constructed by the classical genetic approach have provided a foundation for generating physical maps (Barendse et al., 1997; Ihara et al., 2004). The construction of bovine whole-genome radiation hybrid (RH) panels (Womack et al., 1997; Itoh et al., 2005) allowed cattle genome researchers to construct subsequent RH panel-based physical maps (Band et al., 2000; Larkin et al., 2003; Everts-van der Wind et al., 2004). Further incorporation of bovine bacterial artificial chromosome (BAC) clones into the RH map (Larkin et al., 2003; Everts-van der Wind et al., 2005) has prompted the bovine whole-genome sequencing project currently being conducted by an international consortium of bovine genome researchers.
BAC clones have mostly been used for physical mapping because they can hold large DNA inserts stably (Kim et al., 1996). BAC libraries make it possible to screen multiple clones of interest by a PCR-based approach (Yu et al., 2005; Choi et al., 2006). The bovine BAC library (Chae et al., 2007) generated from the chromosomal DNA of Korean cattle (Hanwoo breed) has been a useful resource for genomic research (Hong et al., 2005). Random sequencing of BAC end sequences (BESs) with universal primers is one way to obtain cattle chromosomal DNA sequences quickly and at relatively low cost. In addition, BESs can be used to localize BAC clones precisely on the bovine chromosomes (Larkin et al., 2003) and to identify a variety of DNA markers (Heaton et al., 2002; Kim et al., 2004; Sasazaki et al., 2006). Here, we report the mapping of Korean cattle BAC clones by comparison with the currently available Bos taurus contig map.

BAC end sequencing
Single BAC colonies were transferred into 1 ml of Terrific Broth (TB) medium supplemented with antibiotic (chloramphenicol, 12.5 μg/ml) in a 96-deep-well plate and incubated at 37°C overnight with gentle shaking (550 rpm; HT-MegaGrow shaking incubator, Bioneer, Daejeon, Korea). BAC DNA was extracted and purified using a Montage BAC 96 miniprep kit (Millipore, Bedford, MA, USA) or a modified alkaline lysis method commonly used for isolating plasmid DNA from bacteria (Birnboim et al., 1979; Kelley et al., 1999), adapted to optimize DNA yield. BAC end sequencing was performed with a BigDye Terminator Cycle Sequencing Kit (ver. 3.1, Applied Biosystems, Foster City, CA, USA) according to the manufacturer's instructions. Each cycle sequencing reaction contained 600 ng of BAC DNA, 3 pmol of primer, 0.87 μl of 5× buffer, 1.38 μl of distilled water and 0.5 μl of BigDye, and was run on a GeneAmp PCR System 9700 (Applied Biosystems). The universal primers were T7 (forward) and a slightly modified RP2 or M13R (reverse). Thermal cycling consisted of 96°C for 2 min followed by 80 cycles of denaturation at 96°C for 15 sec, annealing at 50°C for 10 sec and extension at 60°C for 2.5 min. Reaction products were purified by ethanol precipitation and resolved on an ABI 3730XL DNA Analyzer (Applied Biosystems).

Sequence analysis
Sequence chromatograms were processed automatically using Sequencing Analysis software (ver. 3.7, Applied Biosystems). Trace files were quality-trimmed with the Phred program using trim-alt 0.05 (Phred score 20), and sequences shorter than 100 bp were removed. Vector sequences were trimmed using cross_match (http://www.phrap.org). Repetitive elements were identified with the RepeatMasker program (Jurka et al., 2005) to determine the relative content of interspersed repeats, small RNAs, satellites, simple repeats and low-complexity sequences. Finally, the cleaned multi-FASTA data, repeat-masked with RepeatMasker and the Repbase database (ver. 2006.2), were analyzed with NCBI local BLAST (Altschul et al., 1990) against the NCBI Bos taurus genome data (build 2.1, based on Btau 2.0).
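Phred's trim-alt option is commonly described as a modified Mott trimming algorithm. The sketch below assumes that behaviour: each base scores the cutoff (0.05) minus its error probability 10^(-Q/10), the highest-scoring contiguous segment is kept, and reads shorter than 100 bp after trimming are discarded. The function names are hypothetical, and this is an illustration of the technique rather than Phred's own code.

```python
from typing import List, Optional, Tuple

CUTOFF = 0.05   # per-base error-probability cutoff (assumed meaning of "trim-alt 0.05")
MIN_LEN = 100   # reads shorter than this after trimming are discarded


def phred_to_error(q: int) -> float:
    """Convert a Phred quality score Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mott_trim(quals: List[int], cutoff: float = CUTOFF) -> Tuple[int, int]:
    """Return (start, end), end exclusive, of the maximum-scoring segment.

    Each base contributes (cutoff - error probability); the kept region is the
    contiguous run with the largest cumulative score (Kadane-style scan).
    """
    best_start = best_end = 0
    best_score = 0.0
    run_start = 0
    run_score = 0.0
    for i, q in enumerate(quals):
        run_score += cutoff - phred_to_error(q)
        if run_score <= 0:
            # A non-positive running score cannot help any later segment: restart.
            run_start = i + 1
            run_score = 0.0
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, quals: List[int], min_len: int = MIN_LEN) -> Optional[str]:
    """Quality-trim a read; return None if it is shorter than min_len afterwards."""
    start, end = mott_trim(quals)
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_len else None


quals = [5] * 10 + [30] * 150 + [5] * 10   # poor-quality ends around a good core
print(mott_trim(quals))                     # (10, 160)
```

Low-quality bases at either end drive the running score negative, so only the high-quality core survives; a read whose core is under 100 bp is then dropped by the length filter.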

Comparative mapping
On the basis of the BLAST results, only clones whose two end sequences hit the same chromosome and whose estimated insert size was less than 300 kb were selected for mapping. Maps were generated with Perl scripts using the GD library and with tgif to produce PostScript map output.
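The selection rule above can be sketched as a filter over each clone's two end hits. This assumes that every end has already been reduced to one best hit with chromosome, start and end coordinates, and that the insert size is estimated as the outermost span of the two hits; `EndHit`, `estimate_insert` and `select_for_mapping` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class EndHit(NamedTuple):
    chrom: str  # chromosome of the best BLAST hit for this end
    start: int  # hit start on the chromosome (bp)
    end: int    # hit end on the chromosome (bp); may be < start on the minus strand


MAX_INSERT_BP = 300_000  # clones spanning 300 kb or more are rejected


def estimate_insert(fwd: EndHit, rev: EndHit) -> Optional[int]:
    """Outermost span of the two end hits, or None if they lie on different chromosomes."""
    if fwd.chrom != rev.chrom:
        return None
    coords = (fwd.start, fwd.end, rev.start, rev.end)
    return max(coords) - min(coords) + 1


def select_for_mapping(clones: Dict[str, Tuple[EndHit, EndHit]]) -> Dict[str, int]:
    """Keep clones whose paired ends hit one chromosome less than 300 kb apart."""
    selected = {}
    for clone_id, (fwd, rev) in clones.items():
        size = estimate_insert(fwd, rev)
        if size is not None and size < MAX_INSERT_BP:
            selected[clone_id] = size
    return selected


clones = {
    "c1": (EndHit("1", 1_000, 1_600), EndHit("1", 140_000, 139_400)),  # kept
    "c2": (EndHit("1", 0, 600), EndHit("2", 0, 600)),                  # different chromosomes
    "c3": (EndHit("5", 0, 600), EndHit("5", 400_000, 400_600)),        # span > 300 kb
}
print(select_for_mapping(clones))  # {'c1': 139001}
```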

RESULTS AND DISCUSSION
Of the more than 150,000 BAC clones generated from the chromosomal DNA of Korean cattle, BAC end sequencing was attempted for 21,024 clones, including 4,296 clones described previously (Chae et al., 2007) (Table 1). After base calling and vector trimming, 20,158 clones were selected on the basis of quality (Phred score ≥ 20) and length (≥ 100 bp). The average and total lengths of the selected sequences were 620 bp and 23,585,814 bp, respectively. Of these, 17,903 clones had paired BESs and 2,255 clones had a single BES meeting the selection criteria. Approximately 1.54% of the clones had BESs no longer than 100 bp at both ends after vector trimming, which may reflect a clone lacking an insert or a sequencing failure. Conversely, at least 98.46% of the BAC clones are estimated to carry a proper insert.
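The reported average BES length can be checked from the counts above, assuming the 620 bp average is taken per end sequence (two reads per paired clone, one per single-BES clone) rather than per clone:

```python
paired_clones = 17_903   # clones with both BESs passing QC
single_clones = 2_255    # clones with one BES passing QC
total_bp = 23_585_814    # total length of selected BESs

n_clones = paired_clones + single_clones      # clones passing QC
n_reads = 2 * paired_clones + single_clones   # individual end sequences
mean_bes_len = total_bp / n_reads

print(n_clones, n_reads, round(mean_bes_len))  # 20158 38061 620
```

Averaging per clone instead would give about 1,170 bp, so the per-read interpretation is the one consistent with the 620 bp figure.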
High-quality Hanwoo BESs were compared by BLASTN analysis with the B. taurus genomic sequence from a public database (NCBI build 2.1) after repeat masking with RepeatMasker (ver. 3-1-3; http://www.repeatmasker.org) and the Repbase database (January 20, 2006). Of the 20,158 clones analyzed, 6,957 had no BES similar to the Bos taurus (Btau 2.1) genome sequence (no hit at E < 1e-40), whereas the remaining 13,201 clones showed significant similarity (significant BLAST hit; E < 1e-40) (Table 2). Of these 13,201 clones, 7,075 showed a single-end hit (only one of the two BESs hit the bovine genome) and 6,126 showed paired-end hits (both BESs hit the bovine genome). Among the 7,075 single-end-hit clones, 4,767 produced single hits (a hit at only one site in the bovine genome) and 2,308 produced multiple hits (hits at multiple sites). Among the 6,126 clones with similarity at both ends, the two end sequences of 1,226 clones matched different chromosomes, whereas both end sequences of 4,900 clones matched only one chromosome. Among the latter, each end of 2,447 clones had a single hit in the assembled genome, and one or both ends of 2,453 clones had multiple hits. The location of each clone on a chromosome was determined from the BLAST alignments between the Hanwoo BESs and the B. taurus whole-genome sequence. To determine the chromosomal locations of the BAC clones, only clones yielding paired-end hits (E < 1e-40) were selected, and the 4,900 clones whose two end sequences both matched the same chromosome (Table 2) were analyzed first.
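The hit categories above form a tree in which each parent count should equal the sum of its children. The sketch below classifies one clone from the chromosomes hit by each end and checks the reported counts for internal consistency. It assumes hits are already filtered at E < 1e-40, represents each hit site only by its chromosome name, and treats any clone whose hits span more than one chromosome as "different chromosomes"; these rules and all names are our interpretation, not the authors' code.

```python
from typing import Dict, List, Tuple


def classify(fwd_hits: List[str], rev_hits: List[str]) -> str:
    """Assign a clone to a hit category (one list entry per hit site, by chromosome)."""
    if not fwd_hits and not rev_hits:
        return "no_hit"
    if not fwd_hits or not rev_hits:
        hits = fwd_hits or rev_hits
        return "single_end_single_site" if len(hits) == 1 else "single_end_multi_site"
    if len(set(fwd_hits) | set(rev_hits)) > 1:
        return "different_chromosomes"
    if len(fwd_hits) == 1 and len(rev_hits) == 1:
        return "same_chrom_single_hits"
    return "same_chrom_multi_hits"


# Counts reported in the text: category -> (count, sub-categories)
REPORTED: Dict[str, Tuple[int, List[str]]] = {
    "analyzed": (20_158, ["no_hit", "significant_hit"]),
    "no_hit": (6_957, []),
    "significant_hit": (13_201, ["single_end", "paired_end"]),
    "single_end": (7_075, ["single_end_single_site", "single_end_multi_site"]),
    "single_end_single_site": (4_767, []),
    "single_end_multi_site": (2_308, []),
    "paired_end": (6_126, ["different_chromosomes", "same_chromosome"]),
    "different_chromosomes": (1_226, []),
    "same_chromosome": (4_900, ["same_chrom_single_hits", "same_chrom_multi_hits"]),
    "same_chrom_single_hits": (2_447, []),
    "same_chrom_multi_hits": (2_453, []),
}


def inconsistent_totals(tree: Dict[str, Tuple[int, List[str]]]) -> List[str]:
    """Names of categories whose count differs from the sum of their sub-categories."""
    return [name for name, (count, parts) in tree.items()
            if parts and sum(tree[p][0] for p in parts) != count]


print(inconsistent_totals(REPORTED))  # []
```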
Of the 2,453 clones with multiple hits on the same chromosome, those with highly significant E-values at both ends were selected for map construction. Because the average insert size of Hanwoo chromosomal DNA in the BAC clones is 140 kb (Chae et al., 2007), clones whose end sequences mapped more than 300 kb apart were excluded from the 4,900 clones with both end sequences matching one chromosome, leaving 4,053 clones for the primary construction of the map (Figure 1).
The clones excluded from the primary genome map, including the 6,957 clones with no similarity to the B. taurus genomic sequences (E < 1e-40) and the 7,075 clones with single-end hits after removal of repetitive sequences, were used for a secondary comparison against the B. taurus genomic sequences at reduced stringency (E < 1e-10). The secondary comparison was performed twice, once with and once without repeat masking. An additional 1,301 clones including repeat sequences and 181 clones without repeat sequences were selected in this secondary analysis. After removal of clones whose end sequences mapped more than 300 kb apart, 1,052 clones remained for additional map construction. In total, therefore, the chromosomal locations of 5,105 clones were determined: 4,053 from the primary analysis and 1,052 from the secondary analysis (Figure 1).
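The two-stage stringency scheme can be sketched as a per-clone decision: a clone is placed in the primary map if both ends pass E < 1e-40, otherwise in the secondary map if both pass E < 1e-10, and in either case only if both ends hit one chromosome less than 300 kb apart. This is a simplification we assume for illustration; it does not model the separate masked and unmasked secondary runs, and `assign_tier` is a hypothetical name.

```python
from typing import Optional

PRIMARY_EVALUE = 1e-40    # stringent first-pass threshold
SECONDARY_EVALUE = 1e-10  # relaxed second-pass threshold
MAX_INSERT_BP = 300_000   # maximum span between mapped ends


def assign_tier(fwd_evalue: float, rev_evalue: float,
                same_chrom: bool, insert_bp: Optional[int]) -> Optional[str]:
    """Return 'primary', 'secondary', or None (not mapped) for one clone."""
    if not same_chrom or insert_bp is None or insert_bp >= MAX_INSERT_BP:
        return None
    weaker = max(fwd_evalue, rev_evalue)  # both ends must pass the threshold
    if weaker < PRIMARY_EVALUE:
        return "primary"
    if weaker < SECONDARY_EVALUE:
        return "secondary"
    return None


print(assign_tier(0.0, 1e-50, True, 140_000))    # primary
print(assign_tier(1e-20, 1e-50, True, 140_000))  # secondary
print(4_053 + 1_052)                             # 5105 clones mapped in total
```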
In this report, we present a physical genome map of Hanwoo BAC clones based on alignments of selected Hanwoo BES data against the B. taurus genomic sequence from a public database (NCBI build 2.1), as described above. Of the 21,024 BAC clones initially analyzed, a total of 5,105 clones with paired-end hits were mapped on bovine chromosomes. The largest number of BAC clones (316) was mapped on bovine chromosome 1 and the smallest number (56) on chromosome X. This result suggests that the number of clones mapped on each bovine chromosome is roughly proportional to the physical size of that chromosome (Larkin et al., 2003). Only 265 of the 629 clones with paired-end hits from the Hanwoo secondary radiation hybrid map assembled with the COMPASS program (Everts-van der Wind et al., 2004; Chae et al., 2007) were included in the map construct, implying that our bovine maps are still being updated. The calculated average insert size of the mapped BAC clones (132.8 kb) is close to the previously estimated average insert size of Hanwoo BAC clones (140 kb; Chae et al., 2007).
The complete sequence of the B. taurus chromosomes has yet to be determined. The current Bos taurus genome assembly (build 2.1) was produced by the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine from a male of the Hereford breed. The sequencing strategy produced a 6.2-fold whole-genome shotgun (WGS) assembly in which the sequence consists of 4,407 contigs, with 10,000 Ns inserted between adjacent contigs; in addition, the 4,407 contigs contain 97,891 gaps. Recent development of a more precise linkage map with more microsatellite DNA The complete map will be useful for examining unique sequences that may be used for breed differentiation and for improving the compilation of the WGS data. The absence of BLAST hits for some clones, even at relaxed E-value thresholds, may reflect sequences unique to Hanwoo, but the more likely explanation is that the B. taurus genome data (Btau 2.1) in NCBI are incomplete. Conversely, clones with numerous hits suggest that the current repeat-sequence database needs improvement. Clones whose two ends mapped to different chromosomes are considered evidence that the assembly of the WGS data was imperfect. Of the 5,105 clones used in map construction, 1,907 did not overlap with any other clone, while 3,198 did. The total length covered after removing the overlaps was 510,627,052 bp, so map coverage is approximately 29.33% of the current B. taurus 2.1 genome, whose length is estimated at 1,741,209,718 bp. Once the remaining approximately 130,000 clones are analyzed, map coverage is expected to reach 214%, allowing a more accurate genome map to be produced. We therefore expect the bovine map presented in this report to provide a useful framework for further Hanwoo genome research and additional information for studying divergence among cattle breeds.
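Map coverage as described above is the total length of the union of clone intervals divided by the genome length. A minimal sketch, assuming each mapped clone is represented by the closed interval spanned by its two end hits and that overlaps are merged separately on each chromosome (the function names are hypothetical):

```python
from typing import Dict, List, Tuple

GENOME_BP = 1_741_209_718  # Btau 2.1 genome length given in the text


def union_length(intervals: List[Tuple[int, int]]) -> int:
    """Total bp covered by closed intervals [start, end] after merging overlaps."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def map_coverage(clones_by_chrom: Dict[str, List[Tuple[int, int]]]) -> float:
    """Fraction of the genome covered by the merged clone intervals."""
    covered = sum(union_length(iv) for iv in clones_by_chrom.values())
    return covered / GENOME_BP


print(union_length([(1, 100), (50, 150), (300, 400)]))  # 251
print(round(100 * 510_627_052 / GENOME_BP, 2))           # 29.33 (% coverage in the text)
```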