Identification of possible genetic alterations in the breast cancer cell line MCF-7 using high-density SNP genotyping microarray

Context: Cancer cell lines are used extensively in many areas of research. Knowledge of genetic alterations in these lines is important for understanding the mechanisms underlying their biology. However, because paired normal tissues are usually unavailable for comparison, precisely determining genetic alterations in cancer cell lines is difficult. To address this issue, a highly efficient and reliable method was developed. Aims: To establish a highly efficient and reliable experimental system for genetic profiling of cell lines. Materials and Methods: A widely used breast cancer cell line, MCF-7, was genetically profiled with 4,396 single nucleotide polymorphisms (SNPs) spanning 11 whole chromosomes and two other small regions using a newly developed high-throughput multiplex genotyping approach. Results: The fraction of heterozygous SNPs in MCF-7 (13.3%) was significantly lower than those in the control cell line and in 24 normal human individuals (25.1% and 27.4%, respectively). Homozygous SNPs in MCF-7 were found in clusters. The sizes of these clusters were significantly larger than expected based on random allelic combination. Fourteen such regions were found on chromosomes 1p, 1q, 2q, 6q, 13, 15q, 16q, 17q and 18p in MCF-7 and two in the small regions. Conclusions: These results are generally concordant with those obtained using different approaches but define the chromosomal positions more precisely. The approach used provides a reliable way to detect possible genetic alterations in cancer cell lines without paired normal tissues.


Introduction
Cancer cell lines have been used extensively in many areas of cancer research. Knowledge of genetic aberrations in these cell lines is important for understanding the mechanisms underlying the biological behavior of the cells, including their responses to drugs. However, a large portion of genetic alterations, such as those caused by loss of heterozygosity (LOH) and chromosomal segment amplification, may not be precisely determined without comparison with the genotypes of paired normal tissue, which is usually unavailable. MCF-7 is one of the most widely used cancer cell lines. It was established 30 years ago [1] from a pleural effusion taken from a woman with metastatic breast carcinoma, and has been used worldwide for studying various aspects of breast cancer. Many studies on genomic variation in MCF-7 have been reported during the past 30 years. The methods employed in these studies include conventional karyotyping, [2,3] metaphase [4,5] and interphase [6,7] in situ hybridization (ISH), comparative genomic hybridization (CGH), [8-11] and end-sequence profiling (ESP). [12] With these methods, a number of diverse genetic abnormalities in the MCF-7 cell line have been revealed, including chromosomal rearrangements, deletions, translocations, inversions, breakpoints and changes in chromosome and chromosomal segment copy numbers (loss or amplification). However, because all of the above methods except ESP are limited in resolution, the regions affected by the aberrations could not be precisely determined. Although ESP can reach high resolution, it involves large amounts of complicated and laborious molecular cloning and sequencing, which makes it unsuitable for the analysis of a large number of cell lines or tissue samples.
Extensive chromosomal aberrations in the MCF-7 genome were identified with CGH. [8-10,13-15] However, conventional CGH requires preparation of chromosomes in metaphase spreads, and its resolution is limited to 10-20 Mb for deletion and 2 Mb for amplification. [16,17] Recently, microarray-based CGH has emerged for high-resolution and high-throughput analysis. [18-20] Depending on the number and sizes of the clones used, microarray-based CGH can be used for genetic analysis at different resolutions. When cDNA clones are used, only the coding sequences can be included, not promoters, introns and intergenic sequences. [17] Since genomic clones contain all kinds of repeats in varying numbers, results from microarray-based CGH may be complicated. This has been addressed, at least in part, by using carefully selected oligonucleotides as probes. However, since CGH is a DNA copy number-based assay, it cannot detect genetic aberrations caused by somatic recombination, one of the causes of LOH, which results in no copy number change in the affected regions. Furthermore, in many tumors, cells are very heterogeneous. To obtain accurate results, cell populations, which are often microscopic in size, must be isolated by microdissection, and the amount of material available for analysis is usually very small. In this case, CGH cannot be used unless unbiased whole-genome amplification is performed before CGH analysis.
The advent of high-throughput SNP genotyping technologies has made it possible to use single nucleotide polymorphism (SNP) markers for detecting LOH and/or amplification of chromosome segments. Bignell et al. [21] and Huang et al. [22] used Affymetrix SNP genotyping arrays for determining the allelic state of the markers and for detecting DNA copy number changes. Compared with CGH, Affymetrix SNP arrays can be used to detect not only copy number changes but also genetic alterations with no copy number changes, such as those caused by mitotic recombination. However, since most cancer cell lines do not have paired normal tissue, the question of which regions can be considered affected by either LOH or amplification remains unresolved.
Recently we developed a novel high-throughput SNP genotyping system. [23-25] With this system, users can customize their own SNP panels, and more than 1,000 SNP-containing sequences can be amplified to a detectable amount in a single tube from as little as a single cell. Thousands of the amplified sequences containing SNPs can be resolved on a single microarray, followed by genotype determination with high sensitivity, high accuracy and simplified procedures. In the present study, this genotyping approach was used for detailed genetic analysis of 11 whole chromosomes and regions on two additional chromosomes in the MCF-7 breast cancer cell line using our customized SNP panel. By analyzing the results with our newly developed statistical approach, a number of chromosomal regions that may have been affected by LOH were revealed and localized.

MCF-7 cell line and DNA samples
The MCF-7 cell line was kindly provided by the laboratory of Drs. William N. Hait and Jin-Ming Yang, and was originally from the laboratory of Dr. Kenneth Cowan of the Eppley Institute for Research in Cancer and Allied Diseases (Omaha, NE). The cell line was maintained in RPMI 1640 medium containing 10% FBS, 100 units/ml penicillin, and 100 µg/ml streptomycin at 37°C with 5% CO2 and 95% humidity. Genomic DNA was extracted with the TRIzol Reagent kit (Invitrogen, Inc.) according to the manufacturer's instructions and quantified by spectrophotometry. DNA of a control cell line, NB00637, which originated from a female Caucasian, and DNAs from 24 unrelated individuals from four ethnic groups (African American, Caucasian, Chinese and American Indian, six each) were purchased from the Coriell Cell Repositories (Camden, NJ).

SNP selection
SNPs were selected from the dbSNP database (http://www.ncbi.nlm.nih.gov/SNP/index.html) maintained by the National Center for Biotechnology Information (NCBI). To use a two-color fluorescence (Cy3 and Cy5) labeling system for genotype determination, only transition SNPs (A/T to G/C changes or vice versa) were selected. A panel of 4,396 SNPs spanning 11 entire chromosomes and two regions on other chromosomes [Table 1] was incorporated into five multiplex groups, each of which contained 627 to 1,172 SNPs. The majority of these SNPs were described in our previous publications. [23,24]

Primer and probe design
Primers and probes were designed using software developed in our laboratory. [23] For each SNP, a pair of PCR primers was designed to amplify the sequence containing the polymorphic site. Another pair of primer-probes (so named because they can be used either as primers for generating single-stranded DNA (ssDNA) or as probes printed on a slide for detecting the polymorphic sequences) was also designed. The primer-probes in each pair had their 3' ends immediately next to the same polymorphic site on the two different DNA strands. Since the primer-probes are internal (nested) with respect to the primers used for PCR, they enhance efficiency and specificity when used as primers. Twelve sets of oligonucleotides were used as positive probes and templates for quality control of hybridization and labeling. Each set consisted of three oligonucleotides: one was used as a probe and the other two as allelic templates differing by a single base. The sequences of these oligonucleotides were generated randomly by a computer program and were checked against the NCBI database to be certain that they did not share significant sequence identity with any known human sequence in the database.

Microarray preparation
Glass slides used for microarrays were prepared according to the procedure described previously. [23] Briefly, pre-cleaned Gold Seal slides (Becton Dickinson) were soaked in 30% bleach with shaking for 1 hour, followed by rinsing six times with distilled water. The slides were then sonicated in 15% Fisher brand Versa-Clean Liquid Concentrate with the heat on for 1 hour, followed by rinsing 10 times with distilled water and five times with MilliQ water. Slides were dried by spinning in a microfuge at 1,000 rpm for 1-2 min, and then baked at

Probe labeling and microarray detection
The slide was spun to dry as above. The array was covered by 80 µl of labeling solution containing 10.2 µl of Sequenase buffer (supplied by the Sequenase vendor), 32 units of Sequenase (GE Healthcare, NJ), and 750 nM of Cy3-ddUTP and Cy5-ddCTP under a cover slip. Then, the slide was sealed in the chamber, immediately soaked in a 70°C water bath for 10 min, and then washed again under the conditions used after hybridization as described above. After drying, the microarray slide was scanned by a GenePix 4000B microarray scanner (Axon Instruments) at 10 µm per pixel, 100% power and 480 to 560 PMT gain.

Data analysis and genotyping
The microarray images from scanning were digitized with the computer program GenePix Pro (Axon Instruments). Genotype calls for each SNP were determined from the signal intensity data by a computer program developed in our laboratory as described previously. [26] After normalization and background subtraction, genotype calls were made by using the ratio between the two color intensities (Cy5/Cy3) with 0.4 and 2.5 as cutoffs. SNPs with ratios between or equal to the two cutoffs were assigned the heterozygous state, and those with ratios outside this range were assigned to the two respective homozygous states.
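The ratio-based calling rule described above can be illustrated with a minimal sketch. It is not the laboratory's program [26]; the function name `call_genotype`, the call labels, and the handling of zero or negative background-subtracted intensities are assumptions made for illustration. Only the Cy5/Cy3 ratio and the 0.4 and 2.5 cutoffs come from the text.

```python
from typing import Optional

LOW_CUTOFF = 0.4   # ratios below this are called homozygous for the Cy3-labeled allele
HIGH_CUTOFF = 2.5  # ratios above this are called homozygous for the Cy5-labeled allele


def call_genotype(cy5: float, cy3: float) -> Optional[str]:
    """Call one SNP from normalized, background-subtracted intensities.

    Returns 'hom_cy3', 'het' or 'hom_cy5', or None when no call is possible
    (assumed here: a negative intensity, or no signal in either channel).
    """
    if cy5 < 0 or cy3 < 0 or (cy5 == 0 and cy3 == 0):
        return None
    if cy3 == 0:
        # Signal only in the Cy5 channel: the ratio is unbounded.
        return "hom_cy5"
    ratio = cy5 / cy3
    if ratio < LOW_CUTOFF:
        return "hom_cy3"
    if ratio > HIGH_CUTOFF:
        return "hom_cy5"
    # Ratios between or equal to the two cutoffs are heterozygous.
    return "het"
```

A SNP with equal signal in both channels, for example `call_genotype(100.0, 100.0)`, falls between the cutoffs and is called heterozygous.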

Genotypes of the cell line MCF-7 and HB00637
Out of the 4,396 SNPs included in the study, 4,172 (94.9%) could be detected from the MCF-7 cell line [Figure 1].
To further explore the cause of the differences between the fractions of heterozygous SNPs in different cell lines and in the human individuals, we examined the allelic states of the SNPs on individual chromosomes. Chromosomes with a fraction of heterozygous SNPs in MCF-7 significantly smaller than those in the control cell line and in the Caucasian individuals could be affected by LOH or amplification. As shown in Table 1

Expected Size of Homozygous Zones in the Human Population
For most cancer cell lines, it is difficult or impossible to obtain paired normal tissue from the donors. To identify chromosomal regions possibly affected by loss of heterozygosity, we developed an effective method based on both statistical and genetic considerations. First, we assume that there is no linkage disequilibrium (LD) between the SNPs included in the present study. Genetically, when markers are in LD, which is often the case when they are closely spaced, certain alleles of the SNPs remain associated from generation to generation in the population. Since the SNPs used in the present study were 220 to 500 kb apart, and LD is usually not observed between markers separated by such distances, the alleles of the SNPs selected in the present study should be randomly combined in the human population. When the frequencies of the alleles of a group of SNPs in a given human population are known, the fractions of homozygous and heterozygous individuals with respect to these SNPs can be estimated for this population. Statistically, the maximal number of clustered homozygous SNPs is a function of the fraction of homozygous SNPs in the population and the number of SNPs in the regions under consideration, and can be calculated based on the binomial distribution.
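As an illustration of how the homozygous fraction can be estimated from allele frequencies, the sketch below assumes Hardy-Weinberg proportions, an assumption that is not stated in the text. Under that assumption, a biallelic SNP with allele frequency p is expected to be homozygous in a fraction p² + (1 − p)² of individuals. The function name `expected_homozygous_fraction` is hypothetical.

```python
def expected_homozygous_fraction(allele_freqs):
    """Mean expected homozygosity over a panel of biallelic SNPs.

    `allele_freqs` holds one allele frequency p per SNP (0 <= p <= 1).
    Assumes Hardy-Weinberg proportions: P(homozygous) = p^2 + (1 - p)^2.
    """
    if not allele_freqs:
        raise ValueError("need at least one allele frequency")
    total = 0.0
    for p in allele_freqs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"allele frequency out of range: {p}")
        total += p * p + (1.0 - p) * (1.0 - p)
    return total / len(allele_freqs)
```

The expected heterozygous fraction is one minus this value; a SNP with p = 0.5 gives the minimum possible homozygous fraction of 0.5.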
Although the mathematical approach for this calculation was described by Philippou and Muwafi in 1982, [27] the procedure is complicated and involves an excessive amount of computation. To make the computation practically feasible, we developed a very straightforward simulation approach for analyzing the data from the present study.
Since the MCF-7 donor was a Caucasian, the average fraction of homozygous SNPs in the Caucasian population, 72.4%, was used for the analysis. Because only SNPs on the same chromosome can form homozygous SNP clusters, or Homozygous Zones (HZs), the maximal size of the HZs that may arise from random combination was calculated for each chromosome separately, based on the number of SNPs used on that chromosome in the present study. In the simulation method, the maximal number of homozygous SNPs in a cluster arising from random combination is defined at a probability of 0.01. Results are listed in Table 2 (see the column "Max Number of SNPs in a Random HZ").
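A simulation of this kind can be sketched as follows. This is not the authors' program: the function names, the number of simulated chromosomes (`n_sim`), the fixed random seed, and the reading of "a probability of 0.01" as "a longer random run occurs in at most 1% of simulated chromosomes" are all assumptions made for illustration. Each SNP is treated as independently homozygous with probability 0.724, in line with the no-LD assumption stated above.

```python
import random


def longest_run(calls):
    """Length of the longest uninterrupted run of True (homozygous) calls."""
    best = current = 0
    for is_hom in calls:
        current = current + 1 if is_hom else 0
        best = max(best, current)
    return best


def max_random_hz(n_snps, hom_fraction=0.724, alpha=0.01, n_sim=5000, seed=0):
    """Estimate the largest HZ (in SNPs) expected from random combination.

    Simulates `n_sim` chromosomes of `n_snps` independent SNPs and returns a
    run length that is exceeded in at most a fraction `alpha` of simulations.
    """
    rng = random.Random(seed)
    longest = sorted(
        longest_run(rng.random() < hom_fraction for _ in range(n_snps))
        for _ in range(n_sim)
    )
    # Allow at most int(alpha * n_sim) simulated chromosomes above the threshold.
    idx = max(0, n_sim - 1 - int(alpha * n_sim))
    return longest[idx]
```

Calling `max_random_hz(n)` with the number of SNPs typed on a chromosome gives a per-chromosome threshold of the kind listed in Table 2; the value depends on `n_sim` and the seed, so it should be read as an estimate.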
Fourteen HZs larger than the maximum number of homozygous SNPs expected from random combination were found on chromosomes 1, 2, 6, 13, 15, 16, 17 and 18 in the MCF-7 cell line [Table 2], and none were found in the HB00637 control cell line. The number of SNPs in these HZs ranged from 30 to 359. No heterozygous SNPs were found along the entire length of chromosome 13. Only two heterozygous SNPs were found on the entire chromosome 18. One HZ on chromosome 16 extended across the centromere.
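Locating such zones in a chromosome-ordered list of genotype calls amounts to scanning for runs of homozygous SNPs that exceed the random-combination threshold. The sketch below is illustrative only; the function name `find_homozygous_zones`, the boolean input format, and the assumption that uncalled SNPs have already been removed are not taken from the text.

```python
def find_homozygous_zones(calls, max_random_size):
    """Return (first_index, last_index, n_snps) for each run of homozygous
    calls that contains more than `max_random_size` SNPs.

    `calls` is a list of booleans ordered by chromosomal position
    (True = homozygous, False = heterozygous).
    """
    zones = []
    start = None  # index where the current homozygous run began
    for i, is_hom in enumerate(calls):
        if is_hom and start is None:
            start = i
        elif not is_hom and start is not None:
            if i - start > max_random_size:
                zones.append((start, i - 1, i - start))
            start = None
    # Close a run that extends to the end of the chromosome.
    if start is not None and len(calls) - start > max_random_size:
        zones.append((start, len(calls) - 1, len(calls) - start))
    return zones
```

The returned indices can be mapped back to SNP positions to report each zone's boundaries in Mb.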
The analyzed regions on chromosomes 21 and X were small (2.5 and 1.8 Mb, respectively) with very high SNP densities (218 and 26.7 SNPs/Mb, respectively). Since closely located markers are often found in LD, the assumption that markers are randomly associated may not apply to these two regions. Therefore, the HZs in these regions may not be defined by the criteria described above. However, the entire examined region on chromosome X, with 47 homozygous SNPs, spanned 1.8 Mb, and a region with 397 affected SNPs covered a major portion (2.24 Mb) of the 2.5-Mb region on chromosome 21. Both regions are far larger than the haplotype blocks (genomic regions with markers in LD) in the human genome, whose size is estimated to be ~8 kb based on reported data. [28-32]

DISCUSSION
In most cancer genetic studies, genetic alterations in the tumor cells can be identified by comparing the genotypes of the tumor cells with those of paired normal tissue. However, it is difficult or impossible to obtain paired normal tissue for many cancer cell lines, especially those that were established a long time ago. Negrini et al. [33] reported that five polymorphic (microsatellite) markers on chromosome 11 were homozygous in the MCF-7 cell line and that the calculated probability of all five markers being homozygous was less than 1%. It was concluded that one copy of chromosome 11, or at least of its short arm, was lost in MCF-7. In the present study, we used our recently developed statistical approach and a much higher marker density, and identified HZs in the breast cancer cell line MCF-7 that may not be caused by random combination. Fourteen HZs were found in this cell line but none in the control cell line. Another two HZs covered the entire or nearly the entire examined regions on chromosomes X and 21 in MCF-7, where the marker density is so high that the expectation based on random combination of SNP alleles may not apply.
It should be pointed out that HZs larger than expected are not necessarily caused by LOH or chromosomal segment amplification. The presence of HZs significantly larger than expected from random combination in the human population was first reported by Broman and Weber. [34] The authors analyzed 134 individuals from eight CEPH (Centre d'Étude du Polymorphisme Humain) families. All individuals in two (25%) of the families were shown to have at least one segment of homozygosity, and 20% of the individuals in the other six families also had significant homozygous segments. Therefore, when HZs are identified in cancer cells, one cannot simply conclude that they are all caused by genetic alterations in somatic cells unless the genotypes of paired normal tissues are available for comparison. However, the majority of the HZs identified in the present study should be different from those identified by Broman and Weber, since their study indicates that such HZs may be present in only a small portion of human individuals. Families in which all members carry HZs may reflect particular circumstances such as marriage between close relatives. If the HZs in the two families were caused by marriage between close relatives, their sizes should be larger than those present in the general population. However, the average sizes of the HZs in the two families were 10.9 and 18.5 Mb, and many of the HZs were smaller than 5 Mb, compared with an average size of 26.9 Mb and a minimal size of 9.5 Mb for those identified in the present study.
More intensive analysis using results from the Human HapMap Project was reported very recently. [35] Those results should be more accurate and more representative of the human population than those described by Broman and Weber because (1) all markers used are SNPs, which are more stable than the microsatellite markers used by Broman and Weber; (2) 209 HapMap individuals, the majority of whom were unrelated, were used instead of the 134 individuals in eight families used by Broman and Weber; and (3) the marker density was much higher (one SNP every 500 bp). The authors identified 1,393 tracts exceeding 1 Mb in length. However, the HZs identified in that study differ even more from those identified in the present study in two respects: (1) HZs were defined as uninterrupted clusters of homozygous SNPs spanning at least 1 Mb in a single individual, a criterion that allowed the authors to study HZs differently but involves less statistical consideration than the present study; and (2) among the 1,393 tracts identified, only 17 were longer than 5 Mb. The longest HZ was only 17.9 Mb, which is shorter than eight of the 14 HZs identified in the present study [Table 2].

Journal of Carcinogenesis
A peer reviewed journal in the field of Carcinogenesis and Chemoprevention

The presence of significantly more HZs in the MCF-7 cell line than in the human population indicates that the majority of these HZs are very likely the results of LOH. [4,10,14,15] and karyotyping. [2,3] These results indicate that our method for detecting HZs in cell lines is very reliable. [15]. Although the HZ from 171.40 to 184.09 Mb on 2q is smaller than the predicted minimum size (26 homozygous SNPs in the cluster compared to 30 for the minimum size of a non-random zone), it was shown to be amplified in some sub-lines of MCF-7. [9,13-15,36] These findings support the hypothesis that HZs may be detected not only in chromosomal regions affected by loss of one of the homologous copies and in regions showing non-randomness caused by other genetic alterations, but also in those affected by amplification of one allele or disproportionate amplification of both alleles.
Discrepancies were found between our results and those reported previously. HZs on 17p in the MCF-7 cell line were reported by other studies [14,15] using CGH but were not detected in our study. In addition, we did not detect any alteration on chromosomes 20 and 22, while amplification of 20q and loss of 22q were observed by others. [14,15] One possible reason for these discrepancies is that the MCF-7 cell lines used in different studies may have diverged genetically. Several studies [4,14,15] reported considerable genetic variation among different cultures of the MCF-7 cell line or among MCF-7 lines from different sources. This raises an important issue: how to interpret and compare experimental data collected in other biological studies, such as those on drug resistance, gene expression profiling and pathway analysis, by different laboratories using MCF-7 or any other cell line obtained from different sources. If a cell line has been cultured by different laboratories for a long period of time, different genetic alterations may have arisen, and results from studies with these cultures could differ. For the amplification on chromosome 20q detected by others but not by us, the discrepancy could arise because the amplified copy numbers of the two alleles were equal or nearly equal, so that our analysis was not able to detect a difference. Indeed, it would be interesting to learn whether, when a chromosomal region is amplified, both alleles are equally, unequally or selectively amplified.
Unlike CGH, which can determine DNA copy number changes (loss or amplification) but cannot discriminate between copy number changes of different alleles, genotype analysis can detect both LOH [37,38] and amplification of individual alleles. [21,39] Genotype analysis can also detect LOH that does not necessarily involve DNA copy number changes, [21] which cannot be detected by CGH. With our method, polymorphic sequences can be amplified and analyzed from very small quantities of sample or even very few copies, [23] whereas sophisticated unbiased whole-genome amplification is required for CGH when only a small amount of material is available. Resolution was a major issue for CGH, although it has been improved by the development of microarray-based CGH. With genotype analysis, very high resolution can be reached as long as markers are available. Therefore, genotype analysis and CGH may be used as complementary approaches for large-scale genetic profiling.

CONCLUSIONS
In summary, since many cell lines such as MCF-7 are widely used in biomedical research, knowing the genetic backgrounds of these lines and the genetic variation within each cell line from different sources is very important for understanding experimental results on a genetic basis and for interpreting experimental data correctly. Our system allows genetic profiling of widely used cell lines with a high marker density on a periodic basis to monitor genetic changes occurring during passages and/or maintenance in different laboratories.