An Algorithm for Identifying Novel Targets of Transcription Factor Families: Application to Hypoxia-inducible Factor 1 Targets

Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we used suffix trees to extract 105 common patterns that occur in all 15 training genes. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searched a genome database that contains 2,078,786 DNA sequences. It reported 258 potential HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 putative genes among these 258 were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the biological lab, and showed that it is a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.


Introduction
In the past decade, we have witnessed unprecedented advances in genomic databases. The completion of the human genome project has provided us with sequence information on human genes, along with their regulatory sequences. 1 With the large amount of genomic information, developing efficient and effective computational tools to analyze such huge genomic data has become an important challenge. One important application of such analysis is in gene finding. Some programs for gene finding are designed to predict an entire gene sequence. [2][3][4][5][6] However, a majority of them are designed to identify some specific gene segments, such as promoters, 7,8 enhancers, 7 exons and CpG islands. 8 Given the special role of transcription factors in gene expression, the identification of transcription factor targets is an important task. [9][10][11][12][13][14][15] A transcription factor controls and regulates gene expression by binding to a particular promoter or enhancer region of the gene. The DNA fragments bound by a transcription factor vary in length from 5 to 25 base pairs. However, a larger region of regulatory elements is involved in gene expression. Thus, in addition to the transcription factor binding site, other sequences may play important roles in gene expression. Therefore, more sophisticated approaches need to be explored in order to accurately identify the relevant sequences that control gene expression. Methods based on frequency of k-tuples and exhaustive pattern search have been proposed. 14 Methods that use both global and local alignments to predict transcription factors, and that consider the binding of transcription factors and cis-regulatory elements, were previously described. 8,13 Suffix tree based methods have been used in pattern discovery problems in biology.
While exact pattern occurrences were considered in, 16 the detection of transcription factor binding sites using suffix trees was considered in, 17,18 based on a method for suffix-tree based inexact pattern matching initially described in. 19 Essentially, inexact (k-mismatch) pattern matching was performed progressively: starting from the root, the method performs an exhaustive comparison of all the symbols on each branch that starts from the node against the current position in the pattern, until up to k positions mismatch on the path, or the pattern is exhausted. The time requirement of these algorithms is exponential with respect to the length of the pattern and the size of the symbol alphabet, which makes the approach impractical for moderately sized sequences, or for large numbers of sequences. In this work, we also use suffix trees as the basis for pattern matching, and consider only exact pattern matching. A key difference in our approach is the consideration of the practical implementation of this important data structure for environments with huge genomic databases, potentially involving millions of sequences, or billions of base pairs.
In this study, we develop a new methodology for identifying novel targets of hypoxia inducible factor 1 (HIF-1) based on the suffix tree data structure. The methodology includes the following four steps.
Step 1: Construct the suffix tree using a set of promoter sequences from known HIF-1 targets as training genes. Then extract from the suffix tree the common patterns that occur in every training gene at least once.
Step 2: Use the common patterns and known HIF-1 binding site sequences to identify all potential HIF-1 target genes from the genome database.
Step 3: Process the potential HIF-1 targets by positional analysis to select those targets with a predicted HIF-1 DNA binding site and the common patterns from above at the 5΄ region upstream of the promoter.
Step 4: Analyze the accuracy of the prediction for HIF-1 targets.
Step 2 and Step 3 together ensure that the motifs of interest are located only in the 5΄ upstream promoter region. This approach may be extended to identify potential novel targets of other transcription factors since they share similar characteristics for binding to the DNA sequence.
We use the suffix tree data structure in the first and second steps. 20 Given a string S[1..n] of length n, a suffix tree is a rooted tree with n leaves, whereby the i-th leaf node corresponds to the suffix S[i..n], each edge in the tree is labeled with a substring, and no two edges out of a node start with the same character. There are two advantages in using a suffix tree in complex string matching problems. One is the possibility of finding common patterns from multiple strings in linear time, and the other is the potential to search for multiple patterns in multiple strings in linear time (with respect to the length of the concatenated strings). The storage requirement is also linear. Table 1 lists the popular linear time search algorithms commonly used to search multiple patterns against a sequence (or multiple sequences). Each algorithm in the table is described in detail in. 20 Assume k is the number of patterns; m_i (0 < i < k) is the length of a pattern; M is the total length of patterns; M' is the total length of output patterns; n is the length of a sequence; σ is the number of distinct characters in the sequence.
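The definition above can be made concrete with a small sketch. The class below builds an uncompressed suffix trie, not the linear-time, linear-space suffix tree the text refers to: it takes quadratic time and space and is only meant to show why every substring of S can be found by walking down from the root. The name `SuffixTrie` and its method are hypothetical and are not part of the implementation used in this study.

```python
class SuffixTrie:
    """Uncompressed suffix trie (illustration only, quadratic in len(text))."""

    def __init__(self, text):
        self.root = {}
        # Insert every suffix text[i:] character by character.
        for i in range(len(text)):
            node = self.root
            for ch in text[i:]:
                node = node.setdefault(ch, {})

    def contains(self, pattern):
        """Return True if pattern is a substring of the indexed text."""
        node = self.root
        for ch in pattern:
            if ch not in node:
                return False
            node = node[ch]
        return True


trie = SuffixTrie("CACGTGTTATGG")
hits = [p for p in ("TT", "ACGTG", "GGA") if trie.contains(p)]
print(hits)  # patterns found in the text
```

Once the structure is built, each query costs time proportional to the pattern length only, which is the property that makes searching with many patterns attractive.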
Table 1 compares several available string matching algorithms when searching with multiple patterns (i.e. a set of patterns) against a sequence. From the table, we can see that the suffix tree is the worst with respect to preprocessing time, but it outperforms all the others in the search phase. The Θ(n) preprocessing and Θ(M) search of the suffix tree is not achievable by any of the other algorithms. The other methods would preprocess each requested string on input, and then take O(n) or more worst case time to search for the string (n can be huge compared to M in our case). Thus, in theory, the suffix tree is efficient in both time and space, and it has been used in different applications, such as multiple genome alignment 21 and the identification of sequence repeats. 22 However, there is still the difficulty of a practical implementation of suffix trees suitable for the analysis of huge datasets. A major contribution of this work is the development of a simple and innovative methodology for using suffix trees, which makes it feasible to use them on large genomic databases. We apply the method to the problem of finding novel targets of the HIF-1 transcription factor, using a database containing millions of sequences, or billions of base pairs.

General methodology
The general methodology used in this study is illustrated in Figure 1. In brief, 1) A suffix tree is constructed using the set of training genes. A set of common patterns that occur on all training genes at least once is extracted from the suffix tree. 2) Using the multiple patterns (including the common patterns from the previous step and other known patterns such as HIF-1 binding sites (see Table 2) and consensus sequences from the literature), the genome database is searched by applying suffix tree algorithms. This generates the output sequences. 3) Positional analysis is performed on each output sequence according to the functional DNA fragments at the specific locations of the sequence. 4) The output targets from the positional analysis are grouped into known target genes and candidate targets. 5) The candidate target genes are further verified by doing biological experiments in the laboratory and by using available microarray data in the literature.

[Table 1. Columns: Algorithm, Preprocessing, Search.]

Figure 1. The outline of general methodology. The training genes of known HIF-1 targets are built into a suffix tree, and a set of common patterns are extracted from the suffix tree. Common patterns (including the set of common patterns and consensus sequences) are used to search the human genome database using the suffix tree algorithm. Using positional analysis, we analyze the output genes according to the relative locations of HIF-1 binding sites in the genes, and define the output genes with HIF-1 binding sites upstream of translational start site as potential HIF-1 targets. The potential HIF-1 targets are divided into two groups, known HIF-1 target genes and the candidate target genes. Finally, the candidate novel target genes are validated using available microarray data in the literature and tested in the biological lab.

Selection of training genes
We used 21 known HIF-1 target genes, and downloaded all available DNA sequences near HIF-1 binding sites from the NCBI Nucleotide database (Table 2). In NCBI Nucleotide GenBank, there are gene features for each gene in the annotation database. 45 We extracted 25 different DNA subsequences containing promoter and flanking sequence from these 21 HIF-1 target genes according to the feature information provided in GenBank. The length of the training subsequence may differ from one HIF-1 target gene to another. Among these 25 subsequences, four genes (HO1, LDHA, EPO, and ENO1) contribute two different subsequences each. Only one subsequence is used for each of the remaining 17 HIF-1 target genes. Thus, there are 21 known HIF-1 target genes and 25 subsequences. We used the leave-k-out cross-validation method 46 to select an appropriate number of training gene subsequences for this study. We denote the 25 HIF-1 target gene subsequences as SET25. The following steps are used:
Step 1: 15 training subsequences are randomly selected from SET25.
Step 2: These 15 training subsequences are built into a suffix tree and then a set of common patterns that occur at least once in each gene are extracted from the suffix tree.
Step 3: These common patterns and HIF-1 binding sites are used to search against SET25.
Step 4: The number of the output genes is determined and the accuracy of the approach is calculated.
Step 5: Steps 1 to 4 are repeated 1000 times, and the average results are recorded.
Similarly, the above procedure is repeated using different numbers of training genes, namely 10, 12, 18, and 20 HIF-1 target gene subsequences. We obtained similar detection accuracy using 15 and 18 training sequences, and lower detection accuracy using 10 and 12 training sequences. The detection accuracy using 20 training genes is slightly higher. However, the number of common patterns using 20 training genes is much smaller, which could lead to more potential false HIF-1 target genes in the prediction. Thus, we randomly selected 15 training genes in this study.

Suffix tree algorithms for searching genome database
To facilitate the practical application of suffix trees on the huge genome database, we use a sliding window method which significantly improved the speed of the algorithms and reduced the computer memory requirement. The basic idea is to sequentially analyze smaller chunks of the database based on a chosen window size. Consider a simple example using the string "CACGTGTTATGG" as shown in Figure 2, in which we wish to determine whether "TT" is in the string. The longest pattern has length two. If the machine is able to process five characters at a time, a fixed window of five characters is adopted, and an overlap of one character is needed (overlapping size = the length of longest pattern − 1). The window slides from left to right with a movement size of four characters (movement size = window size − overlapping size). In the first phase, a substring of five characters "CACGT" is read, and used to construct a suffix tree to be searched using the pattern "TT". In the next phase, the last character "T" from the previous phase is kept, and the substring "TGTTA" is used to construct a suffix tree. The same process is performed until the search condition is met or the whole string is read.
For a short string, the advantage of using the sliding window may not be obvious. However, the sliding window method becomes extremely important when the string is long and the available computer memory is limited. For example, for large DNA sequences with 5,000,000 base pairs or a concatenation of several DNA sequences, the sliding window method has a noticeable advantage. The sliding window is particularly useful when the whole database (10,268,238,630 base pairs in our case) needs to be built into a suffix tree. The whole database can be viewed as a large string formed by concatenating all the DNA sequences in the database.
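The windowing arithmetic described above (overlapping size = length of longest pattern − 1, movement size = window size − overlapping size) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names are hypothetical, the window is assumed to be at least as long as the longest pattern, and Python's built-in substring test stands in for building and querying a suffix tree on each window.

```python
def sliding_windows(text, window_size, longest_pattern):
    """Yield overlapping chunks so that no match of length <= longest_pattern
    is lost at a chunk boundary."""
    overlap = longest_pattern - 1
    step = window_size - overlap
    if step <= 0:
        raise ValueError("window must be longer than the overlap")
    for start in range(0, len(text), step):
        yield text[start:start + window_size]
        if start + window_size >= len(text):
            break  # the last window already reached the end of the text


def window_search(text, patterns, window_size):
    """Return the set of patterns found in at least one window."""
    longest = max(len(p) for p in patterns)
    found = set()
    for chunk in sliding_windows(text, window_size, longest):
        # Stand-in for: build a suffix tree on `chunk`, then query it.
        found.update(p for p in patterns if p in chunk)
    return found


windows = list(sliding_windows("CACGTGTTATGG", 5, 2))
print(windows)
print(window_search("CACGTGTTATGG", ["TT", "GG", "AA"], 5))
```

With the example string, a window of five characters and a longest pattern of two characters, the first two windows are "CACGT" and "TGTTA", matching the walk-through in the text.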
In this section, we describe the algorithms used to search the huge genome database to identify the potential novel target candidates. We use both the common patterns from the training genes (Table 3), and known HIF-1 binding sites (Table 2) as criteria in this search. If a gene contains all the common patterns and one of the HIF-1 transcription factor binding sites, then the gene is selected as an output gene. The stage of searching the huge genome database is a major bottleneck in finding potential novel transcription factor targets. Thus, three algorithms are proposed for this task. We refer to these three algorithms as Algorithm 1, Algorithm 2 and Algorithm 3, respectively.
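The selection criterion stated above (all common patterns present, plus at least one binding site) reduces to a simple predicate. The sketch below is illustrative only: `is_output_gene` is a hypothetical name, the patterns and binding sites in the example are toy values rather than entries from Table 2 or Table 3, and plain substring tests replace the suffix tree search.

```python
def is_output_gene(sequence, common_patterns, binding_sites):
    """True if the sequence contains every common pattern and
    at least one of the binding-site sequences."""
    has_all_patterns = all(p in sequence for p in common_patterns)
    has_binding_site = any(s in sequence for s in binding_sites)
    return has_all_patterns and has_binding_site


# Toy inputs (assumed for illustration, not taken from the tables).
toy_patterns = ["GGCC", "TTAA"]
toy_sites = ["ACGTG", "GCGTG"]

print(is_output_gene("GGCCATTAACACGTGA", toy_patterns, toy_sites))
print(is_output_gene("GGCCATTAACATGTGA", toy_patterns, toy_sites))
```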
Algorithm 1 constructs one suffix tree for each sequence, then uses the common patterns to search against each suffix tree. Algorithm 1 is described as follows: [...] is applied to a huge database such as the genome database, the suffix tree ST_d is built from all the sequences in the database. Thus, it requires a powerful machine with a huge memory. If we have such a machine that can be used to build a suffix tree for all the database sequences, this algorithm certainly would have some advantages: the whole database only needs to be built into a suffix tree once, and the database can be stored as one big suffix tree. It can be used to search different pattern sets as many times as one may wish. In this case, the search process is very fast, since the time used is linear with respect to the length of the concatenated common patterns.
The proposed algorithms utilize the sliding window method to build a suffix tree (except for Algorithm 3). The processed DNA sequence is in FASTA format. A line of FASTA format DNA sequence contains 80 characters except the ending line. Thus, the sliding window algorithm processes 100 lines (8000 characters) at a time, for a fixed window size of 8000 characters.

Positional analysis
Using Algorithm 1 and Algorithm 2, we searched the genome database. The output genes from both algorithms were the same. The only difference was the time each required. We further analyze the output genes using positional analysis. A typical schematic diagram of a target gene activated by HIF-1 is shown in Figure 3. It is known that HIF-1 has the consensus binding site "RCGTG" (R stands for a purine nucleotide, A or G) at its target genes. [41][42][43][44] All the known HIF-1 binding sites are at the 5΄ region upstream of the promoter sequence, that is, in the 5΄ enhancer region, except for erythropoietin (EPO), which contains the HIF-1 binding site in the 3΄ enhancer region. From the information provided by the annotation databases in GenBank, it is quite difficult to obtain the stop site of the gene coding sequence. Therefore, in the positional analysis, we only select the potential HIF-1 candidate targets that contain HIF-1 binding sites in the 5΄ region upstream of the promoter.
To identify genes that have the HIF-1 binding site in the 5΄ region upstream of the promoter, we need to find the HIF-1 binding site which is in the 5΄ enhancer region from the target gene sequences. Letting V_s denote the 5΄ region upstream of the promoter, three methods are used to extract V_s from the gene sequence based on the feature tables provided in the GenBank annotation database. 45 Method 1: For those gene sequences with the available enhancer sequence and position in the feature table, we extract the enhancer DNA sequence as V_s. Method 2: For those gene sequences with the available promoter region and sequence in the feature table, V_s is the DNA sequence of the 5΄ region upstream of the promoter plus the promoter region. Method 3: For the remaining gene sequences with no information on either the promoter or enhancer sequence, we search for the first position of the beginning of the "CDS", "TATA" box, or "CAAT" box sequences, called E_e. Then, we extract the DNA sequence from the 5΄ end to E_e as V_s. After determining V_s by using the above three methods, we use the Boyer-Moore fast string matching algorithm 20 to search whether the HIF-1 binding site "RCGTG" is inside V_s.
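Method 3 and the final binding-site check can be sketched as below. Several things here are assumptions made for illustration: R is read as the IUPAC purine code (A or G); a regular expression is used in place of the Boyer-Moore algorithm named in the text; the CDS start is passed in as an optional index, since in practice it would come from the GenBank feature table; and when none of the markers is found the whole sequence is treated as V_s, a fallback the text does not specify. The function names are hypothetical.

```python
import re

# "RCGTG" with R expanded to A or G (IUPAC purine code).
HRE = re.compile(r"[AG]CGTG")


def upstream_region(seq, cds_start=None):
    """Method 3 sketch: return seq from the 5' end up to E_e, the earliest
    of the CDS start, the first "TATA" and the first "CAAT"."""
    candidates = [pos for pos in (seq.find("TATA"), seq.find("CAAT")) if pos != -1]
    if cds_start is not None:
        candidates.append(cds_start)
    # Assumed fallback: no marker found, so use the whole sequence.
    end = min(candidates) if candidates else len(seq)
    return seq[:end]


def has_upstream_hre(seq, cds_start=None):
    """True if the binding-site consensus occurs inside V_s."""
    return HRE.search(upstream_region(seq, cds_start)) is not None


print(upstream_region("GGTACGTGCCTTTATAAAGGATG"))
print(has_upstream_hre("GGTACGTGCCTTTATAAAGGATG"))
print(has_upstream_hre("CCCCTATAAACGTG"))
```

In the last call the consensus occurs only downstream of the "TATA" marker, so the sequence would not be kept by the positional analysis.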

Lab verification
Human prostate cancer PC-3 cells were cultured in RPMI 1640 supplemented with 10% fetal bovine serum (Intergen, Purchase, NY), 0.2 units/ml human insulin (Sigma, St. Louis, MO), 50 units/ml penicillin, and 50 mg/ml streptomycin (Invitrogen, Carlsbad, CA). These cells were seeded in a 12-well plate overnight, and transfected with the indicated plasmids using lipofectamine (Sigma) per the manufacturer's instructions. Briefly, COX-2 reporter plasmid (0.4 μg) containing a 960-bp human COX-2 promoter with the potential HIF-1 binding site was co-transfected with β-gal plasmid, and the control vector, HIF-1 dominant negative construct, or HIF-1α expression plasmid using 2 μl Lipofectamine per well in serum-free Opti-MEM media (Invitrogen, Carlsbad, CA) for 30 min. The transfection solution was then added to the cells, and incubated with the cells for 4.5 h. The cells were then washed and cultured in the medium for 36 h. The cells were collected and analyzed using luciferase analysis buffer (Promega, Madison, WI). Luciferase activity was measured using a moonlight luminometer, and β-gal activity was measured as a control using the above cellular extracts. The relative luciferase activity was the ratio of luc/β-gal, with the value normalized to the control as described previously. 27,49
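The normalization described in the last sentence can be written as a short calculation. The function name and the readings in the example are hypothetical and are not measurements from this study; the sketch expresses the result as a percentage, assuming the control is set to 100%.

```python
def relative_luciferase(luc, bgal, control_luc, control_bgal):
    """Ratio luc/beta-gal of a sample, normalized to the control ratio
    and expressed as a percentage (control = 100)."""
    return 100.0 * (luc / bgal) / (control_luc / control_bgal)


# Hypothetical readings: the sample ratio is half of the control ratio.
print(relative_luciferase(luc=500, bgal=2, control_luc=1000, control_bgal=2))
```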

Results
In this study, we have used HIF-1 target genes as a model system, and developed a new methodology for identifying novel HIF-1 target genes. Using a training set of 15 known HIF-1 target genes, we have obtained 258 potential HIF-1 targets, including 25 known HIF-1 targets, from a large genome database. Although suffix trees have been around for some time, the key innovation in our approach is how to use them efficiently on a large database, using a standard personal computer. Our proposed method is particularly efficient, handling a large database of 2,078,786 DNA sequences with a total of 10,268,238,630 base pairs on a PC with a 2.8 GHz processor and 512 MB of RAM. This confirms the feasibility of the proposed methodology. In addition, through literature search, 17 putative novel targets are verified by microarray data to be upregulated by HIF-1 or hypoxia. We also considered COX-2, one of the potential new targets proposed by our algorithm, and confirmed that COX-2 is a biologically relevant HIF-1 target gene. These results further demonstrate that this new methodology is effective in predicting novel HIF-1 targets.

Common patterns from training genes
To obtain the common patterns of HIF-1 target genes, we built a suffix tree using the randomly selected 15 known HIF-1 target training genes. From the suffix tree, we extracted a set of 105 common patterns that occurred in all training genes at least once. The minimum pattern length was fixed at 4 base pairs. The patterns are listed in Table 3.
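The extraction step can be illustrated on toy data. The sketch below does not use a suffix tree: it enumerates all substrings of each sequence and intersects the sets, which is quadratic in sequence length and only suitable for short examples. It also returns every common substring of at least the minimum length, which may differ from how the 105 patterns in Table 3 were selected. The function names and the three toy sequences are assumptions for illustration.

```python
def substrings(seq, min_len):
    """All substrings of seq with length >= min_len."""
    return {seq[i:j]
            for i in range(len(seq))
            for j in range(i + min_len, len(seq) + 1)}


def common_patterns(seqs, min_len=4):
    """Substrings of length >= min_len occurring in every sequence,
    longest first, then alphabetical."""
    common = substrings(seqs[0], min_len)
    for s in seqs[1:]:
        common &= substrings(s, min_len)
    return sorted(common, key=lambda p: (-len(p), p))


toy_training = ["TTACGTGCC", "GGACGTGAA", "ACGTGTT"]
print(common_patterns(toy_training))
```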

Comparison of algorithms for searching genome database
The suffix tree data structure is constructed in linear time using Ukkonen's linear time algorithm. 20 The three algorithms proposed all have the same overall theoretical running time complexity. Each requires linear time with respect to the total size of the database (i.e. the length of all the concatenated database sequences). We consider the algorithms in terms of the suffix tree construction time, search time using the suffix tree, and memory requirement for the two stages. This is summarized in Table 4.
In terms of running time, the major difference is how much time each algorithm spends in constructing the suffix tree(s), or in searching while using the constructed suffix tree(s). For instance, while Algorithms 1 and 3 spend more time in constructing the suffix tree, O(n_s l_s), they spend less time in searching on the suffix tree, O(n_p l_p), where n_s = number of sequences in the database, n_p = number of common patterns, l_s = average length of a sequence, and l_p = average length of a pattern. The reverse is the case for Algorithm 2.
The overall time complexity (combining tree construction and searching) remains the same for the algorithms. The memory requirement is, however, quite different for the three algorithms. For Algorithm 2, the advantage is that we only need to build a suffix tree for the multiple patterns once, then use it throughout the whole search. Algorithm 3, for instance, requires extra memory proportional to the size of the entire database. Algorithm 2 should be the fastest and most practical if we do not have a powerful machine to support Algorithm 3. This is because, on average, the total length of the common patterns (i.e. after concatenation) is usually shorter than the length of a gene sequence, and the preprocessing time to build the suffix tree is quite short. Moreover, the suffix tree for the common patterns only needs to be built once. In practice, Algorithm 2 is the fastest of the three algorithms, although it has the same space complexity as Algorithm 1.
Algorithms 1 and 2 are more practical for those who do not have a supercomputer with huge memory. For instance, in our case, computational experiments were carried out on a Pentium 4 PC with 2.8 GHz and 512 MB memory. Thus, we implemented Algorithms 1 and 2, and used them to search the genome database.
The nucleotide database was divided into approximately 6 equal parts (based on the number of sequences). Algorithm 1 and Algorithm 2 were executed separately on these 6 parts of the database. The comparative results are shown in Table 5. As can be observed, in each part of the database, Algorithm 2 processed more DNA sequences and more bytes per minute than Algorithm 1. On average, Algorithm 2 is about 36% faster than Algorithm 1.

Output genes from genome database
The final output genes after processing for the positional analysis are divided into two groups: the mammalian group contains genes from mammals, such as human, rat and bovine; the other group contains genes from non-mammals, such as virus and plant. Within the potential novel targets, the same gene in different species is counted as one gene. One of the goals is to find genes that may have important implications in human health and disease research. Thus, further analysis of the genes in the mammalian group was conducted. A total of 258 distinct genes were identified.

Verification of candidate targets
After applying positional analysis to the output genes, the remaining genes are called candidate targets. We further characterize the candidate targets using three approaches: by using known HIF-1 target genes in the literature, by microarray data from literature search, and by biological lab verification.

Verification of potential novel HIF-1 targets using known HIF-1 targets
In our final output, there are 25 known HIF-1 targets identified. Among these 25 known output targets, there are 15 HIF-1 targets that were used for the training analysis. An additional six genes in the predicted output were also known HIF-1 targets: cyclin G2, p21(WAF), PGK, TGFα, Nip3, and trefoil factor. These 25 HIF-1 targets are shown in Table 6.

The validation of candidate novel HIF-1 targets using available microarray data
In a follow-up literature search, an additional 17 putative novel HIF-1 targets from the output list were confirmed by microarray data to be upregulated by HIF-1 or hypoxia. These targets are shown in Table 7. This result showed that our predicted novel HIF-1 targets can be found as upregulated targets of HIF-1 and hypoxia, further supporting the accuracy of our prediction.

Laboratory validation of a candidate novel HIF-1 target
We selected one of the candidate HIF-1 targets identified as described above to be tested in the biology laboratory. The verified gene was the human cyclooxygenase-2 (COX-2) gene. There are two reasons for selecting COX-2. First, COX-2 is important in biological functions such as tumor growth and angiogenesis. Second, a COX-2 promoter construct was available (kindly provided by Dr. Jian Li, Harvard University, MA). It is difficult to obtain promoter constructs for each gene in our final output. COX-2 was a putative target at the time the experiment was carried out (see 47), but its regulation by HIF-1 has recently been published independently. 48 It is known that HIF-1 target genes are regulated at the transcriptional level by triggering their promoter activity. Therefore, to determine whether HIF-1 expression plays a role in COX-2 transcriptional activation, PC-3 prostate cancer cells were transfected with a COX-2 promoter reporter containing a 960-bp human COX-2 promoter with the potential HIF-1 binding site. Expression of the HIF-1 dominant negative construct specifically inhibited HIF-1 activity, and inhibited the COX-2 reporter activity in a dose-dependent manner (Fig. 4a). This result indicates that HIF-1 activity is required for COX-2 transcriptional activation. In order to determine whether HIF-1 is sufficient to induce COX-2 transcriptional activation, HIF-1α expression plasmid was co-transfected with the COX-2 reporter. The expression of HIF-1α in PC-3 cells induced HIF-1 expression and COX-2 reporter activity in a dose-dependent manner (Fig. 4b). Thus, HIF-1α is also sufficient to induce COX-2 transcriptional activation. These data demonstrate that COX-2 is a functional HIF-1 target. These results further show that our methodology is effective in identifying novel HIF-1 targets. Lab verification indicates that HIF-1 is essential in regulating COX-2 transcriptional activation.
While there are certainly many potential HIF-1 targets in the final output, we performed experiments only on COX-2. The complete list of output genes is in the supplementary files. We hope that the results of this work will spur others to run the required biological experiments to validate the genes from the final list and to test these potential HIF-1 targets.

Discussion
The basic methodology in this study is as follows: 1) extract common patterns from the known gene sequences; 2) use the set of common patterns to search the genome database; 3) analyze the target genes according to the specific gene's features in the database. The methodology proposed here identifies novel HIF-1 target genes using a combination of the specific HIF-1 binding sequence "RCGTG" and the common patterns. Our approach can be applied to other transcription factors. Transcription factors such as activator protein 1 (AP-1) 38 and nuclear factor-kappaB (NF-kB) 39 generally have common DNA binding sequences. AP-1 has the common binding site "TGACTCA". 54 NF-kB has the common binding site "CAAGGAGGGAA TTCCCGAGT." 55,56 The methodology may be extended to study other functional genes because many genes are conserved across widely divergent species with similar functions. Genes with similar functions may have similar structure and sequences. Genes belonging to the same family commonly share specific sequences and/or consensus sequences. The idea is to generate the common patterns from known genes, then to use these common patterns to search for unknown novel targets. Thus, steps 1 and 2 may be applied to novel function prediction based on gene structure. We use the annotation database in GenBank, which is available to the public. Apart from the transcription factors studied here, the databases can be used to study other functional DNA segments, such as exons, introns, miRNAs, and 5΄UTRs. For a different kind of gene, step 3 needs to be changed to adapt to the specific gene's features, but the basic idea remains the same.
Furthermore, the approach may potentially be applied to other genes that have known consensus sequences and common regulatory patterns. The suffix tree method can be applied to general gene clustering and classification tasks that need to group and categorize similar genes together. An improvement in the results (for instance, further filtering the output target genes) could be obtained by combining the proposed suffix tree approach with statistical models.
Although the suffix tree data structure is used for exact string matching in this study, the suffix tree analysis can be further developed for inexact string matching problems. 20 The k-mismatch problem is an inexact pattern matching problem: identify all the occurrences of pattern P in text T, allowing up to k mismatched characters in P. k-mismatch is very useful for finding functional similarities (or gene mutations) among genes in bioinformatics. 17,18,20 In DNA sequences, mutation, insertion or deletion of nucleotide(s) happens frequently across different species or different individuals, so the functional signals may not show up exactly. MicroRNAs (miRNAs) are a class of small non-coding RNAs, 21 to 23 base pairs in length with a hairpin structure, that play important roles in regulating post-transcriptional mRNA expression in animals and plants. Identification of miRNAs using computational methods has been successful. 57 Most computational prediction of novel miRNAs is based on phylogenetic conservation and structural similarity in closely related species, such as human, 57,58,60 animal, 57,60 insect, 57,59 and plants. 57 It would be interesting and useful to extend this suffix tree method to identify the potential targets of miRNAs in a future study. Taken together, the approach proposed here may be used as a general methodology to identify novel gene targets of a given transcription factor, and to study other gene function and regulation in the future.

Figure 4. Effect of HIF-1 expression on COX-2 transcriptional activation. PC-3 prostate cancer cells were seeded into 6 well plates a day before the transfection. a) To determine whether HIF-1 activity is required for COX-2 transcriptional activation, the cells were co-transfected with COX-2 promoter luciferase reporter (PXP4/COX-2), pCMV-β-gal, and pcDNA3 vector or pcDNA3-HIF-1 dominant negative plasmid. b) To determine whether HIF-1 expression is sufficient to induce COX-2 transcriptional activation, the cells were co-transfected with the COX-2 promoter reporter, pCMV-β-gal, and pcDNA3 vector or pcDNA3-HIF-1α wild type expression plasmid. The cells were cultured for 36 h after transfection. The relative luciferase activity was determined by the ratio of luciferase/β-gal activity, and normalized to the vector control (100%). *Indicates a significant difference when the value is compared to the control (p < 0.01).
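The k-mismatch problem raised in the Discussion can be stated in a few lines of code. The sketch below is a brute-force scan that counts mismatches at every alignment; it is not a suffix tree method and is not part of this study, and the function name is hypothetical. It only makes the problem definition concrete.

```python
def k_mismatch(text, pattern, k):
    """Return the start positions where pattern aligns to text
    with at most k mismatched characters (no insertions or deletions)."""
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        mismatches = sum(1 for a, b in zip(text[i:i + m], pattern) if a != b)
        if mismatches <= k:
            hits.append(i)
    return hits


print(k_mismatch("CACGTGTTATGG", "ACGTC", 0))
print(k_mismatch("CACGTGTTATGG", "ACGTC", 1))
```

With k = 0 the scan is ordinary exact matching; allowing one mismatch recovers the site "ACGTG" at position 1 even though the query differs in its last base.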