PBOV1 Is a Human De Novo Gene with Tumor-Specific Expression That Is Associated with a Positive Clinical Outcome of Cancer

PBOV1 is a known human protein-coding gene with an uncharacterized function. We have previously found that PBOV1 lacks orthologs in non-primate genomes and is expressed in a wide range of tumor types. Here we report that PBOV1 protein-coding sequence is human-specific and has originated de novo in the primate evolution through a series of frame-shift and stop codon mutations. We profiled PBOV1 expression in multiple cancer and normal tissue samples and found that it was expressed in 19 out of 34 tumors of various origins but completely lacked expression in any of the normal adult or fetal human tissues. We found that, unlike the cancer/testis antigens that are typically controlled by CpG island-containing promoters, PBOV1 was expressed from a GC-poor TATA-containing promoter which was not influenced by CpG demethylation and was inactive in testis. Our analysis of public microarray data suggests that PBOV1 activation in tumors could be dependent on the Hedgehog signaling pathway. Despite the recent de novo origin and the lack of identifiable functional signatures, a missense SNP in the PBOV1 coding sequence has been previously associated with an increased risk of breast cancer. Using publicly available microarray datasets, we found that high levels of PBOV1 expression in breast cancer and glioma samples were significantly associated with a positive outcome of the cancer disease. We also found that PBOV1 was highly expressed in primary but not in recurrent high-grade gliomas, suggesting the presence of a negative selection against PBOV1-expressing cancer cells. Our findings could contribute to the understanding of the mechanisms behind de novo gene origin and the possible role of tumors in this process.


Introduction
The origin of novel genes in the evolution of multicellular organisms has long been postulated to play a fundamental role in the development of new functions [1]. There are several wellestablished mechanisms of novel gene origin. For example, duplication and divergence, retroposition, gene fusion, exon shuffling and horizontal gene transfer all rely on reuse of the pre-existing genetic material (see [2] for review). It has been also proposed that some protein-coding genes might have originated de novo from non-coding genomic regions through a series of mutations ultimately leading to the appearance of a novel protein-coding transcript. The resulting proteins might be fixed in the evolution either as a result of genetic drift or due to an accidental positive contribution to the organism fitness. The positive selection following the fixation might further enhance the functionality of such proteins.
Despite the de novo mechanism of gene origin for a long time being considered unrealistic, there is a growing number of reports from various species that show that de novo gene origin is a widespread process that takes place in all branches of the tree of life [3][4][5][6][7][8][9]. However, the detailed understanding of the de novo gene origin is still missing, including what are the forces that drive the initial fixation of a newly originated gene and how does its function get shaped and integrated into the organism context. We have earlier hypothesized that tumorogenesis may play an important role in the novel gene origin and fixation (detailed in [10] and [11]). Briefly, one prominent feature of various tumor types is the abundant upregulation of various transcripts, many of which have an uncharacterized function [12,13]. One example is the large class of so-called cancer/testis antigens. These genes are controlled by CpG-island based promoters and are activated preferentially in spermatocytes and in various cancer types, whereupon the activation in both cases is linked to a widespread loss of CpG methylation [14,15]. Most of such transcripts lack an established function and are silent in most of the normal tissues. However, some may happen to have a protein-coding potential and thus can be potentially classified as de novo genes.
Second, Fisher and co-workers [16] have showed that some of the proteins from a library of randomly generated protein-coding sequences were able to rescue auxotrophic mutants of E. coli. Although this proof-of-principle example shows that a previously noncoding and non-optimized sequence may readily give rise to a minimally functional protein, we believe that in most cases a recently emerged de novo gene would initially lack functional features such that they would be sufficient to facilitate its evolutionary fixation. Given the ongoing mutational process and the lack of selective pressure the half-life of such a gene could be relatively short. We hypothesized that the expression of emerging de novo genes in tumors might in some way help to create a phenotypic feedback loop that would facilitate evolutionary fixation of those genes and their further functional integration into the context of the organism. With an aim to find specific examples to support this hypothesis, we focused on searching for human evolutionarily novel genes with a preferential expression in tumors. We previously reported several such transcripts, but most of them lacked protein-coding potential [17,18]. In the course of our search, we came across a study of Clamp and co-workers that aimed at filtering the human protein-coding gene catalog by removing misannotated non-coding genes based on a combination of criteria, such as presence of orthologs in mouse and dog genomes, PFAM domain signals, Ka/Ks ratio etc [19]. Besides reporting that approximately 20% of human genes were misannotated as protein-coding, the authors also provided a list of 10 genes that had been classified as spurious but coded for experimentally validated proteins. We previously analyzed the evolutionary history and EST-derived expression profiles of the genes in this list and, interestingly, we found that one gene in this list, PBOV1, lacked orthologs in non-primate genomes and its mRNA/EST sequences had been exclusively derived from tumor sources [20].
PBOV1 (UROC28, UC28) is a human protein-coding gene with a 2501 bp single-exon mRNA and a 135-aa open-reading frame. The gene has been first characterized by An and co-workers [21] as being overexpressed in prostate, breast, and bladder cancer. The authors expressed the protein in vitro, produced antibodies and showed that PBOV1 protein was present in the blood of prostate cancer patients but not in the healthy controls. They also showed that PBOV1 expression in prostate cancer cells was upregulated by androgen treatment [21]. Another group reported that PBOV1 transcription in breast cancer cells was positively regulated by estradiol [22].
We previously reported that PBOV1 gene was expressed in multiple types of human tumors, but not in normal tissue samples [20]. However, the expression studies in our previous work were not fully conclusive because the RT-PCR experiments did not include adequate DNA contamination controls.
Here we perform a focused analysis of PBOV1 evolutionary history, expression regulation and disease association. Using comparative genomics analysis we show that the PBOV1 proteincoding sequence is by 80% unique to human and has originated de novo during the evolution of primates through a series of frame-shift and stop-codon mutations.
We verify our early report of PBOV1 tumor-specific expression [20] with a new series of expression profiling experiments that use a different batch of cDNA samples and include comprehensive controls for cDNA quality and genomic DNA contamination. Furthermore, we analyze publicly available genomic, microarray and ChIP-seq data to shed light on the possible mechanisms behind PBOV1 transcriptional activation and uncover any links between PBOV1 expression and cancer clinical outcome. Finally, we report that the expression levels of PBOV1 in breast cancer and glioma clinical samples positively correlate to patient relapse-free survival. Based on our findings we speculate that PBOV1 gene could function as a tumor antigen and a suppressor of certain types of cancer. We hypothesize that the fixation of this gene in the human evolutionary lineage could be promoted by a tumormediated immunological feedback.

Results
PBOV1 protein-coding sequence originated de novo in human evolution and appears to evolve neutrally According to hg19 version of Human UCSC Genome Browser [http://genome.ucscs.edu], PBOV1 gene is mapped to chr6:13895379127-13895399627, within the fourth intron of the BIG3 (KIAA1244) gene, approximately 56 kbp downstream of BIG3 transcription start site. PBOV1 is transcribed from the strand that is opposite to BIG3. The transcript consists of a single exon 2501 nt long and contains an ORF that spans from 96 to 503 nt coding for 135 amino acids.
We performed a detailed comparative genomic study of the protein-coding sequence (CDS) of PBOV1. We extracted the multiple alignment of 34 genomes of placental mammals (see Materials and Methods for the list of species) from the database of MULTIZ multiple genome cross-species alignments [23] that is available from UCSC Genome Browser. For each genome, we computed the fraction of human CDS that can be aligned with it. Based on the presence of frame-shift mutations and stop-codons, we deduced the fraction of human protein sequence that was homological to the putative protein that could result from translation of the target sequence in the other species. We mapped the results to the mammalian evolutionary tree and indicated the key evolutionary steps that led to the appearance of human PBOV1 ( Figure 1A). The coding sequence of PBOV1 appears to be poorly conserved in the mammalian evolution. It is virtually absent from genomes of Atlantogenata, except for the rock hyrax genome to which 71% of the human sequence can be aligned. At the same time, the sequence homologous to PBOV1 CDS is present throughout Boreoeutheria. We can conclude that the last common ancestor of this clade most likely had at least 97% of the modern human CDS (as the maximum of 97% of human sequence could be aligned to the genomes of horse and megabat) as well as the starting ATG codon. However the orthologous loci in Laurasiatherae or Glires cannot encode for a protein with a significant similarity to the human PBOV1: in Glires the starting ATG codon is mutated, thus eliminating the open reading frame, and in Laurasiatherae a frame-shifting deletion at 12 bp limits the protein similarity by the N-terminal 3% of the human sequence.
More than 99% of the human PBOV1 CDS can be aligned with every primate genome that we studied. However, the presence of an early stop codon in non-hominid primates limits the similarity to the human protein by the N-terminal 20%. This stop codon is mutated in the common ancestor of Hominidae, opening the reading frame. However, this frame extends beyond the humanidentical polyadenylation signals, which could mark the ends of the putative transcripts in the genomes of gorilla, orangutan and chimp. This would mean that the PBOV1-like transcripts in those species may be subject to the non-stop decay [24] and hence cannot encode a protein, unless the transcripts in those species terminate at a different polyadenylation signal further downstream. But even in this case, the resulting protein would be more than 660 amino acids long and thus would have less than 20% of sequence in common with PBOV1 protein. Finally, a 1-bp deletion that has occurred in the ancestor of modern human after the split with chimp led to a frame-shift that has finally shaped the human PBOV1 protein-coding sequence by putting a stop codon in frame and fixing its length at 135 codons.
The CDS of PBOV1 gene does not show a significant base-wise conservation across mammals: PhyloP [25] mean pairwise conservation -log-p-value was 0.07+/20.82. Another common indicator of a selective pressure on a protein-coding sequence is the ratio of non-synonymous to synonymous substitutions (Ka/ Ks), which has an average value 0.21 for a typical human humanchimp gene pair [26]. We computed Ka/Ks ratio using the method of Comeron [27] in a multiple alignment of human CDS with rhesus, gorilla, orangutan and chimp genomic sequences and did not find it to be significantly different from 1.0 (Ka/Ks 0.958, 95% CI 0.598-1.876), indicating that the amino acid sequence in those organisms is evolving neutrally.
Evolutionary features such as low sequence conservation, lack of Ka/Ks bias and multiple frameshifts could indicate a spurious open-reading frame in a non-coding transcript that has been misannotated as a protein-coding gene. However the existence of PBOV1 protein has been previously shown experimentally in [21]. To additionally support the existence of the protein, we searched the EBI PRIDE database of MS/MS identifications and found two distinct peptides that uniquely matched PBOV1 protein sequence and together covered 32% of the protein.
We have estimated the codon usage score for PBOV1 coding region using the method of Guigó [28] (See Methods for details). The score quantifies the preferential use of synonymous codons, and higher values indicate that the sequence uses codons with abundant corresponding tRNAs. High codon usage indices indicate the high efficiency of mRNA translation and are typically observed in genes selected for high levels of expression. For PBOV1, we obtained a codon usage score of 0.21 which is unexpectedly high for an ORF that has recently originated from a non-coding sequence and is significantly higher than expected in a random sequence of the same length and base composition (p = 0.004, based on bootstrapping by sequence reshuffling). For comparison, the average codon usage score for a human gene is 0.15 [4]. While we can only conclude that such high codon usage score is a result of a pure coincidence, it might be one of the factors that positively contributed to the actual protein-coding capacity of the recently emerged ORF, as it is known that codon usage has a significant influence on human gene expression [29].
These findings altogether strongly suggest that human PBOV1 is a protein of a very recent de novo evolutionary origin, with 80% of sequence being specific at least to Hominidae. Chimp, gorilla and orangutan either lack homologous proteins due to a non-stop degradation or encode for homologs of a much higher length, which practically means that PBOV1 protein can be considered a human-specific. Despite the recent origin from a non-coding sequence, PBOV1 CDS has an unusually high codon usage preference index and the existence the corresponding protein has been shown experimentally. The values next to species names show fractions of CDS of human PBOV1 that could be aligned with the respective genome and fractions of encoded proteins (assuming that they exist) that could be aligned with the human PBOV1 protein. For selected taxons, the most probable values of those fractions in the last common ancestor (LCA) are given. The genome of LCA of Boreoeutheria most likely contained the start codon of PBOV1, 97% of respective genomic sequence (as the maximum of 97% of human sequence could be aligned to the genomes of horse and megabat) and 7% of the putative protein sequence. However, in rodents and Lagomorpha the frame was lost due to a mutation in the ATG codon. Laurasiatheria retain up to 97% of the genomic sequence homologous to PBOV1 CDS, but the protein homology is below 3% due to a synapomorphic frame-shift deletion. All higher primates contain at least 99% of human genomic sequence, but the protein homology is only 20%. An important evolutionary event along the human lineage was the ART substitution at the position 90 in the last common ancestor of Hominidae which removed the stop codon. However, all Hominidae genomes lack an in-frame stop codon over the span of the human transcript, which could make the transcript in this species a target of the non-stop decay [24]. Finally, a single nucleotide deletion that occurred after the divergence from chimp led to a frame-shift that finally shaped the modern human PBOV1 protein sequence. B: Multiple alignments of human PBOV1 CDS with orthologous loci from selected mammalian species. The stretches of genomes that contribute to the putative protein homology to human PBOV1 are highlighted in yellow, followed by the features that disrupt protein homology (frame-shifts and stop codons). For the sake of representation, the exact sequences of species-specific insertions are omitted from the alignment. Bioinformatics analysis of PBOV1 protein shows a lack of functional features A PSI-BLAST search of PBOV1 protein sequence against the UniProt NRDB90 database resulted in no hits with an E-value below 10, indicating a lack of proteins with significant homology and confirming our conclusion about the recent de novo origin of PBOV1 protein. We further searched for putative fold and domain structures of PBOV1 protein using freely available online tools. Because the protein is not evolutionarily conserved, we used IPSSP [30] software for secondary structure prediction, which, to our knowledge, is the most accurate secondary structure prediction tool that does not rely on evolutionary information. According to IPSSP prediction, PBOV1 protein contains 4 short alpha-helices covering 35% of the sequence with the rest being disordered. A search for structural domain motifs in PBOV1 using I-Tasser threading server [31] produced no significant hits, as all the predictions scored below 23.5. PBOV1 protein contains 4 cysteines and the predictions made by DiANNA [32] web server showed that two of them (pos. 49-122) might form a disulfide bond. Additionally, a search for post-translational modification predictions was performed using CBS prediction server tools [http://www.cbs.dtu.dk/services] and significant scores for phosphorylation were obtained on serines 62, 94, 101 and tyrosines 82 and 89.

PBOV1 has a broad and highly tumor-specific expression profile
We studied the expression of PBOV1 gene in a broad range of cancers and normal tissues using PCR on panels of cDNA from various normal tissues and tumor samples. First, we have tested the expression in Clontech MTC I, MTC II and Immune System cDNA panels. We did not observe any expression signal in any of the 37 adult and fetal tissues tested ( Figure 2). This result was identical to the one that we previously reported with an independent batch of cDNA panels obtained from different donors [20].
Next, we studied the expression of PBOV1 in the cDNA panels of tumor samples. The BioChain cDNA panel consisted of 32 samples from tumors of various histological types obtained from 28 different organs and tissues. We observed a specific signal in tumors of 16 different tissues and organs: brain, lung, liver, gall bladder, stomach, small Intestine, colon, ovary, fallopian tube, uterus, ureter, prostate, adrenal gland, parotid gland, pancreas, thymus, testis and spleen ( Figure 3A). This result was highly consistent with the one that we previously reported using cDNA panels obtained from a different batch of tumor samples [20].
We further studied the expression of PBOV1 in a panel of cDNA from clinical tumor samples that had been isolated in our laboratory (see Methods). The panel contained samples from various tumor types: breast (6 samples), female reproductive system (10 samples), lung (3 samples), testis (1 sample) and lymphomas of various geneses (8 samples). The results of PCR on this panel are presented in Figure 3B. We observed a specific signal in 22 out of 31 tumor cDNA samples, including breast cancer, cervical, ovary and endometrial cancer, lung cancer, non-Hodgkin lymphomas, meningioma and seminoma.
PBOV1 expression in breast cancer and glioma positively correlates to relapse-free survival Human PBOV1 gene encodes a protein of recent de novo origin which lacks evolutionary conservation and recognizable protein domains. This, taken together with the lack of expression in normal tissues, makes one question whether the encoded protein has any physiological function in the human organism. Nevertheless, a missense SNP in PBOV1 gene that results in I73T substitution was previously found to be associated with an increased risk of breast cancer in Cypriot population [33].
We decided to investigate whether the expression of PBOV1 in breast cancer and other cancer types is correlated with the disease progression and outcome. For this, we searched for publicly available datasets from studies that correlated tumor sample expression profiles with disease progression and clinical outcome.
First, we used the GOBO online tool to perform a Kaplan-Meier survival analysis with respect to PBOV1 expression levels in a pooled dataset from 6 independent studies that measured gene expression profiles in the clinical samples of breast cancers [34]. There we found that higher levels of PBOV1 significantly correlated with relapse-free survival (p = 0.013) as shown in Figure 4A. Out of the various clinical subgroups, we found that the significant association could only be observed for patients with lymph node metastases but not for patients without lymph node metastasis. Similarly, the association with relapse-free survival was significant in the group of patients with grade 2 tumors but not for patients with tumor grades 1 and 3.
Next, we analyzed the dataset from an independent study that correlated gene expression profiles of estrogen receptor-positive breast cancer with relapse-free patient survival over 5 years following tamoxifen therapy (Gene Expression Omnibus (GEO) accession GDS806 [35]). In this dataset, we found that higher levels of PBOV1 expression positively correlated with progressionfree survival ( Figure 4B, one-tailed T-test p = 0.02).
We obtained a similar result from the analysis of a gene expression dataset of clinical glioma samples (GEO accession GDS1816 [36]). Here we found that tumor samples from patients with proneural glioma who survived for more than 209 weeks showed significantly higher PBOV1 expression levels when compared to patients that survived 52-209 weeks ( Figure 4C, one-tailed T-test p = 0.04). Moreover, samples of primary proneural glioma tumors showed a higher PBOV1 expression than samples of recurrent proneural gliomas ( Figure 4D, onetailed T-test p = 0.001), suggesting that there might be a negative selection against cancer cells expressing PBOV1 over the course of cancer somatic evolution.
Finally, we analyzed a microarray dataset that profiled 22 prostate cancers samples and non-cancerous prostate samples from different patients (GEO accession GDS1746 [37]). Here we found that PBOV1 expression was significantly higher in samples from cancer stage III than from stage II (p = 0.0012). However, after accounting for stage-specific expression differences, we could not find any significant correlation of PBOV1 expression with the relapse-free survival in this dataset. This result either suggests that PBOV1 expression is not associated with the outcome of prostate cancer, or could also be due to a small size of the dataset (22 samples), which limits the detection power.
Regulation of PBOV1 gene expression PBOV1 shows a strong tumor-specific pattern of expression with a certain affinity towards such hormone-dependent cancers like breast and prostate cancers.
The result above suggests that the activation of PBOV1 expression in cancers could be due to the binding of some specific transcription factors to the promoter region. We analyzed transcription factor ChIP-seq data from the ENCODE project [40] and found moderate binding signals for C/EBPb factor and EP300 co-activator at 1.5 kb upstream of TSS and a strong enhancer at 4.8 kb upstream of TSS that contained binding sites of FOXA1, FOXA2 transcription factors and EP300.
In order to test whether PBOV1 expression could be regulated by C/EBP transcription factor family, we analyzed the dataset of microarray profiling of 60 breast cancer samples (GEO accession GDS806 [35]) and found that the expression level PBOV1 significantly correlated to C/EBPa (Pearson correlation 0.48, p = 3N10 24 , 3 rd percentile in all PBOV1-correlated profiles, here and elsewhere without correction for multiple hypothesis testing). Additionally, we found a significant correlation between PBOV1 and C/EBPd expression levels in GOBO pooled breast cancer dataset [34] (Pearson correlation 0.14, p = 5N10 26 , 8 th percentile)and between PBOV1 and C/EBPc in Neve et al. [41] breast cancer cell line dataset (correlation 0.502, p = 5N10 28 , 2 nd percentile). We did not find a significant correlation between PBOV1 and C/EBP expression levels in the GDS1746 [37] prostate cancer dataset. These results suggest that various C/EBP transcription factors may positively contribute to the expression of PBOV1 in breast cancer.
It has been previously shown that PBOV1 expression in breast cancer and prostate cancer cells is positively regulated by estrogen [22] and dihydrotestosterone [21], respectively.
In an attempt to explain this, we searched the PBOV1 promoter region for the presence of estrogen response elements or androgen response elements but did not find any significant matches (data  [34] shows that higher levels of PBOV1 expression positively correlated to relapse-free survival in breast cancer. Among clinical subgroups the effect was mostly pronounced in cases of lymph node positive cancers and in cases of grade 2 tumors (data obtained from GOBO online tool [34]). B. PBOV1 expression levels in clinical samples of estrogen receptor-positive breast cancer positively correlate to the patient relapse-free survival over 5 years following tamoxifen therapy (data obtained from GEO dataset GDS806 [35]). Error bars represent standard error of the mean. C. PBOV1 expression levels in clinical tumor samples from proneural glioma patients positively correlate with survival over 209 weeks (data obtained from GEO dataset GDS1816 [36]). Error bars represent standard error of the mean. D. Primary proneural gliomas have significantly higher expression levels of PBOV1 expression than recurrent ones (data obtained from GEO dataset GDS1816 [36]). Error bars represent standard error of the mean. doi:10.1371/journal.pone.0056162.g004 PBOV1-a De Novo Gene Correlated to Cancer Outcome PLOS ONE | www.plosone.org not shown) suggesting that the influence of sex hormone receptors on PBOV1 expression could be mediated by other transcription factors. FOXA1 has a binding site in the PBOV1 promoter and could play the role of such mediator, since this transcription factor is able to directly recruit estrogen and androgen receptors [42]. However, we did not find a significant correlation of PBOV1 expression to FOXA1 or to estrogen receptor alpha (ESR1) levels in the breast cancer gene expression dataset GDS806. We also found insignificant correlations of PBOV1 to FOXA1 and androgen receptor genes in GDS1746 [37] prostate cancer dataset.
Finally, we found that PBOV1 expression in both GDS1746 [37] prostate and GDS806 [35] breast cancer datasets was highly correlated to the expression level of sonic hedgehog (SHH) (0.50, p = 0.002, 8.1 th percentile and 0.60, p = 2N10 27 , 1.0 st percentile, respectively), indicating that the Hedgehog pathway could be one of the drivers of PBOV1 activation in those cancer types. Interestingly, this regulation might be mediated by FOXA2 binding to the promoter region, since FOXA2 is a reported effector of Hedgehog signaling [43]. We found a very significant correlation of PBOV1 expression to FOXA2 expression levels in the GDS1746 [37] prostate cancer dataset (correlation 0.73, p = 2N10 25 , 0.2 th percentile), in GOBO pooled breast cancer dataset [34] (correlation 0.145, p = 2N10 27 , 7 th percentile, but no significant correlation was present in GDS806 breast cancer dataset. Although those results suggest an association between the activity of Hedgehog pathway and PBOV1 expression levels, the evidence is purely correlative. However, we found a microarray dataset deposited in GEO under GSE11981 accession that came from a study of gene expression response of human pancreas cancer xenografts in mice to treatment with HhAntag, a prospective Hedgehog-inhibiting anti-cancer drug [44]. In this dataset we found that in three out of four replicates PBOV1 expression went below 25% of the average of the control, while in the fourth it did not change (One-tailed T-test p = 0.034 over all samples, p = 0.004 with one outlier value removed, Figure 5). This finding suggests that the Hedgehog signaling pathway may significantly contribute to PBOV1 activation in pancreatic cancer cells.

Evolutionary history of PBOV1
Our comparative genetics analysis indicates that PBOV1 recently emerged de novo as a protein-coding gene. The current protein-coding sequence is not conserved and has appeared in a series of frame-shift and stop codon mutations. As a consequence, 80% of the protein is likely specific to human. However, with our analysis we cannot determine whether the orthologous genomic loci are transcriptionally active or encode unrelated proteins in other mammals.

Regulation of PBOV1 expression
The PCR experiments on cDNA panels and clinical tumor samples showed that PBOV1 was expressed in tumors of 19 distinct tissue origins, out of 34 tested, and at the same time was silent in all normal fetal and adult tissue types tested. These results are highly consistent with our previous report [20], and the fact that we used an independent batch of cDNA panels in this work shows that the obtained result is robust. Early reports indicated that PBOV1 was expressed in breast and prostate cancers and that its expression in tumor cells was upregulated by sex hormone treatment [21,22]. Consistent with this, we found that PBOV1 was expressed in multiple hormone-dependent cancer types, including breast, ovary, uterus, prostate and testis cancer.
The mechanism behind the tumor-specific activation of PBOV1 is unclear. Tumors are known for widespread transcriptional activation and this phenomenon has been at least partially attributed to DNA hypomethylation [15]. However, we found that PBOV1 was expressed from a GC-poor, TATA-containing promoter and its expression in HepG2 cells was insensitive to DNA methylation inhibitor treatment but responded to treatment with histone deacetylase inhibitor. These results suggest that, unlike cancer/testis antigens, PBOV1 activation in tumors cannot be explained by DNA hypomethylation and is likely a result of the action of specific transcription factors. Hence we conclude that PBOV1 can be classified to tumor-specific antigens (TSA), a class of genes postulated a long time ago [45], but the attempts to identify specific members have been mostly unproductive, with one notable exception being the alpha-fetoprotein [46].
Here we have further attempted to identify the transcription factors that could control the tumor-specific activation of PBOV1. Although our results are far from being conclusive, we have made a number of important observations. By analyzing publicly available ChIP-seq data and transcription profile correlations in microarray datasets, we found some evidence that PBOV1 expression in cancers may be positively regulated by C/EBP transcription factors and by Hedgehog signaling pathway. The latter result is especially interesting since the Hedgehog signaling pathway is one of the master regulators of embryonic development. While it is mostly quiescent in adult tissues, the ectopic reactivation of the Hedgehog pathway has been shown to be involved in the development of cancer [44]. Due to the pivotal role of Hedgehog signaling in many cancers, a number of Smoothened inhibitor drugs are currently undergoing clinical trials for anticancer efficacy [47]. We found publicly deposited microarray data that shows that PBOV1 expression in pancreatic cancer xenografts negatively responds to the treatment with HhAntag, one of the emerging anti-cancer Hedgehog inhibitor drugs. This result suggests that Hedgehog signaling might be one of the important factors that shape the tumor-specific expression of PBOV1. Figure 5. PBOV1 expression in pancreas cancer xenografts is downregulated by HhAntag treatment (data from GSE11981 dataset). The data comes from a study that profiled the gene expression response of human pancreatic cancer xenografts in mice to the treatment with HhAntag, a potent inhibitor of Hedgehog signaling and a prospective anti-cancer drug [44]. In three out of four replicates PBOV1 expression was downregulated by more than 75%. doi:10.1371/journal.pone.0056162.g005 However this finding requires further validation, which is a scope of our future work.

Possible functional role of PBOV1 protein
In our analysis of data from publicly available microarray experiments, we found that PBOV1 gene expression levels positively correlated with relapse-free survival in breast cancer patients and with overall longitude of survival in glioma patients. Based on this data, we hypothesize that PBOV1 protein may act as a tumor suppressor upon its expression in tumors. This hypothesis goes in line with a previous report that the missense SNPs in PBOV1 is associated with an increased risk of breast cancer [33]. Experimental testing of this hypothesis and the dissection of potential mechanisms of PBOV1 tumor-suppressor activity remains a scope for future investigations. However, we would like to speculate on one hypothetic possibility.
Since PBOV1 coding sequence has recently emerged de novo and since our analysis did not identify any functional features in the protein, it is unlikely that PBOV1 protein could act as a tumor suppressor by specifically interfering with some cellular mechanisms and pathways. Rather, we find it plausible that its hypothetic tumor suppressor function could stem directly from the highly tumor-specific expression profile. Various proteins that are expressed either specifically or preferentially in cancers have been shown in multiple instances to provoke an immune response against the cancer cells. Examples include cancer/testis antigens from CT-X, MAGE/BAGE/CAGE and PRAME gene families [48]. Cytotoxic immune response triggered by cancer antigens is an important mechanism of anti-tumor defense and has inspired many efforts to create anti-cancer therapeutic vaccines [49]. We hypothesize that PBOV1 expression in cancer cells may provoke an immune response against the tumor cells in a similar fashion and thus help the organism to fight the cancer.
Although we did not present any direct evidence supporting the tumor antigen and suppressor functions of PBOV1 protein, our hypothesis is to some extent supported by the observations from the glioma dataset, where we found that PBOV1 was expressed at significantly lower levels in recurrent proneural gliomas compared to the primary proneural gliomas. This could indicate the presence of immunoediting against PBOV1-expressing cells, which is a process where the immune system culls out the cancer cells that are highly expressing the tumor antigens and thus drives cancer development towards low immunogenicity [50].
If our hypothesis is correct and PBOV1 acts as an immunological tumor suppressor, this property of the gene might have provided an evolutionary advantage to the human ancestors that gained the PBOV1 coding sequence and thus could facilitate the fixation of its protein-coding sequence in its present form. A similar mechanism has been previously suggested to have played a role in the evolution of MAGE cancer-testis antigen family. MAGE type I genes have undergone a large evolutionary expansion in primates and encode proteins that are neutrally evolving and have unclear functions [51]. Despite this, some of MAGE-A family members have been specifically retained in the human genome, and it has been proposed that this fixation was facilitated by the beneficial role of MAGE-A as cancer antigens [52].
We hypothesize that such cancer-mediated immunological feedback mechanism could play a general role in the origin of various de novo genes. This is an attractive possibility because in order to function as a tumor-specific antigen, the sequence of the protein is not required to possess any specific functional features. The only requirement for the protein would be to serve as a source of peptides loaded on MHC Class I, which almost any sequence could fulfill. Then the cancer immunity feedback might drive the fixation of the de novo gene in the 'twilight zone'. Here, on one hand the cancer-mediated selective pressure would safeguard the gene from extinction and on the other hand there would be little constraints on the exact protein sequence, which could allow for rapid evolution and eventually facilitate the development of more specialized functions. This immunological feedback mechanism may aid novel gene fixation in all the animals that have an adaptive immune system, going as far as primitive vertebrates like hagfish and lamprey, which are both capable of an adaptive immune response and are also known to develop tumors [53].

Concluding Remarks
In this work we have found that PBOV1 was a human proteincoding gene that has recently originated de novo. The gene appeared to be expressed exclusively in tumors and its expression was associated with a positive clinical outcome in breast cancer and glioma. It has been previously reported that missense SNP in PBOV1 is correlated to an increased risk of breast cancer, and although this suggests that this positive association might be causal, the mechanism behind this association is currently unclear. We have hypothesized that PBOV1 could function by provoking an immune response against cancer cells that are expressing it, and that this property could facilitate the fixation of the PBOV1 coding sequence in the human evolutionary lineage. The validation of this hypothesis is a scope of future research.

Ethics Statement
In our work we performed gene expression studies using samples of surgically extracted tumors of various origins for cDNA production, as well as commercial cDNA panels from various human cancers as well as normal adult and fetal tissue samples. In all cases the experiments were conducted after an approval of the

cDNA Panels
For the expression studies we used commercial cDNA panels from Clontech (USA) and BioChain Institute (USA).

MTC TM Panels
The panels containing a set of normalized single-strand cDNA, produced from poly(A)+ RNA from various normal human tissues were obtained from Clontech, USA. We used the following panels:

RNA Purification
The total RNA was purified from tumor samples following the standard protocol involving guanidine isothiocyanate [54]. Purified RNA was treated with RNase-free DNase I (Sigma, USA). The samples were tested for DNA contamination using PCR with gDNA-CTR primers targeting an exon-intron junction of HERC1 gene.

cDNA Production
We synthesized cDNA using Revert AidH First Strand cDNA Synthesis Kit (Fermentas, Lithuania) using random hexamer primers, following the manufacturer guidelines. The obtained cDNA was stored at 220uC. The possible contamination of samples with genomic DNA (gDNA) was controlled using gDNA-CTR primers that were designed to cross an exon-intron junction of HERC1 gene. The primer sequences were: forward 59-AAGTGATCTGCC-CACTTTGG-39 59-GACACGCTGGAGTACAAGCA-39 The following PCR conditions were used 1 min -95uC; 30 cycles consisting of 30 s at 95uC, 30 s at 60uC, 1 min at 72uC; followed by the final elongation at 72uC for 5 min. The expected size for the gDNA-specific product was 537 bp.

PCR
All PCR products were analyzed by electrophoresis in 2% agarose gel and detected by staining with ethidium bromide. The results of electrophoresis are presented in the article as cropped images of gels. The full length images of gels are presented in the Supplementary File 1.

Search for orthologous sequences
We used a MULTIZ multiple alignment of 46 genomes produced UCSC Bioinformatics Group [23] and extracted the multiple alignment of human PBOV1 CDS with the genomes of 34 placental mammalian species (Table 1).

Mammalian Phylogenetic Tree
We used the mammalian phylogeny that was generated by UCSC Bioinformatics Group using PhyloFit software (http://hgdownload.cse. ucsc.edu/goldenPath/hg19/phastCons46way/placentalMammals. mod). The tree represents the species topology that was used by MULTIZ to generate the multiple genome alignments, and is consistent with currently accepted model for early placental mammalian radiation [55].
We used K-Estimator 6.0 [27] to estimate the substitution rates in a multiple sequence alignment of human PBOV1 CDS with orthologous regions in the genomes of chimp, orangutan, gorilla and rhesus. Confidence intervals were provided by the K-Estimator software on the basis of Monte Carlo simulations.

Codon Usage Bias Estimation
We estimated codon usage bias in the CDS using the method described by Guigó [28]. In brief, given a sequence of codons C = C 1 , C 2 , … C n and a table of codon frequencies F(C) in the protein coding sequences, codon usage score is a logarithm or ratio of two values, P(C) = F(C 1 )F(C 2 )… F(Cn) that is a product of frequencies of every codon in the sequence and P 0 (C) = F 0 (C 1 ) F 0 (C 2 )… F 0 (Cn), a product of expected frequencies of the same codons in a non-coding sequence, which for simplicity is set to a constant value of 1/64. Thus the codon usage score log(P(C))/ P 0 (C)) is a log-likelihood ratio of the observed codon sequence. In order to compute the codon usage score, we took the human nuclear DNA codon preference table from [57]. The significance of the obtained score was assessed by bootstrapping, as a frequency of getting the same or higher score from random sequences of the same length and nucleotide composition, computed on 10000 replications.

Supporting Information
File S1 Original gel images that were used in Figures 2 and 3. (PDF) We used K-Estimator 6.0 [27] to estimate the substitution rates in a multiple sequence alignment of human PBOV1 CDS with PBOV1-a De Novo Gene Correlated to Cancer Outcome