Introduction

Polycystic ovary syndrome (PCOS) is a highly complex disorder that affects 6–10% of women of reproductive age1. It is a major cause of anovulatory infertility and increases the risk of insulin resistance, obesity, cardiovascular disease and psychosocial disorders2,3. Studies have shown that PCOS arises from subtle interactions between genes and environmental factors4,5,6.

To identify PCOS genes, reverse-genetics approaches such as microarray studies have profiled whole-genome gene expression in a number of PCOS tissues, including ovary7,8 and adipose tissue9. Genome-wide association studies (GWAS) have been used to identify regions of the genome that harbor variants associated with disease risk or quantitative traits10,11,12. Among computational methods, one group reconstructed a transcription factor–microRNA synergistic regulatory network and considered the nodes with the highest degree as PCOS candidate genes13. Another group constructed a protein–protein interaction (PPI) subnetwork and selected the top hubs (nodes with both high degree and high betweenness) as PCOS candidates14. However, both works lack rigorous statistics to evaluate the accuracy of the predictions. To our knowledge, no efficient algorithm has been developed to predict PCOS genes. Bioinformatics algorithms have been successfully developed to infer candidate genes in other fields15,16,17,18, and these could be introduced to PCOS research.

In this work, we developed a method to identify the distinguishing properties of PCOS genes and subsequently used them to predict new candidates. We first systematically compared the computational characteristics of two groups of genes: known PCOS genes versus the remaining genes in the genome (called non-PCOS genes hereafter). We examined each gene set in terms of network topological features and functional annotations. We then singled out the features that differed significantly between PCOS and non-PCOS genes using the Kolmogorov–Smirnov (KS) test, and employed a support vector machine (SVM) with a linear kernel as the classifier. Finally, with a posterior probability >0.9, 233 new PCOS genes were predicted. We found literature support for 7 of the top 10 predictions.

Results

PCOS genes tend to have higher degrees

For a protein, its degree is defined as the number of genes with which it directly interacts. According to network theory, a protein with more direct interaction neighbors (a higher degree) might be more important to the network19. Based on the PPI network downloaded from OPHID20, we counted the number of direct interaction neighbors for each gene and found that PCOS genes tend to have higher degrees than non-PCOS genes. The average degrees for PCOS genes and non-PCOS genes are 41.81 and 22.48, respectively (see Table 1). The cumulative frequency distribution curve of degrees for PCOS genes shifts to the right compared with that of non-PCOS genes (Fig. 1A), and the difference between them is significant (P = 4.2E-13, KS test).

Table 1 Network Characteristics of PCOS Genes.
Figure 1

Cumulative frequency distributions of network features of PCOS genes and non-PCOS genes. PCOS genes tend to have higher degree (A), K-core (B), betweenness (C), 1st PCOS ratio (D), and 2nd PCOS ratio (E) than non-PCOS genes. For each feature, the cumulative frequency reaches 100% for both PCOS genes and non-PCOS genes. PCOS, polycystic ovary syndrome.

To validate the results, we also analyzed the degrees of PCOSDB genes and PCOSKB genes separately. The average degree of PCOSDB genes is 50.54 and that of PCOSKB genes is 38.60, both significantly higher than that of non-PCOS genes. These results are listed in Table 1.
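As a minimal illustration of this step, the sketch below computes the degree of every protein with the igraph R package used in Materials and Methods; the tiny edge list and the two-gene PCOS list are hypothetical placeholders, not the real OPHID or PCOSDB/PCOSKB data.

```r
library(igraph)

# Hypothetical toy edge list standing in for the OPHID PPI network
edges <- data.frame(
  from = c("IGF1", "IGF1", "IGF2", "GENE4", "GENE5"),
  to   = c("IGF2", "GENE3", "GENE4", "GENE5", "GENE3")
)
g <- simplify(graph_from_data_frame(edges, directed = FALSE))

# Hypothetical list of known PCOS genes
pcos_genes <- c("IGF1", "IGF2")

# Degree = number of direct interaction partners of each protein
deg <- degree(g)
is_pcos <- names(deg) %in% pcos_genes

# Average degree of PCOS vs. non-PCOS genes (cf. Table 1)
mean(deg[is_pcos])
mean(deg[!is_pcos])
```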

PCOS genes tend to enrich at global network center

Genes with high degrees might be located at globally or locally central positions, but only those at the global center are more likely to be evolutionarily conserved21. To distinguish these locations, we calculated the K-core for each gene. K-core decomposition gradually reveals the backbone of a network by iteratively deleting genes with degree lower than K, so that every gene remaining in the subnetwork has degree of at least K. A gene with a high K-core is thus more likely to be located at the global center. We found that PCOS genes have an average K-core of 16.78, whereas the average K-core of non-PCOS genes is only 11.66 (Table 1); the difference is significant (P = 1.7E-10, KS test). The cumulative frequency distributions of K-core values for PCOS genes and non-PCOS genes are shown in Fig. 1B. The other two datasets gave similar results, indicating that the high K-core of PCOS genes does not depend on the dataset used (Table 1).

Betweenness is another frequently used measure of network centrality, which counts the number of shortest paths between pairs of other genes that pass through the gene of interest. A gene with high betweenness can therefore be considered a bottleneck node in the network22. The results showed that PCOS genes have significantly higher betweenness than non-PCOS genes, with average values of 34,463 and 17,278, respectively (P = 1.4E-17, KS test; Table 1 and Fig. 1C). As shown in Table 1, PCOS genes also have similarly high average betweenness in the other two datasets.
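Both centrality measures discussed above are available in igraph; the following hedged sketch computes them on a hypothetical toy network standing in for the OPHID data.

```r
library(igraph)

# Hypothetical toy network standing in for the OPHID PPI network
g <- simplify(graph_from_data_frame(data.frame(
  from = c("A", "A", "B", "B", "C", "D"),
  to   = c("B", "C", "C", "D", "D", "E")
), directed = FALSE))

# K-core: the largest K such that the gene survives iterative removal
# of all nodes with degree lower than K
kcore <- coreness(g)

# Betweenness: number of shortest paths between pairs of other genes
# that pass through each gene
btw <- betweenness(g, directed = FALSE)

kcore
btw
```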

PCOS genes tend to interact with each other

Genes function through interactions with each other in signaling pathways; we therefore reasoned that the direct interaction neighbors of PCOS genes might also tend to be PCOS genes. To test this, for each gene we calculated the 1st PCOS ratio, defined as the number of PCOS genes among its direct interaction partners divided by its degree. For example, IGF1 (P05019) and IGF2 (P01344) have 16 and 21 direct interaction partners, of which 9 and 12, respectively, are PCOS genes; their 1st PCOS ratios are therefore 9/16 = 0.5625 and 12/21 = 0.5714. The cumulative frequency distributions of 1st PCOS ratios for PCOS genes and non-PCOS genes are shown in Fig. 1D. As shown in Table 1, PCOS genes have an average 1st PCOS ratio of 0.11, significantly higher than the 0.04 of non-PCOS genes (P = 3.0E-48, KS test).

Meanwhile, for each gene we also calculated the 2nd PCOS ratio, defined as the number of PCOS genes among its 2-step interaction genes divided by the total number of its 2-step interaction genes. We found that PCOS genes have an average 2nd PCOS ratio of 0.04, significantly higher than the 0.03 of non-PCOS genes (P = 6.0E-20, KS test). These results can be found in Table 1 and Fig. 1E.
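The two ratios can be derived from each gene's 1-step and 2-step neighborhoods; below is a hedged sketch using igraph's ego() function on a hypothetical toy network and PCOS gene list, under one reading of the 2-step definition (all genes within two steps, excluding the gene itself).

```r
library(igraph)

# Hypothetical toy network and PCOS gene list
g <- simplify(graph_from_data_frame(data.frame(
  from = c("IGF1", "IGF1", "IGF2", "IGF2", "G5"),
  to   = c("IGF2", "G3",   "G4",   "G5",   "G6")
), directed = FALSE))
pcos_genes <- c("IGF1", "IGF2")

pcos_ratio <- function(g, gene, pcos_genes, order) {
  # Genes within `order` steps of `gene`, excluding the gene itself
  nbrs <- setdiff(ego(g, order = order, nodes = gene)[[1]]$name, gene)
  sum(nbrs %in% pcos_genes) / length(nbrs)
}

# 1st PCOS ratio: PCOS genes among direct interactors, divided by degree
sapply(V(g)$name, pcos_ratio, g = g, pcos_genes = pcos_genes, order = 1)

# 2nd PCOS ratio: PCOS genes among the 2-step neighborhood, divided by
# the size of that neighborhood (assumed reading of the definition)
sapply(V(g)$name, pcos_ratio, g = g, pcos_genes = pcos_genes, order = 2)
```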

GO functional enrichment

As reported, genes associated with the same disease are often functionally related18,23. To examine whether PCOS genes tend to participate in particular biological processes, a log-odds score was computed for each GO term to compare the frequencies at which PCOS genes and non-PCOS genes are annotated to it. The distributions of log-odds scores differ significantly between PCOS genes and non-PCOS genes (P = 2.1E-66, KS test), indicating that PCOS genes tend to be enriched in particular biological processes.

Anovulation is a major diagnostic criterion for PCOS2. As shown in Supplementary Table S1, “GO:0022602 ovulation cycle process” is enriched with PCOS genes. Steroid hormones play an important role in ovarian development and the ovulation process; consistently, “GO:0042446 hormone biosynthetic process” is enriched. Meanwhile, “GO:0045940 positive regulation of steroid metabolic process” is significantly enriched with a log-odds score of 3.57, because steroids are precursors of steroid hormones. These results indicate that dysregulation of steroid hormones might be one major cause of PCOS. PCOS is a complex metabolic disease, and insulin resistance is another etiology1. Accordingly, GO terms associated with the regulation of plasma glucose are enriched, such as “GO:0048009 insulin-like growth factor receptor signaling pathway” and “GO:0010828 positive regulation of glucose transport”.

The performance of the classifier

To model all the above features, we tested different classifiers: K-nearest neighbor (KNN), decision tree, and SVM with different kernel functions. As described in Materials and Methods, 306 PCOS genes downloaded from PCOSDB and PCOSKB were used as positive samples, and 306 negative genes were randomly sampled from the non-PCOS genes. Since random sampling might introduce bias, we sampled 1001 negative datasets and combined each with the positive dataset to train the classifier. The median of the 1001 results was used to evaluate the performance of each classifier. As shown in Table 2, the SVM with a linear kernel achieved the best performance, with precision = 0.81, recall = 0.71, F1 = 0.75, and AUC = 0.80. It was therefore chosen as the final classifier and used for the real application. To show the variance introduced by sampling, boxplots of the 1001 training results (precision, recall, F1, and AUC) for the linear SVM are given in Supplementary Figure S1. The ROC curve of the linear SVM can be found in Fig. 2.

Table 2 The Classification Performance of Different Classifiers.
Figure 2

The ROC curve of the SVM with a linear kernel, which achieved the best classification performance using network and GO functional features.

Besides cross-validation, we also tested the classifier with an independent dataset. We used the 226 PCOSKB genes as positive samples and the 185 PCOSDB genes as test samples, randomly selecting 226 genes from the non-PCOS genes as negative samples to train the classifier. After repeating the sampling 1001 times, we used the model with the median AUC to predict the PCOSDB genes. Of the 79 PCOSDB genes not present in PCOSKB, 53, 32 and 15 genes were recalled with posterior probabilities higher than 0.5, 0.8 and 0.9, respectively, further showing that the algorithm is useful.

Real application

To predict new PCOS candidate genes, we took the 306 PCOS genes from PCOSDB and PCOSKB as positive samples, and the genes in the negative dataset that gave the median AUC among the 1001 randomly sampled datasets as negative samples. After training the classifier, 13,681 unknown genes could be scored by the algorithm. With a posterior probability higher than 0.9, 233 genes were predicted as PCOS genes (Supplementary Table S2). The top 25 genes are listed in Table 3; a small code sketch of this selection step is given below.

Table 3 Top 25 Predicted PCOS Genes.
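To illustrate the selection step only, the snippet below assumes a data frame of posterior probabilities returned by the trained SVM; the probabilities for genes other than CTNNB1 and SMAD3 are invented placeholders. It keeps the genes scoring above 0.9 and ranks them.

```r
# Hypothetical output of the trained SVM on the 13,681 unknown genes:
# one row per gene with its posterior probability of being a PCOS gene
pred <- data.frame(
  gene = c("CTNNB1", "SMAD3", "GENE_X", "GENE_Y"),
  prob = c(0.9993, 0.9979, 0.62, 0.17)   # CTNNB1/SMAD3 values from the text
)

# Keep genes with posterior probability > 0.9 and rank them,
# as done for Supplementary Table S2 and Table 3
candidates <- pred[pred$prob > 0.9, ]
candidates <- candidates[order(candidates$prob, decreasing = TRUE), ]
head(candidates, 25)   # top predicted genes (cf. Table 3)
```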

To validate our predictions, we searched the literature in PubMed and found evidence for 7, 10 and 14 of the top 10, 20 and 50 genes, respectively. For example, as shown in Supplementary Table S3, CTNNB1 is predicted as a PCOS gene with a posterior probability of 0.9993, and a significant reduction in the expression of CTNNB1 was reported in granulosa cells from patients with PCOS compared with controls24. As another example, SMAD3 is predicted as a PCOS gene with a posterior probability of 0.9979; the allele rs11031006-G in SMAD3 was reported to be associated with lower PCOS risk25.

Discussion

In this work, we systematically investigated the properties of PCOS genes and then developed an algorithm to predict new PCOS genes by integrating network characteristics and GO functional characteristics. Unlike GWAS and other genetic methods, this work opens a new avenue for inferring PCOS candidates.

Previously, two methods have been reported to infer PCOS genes13,14. One used degree as a feature13, and the other used both degree and betweenness14. In this work, besides degree and betweenness, we considered additional network topological features, namely K-core, 1st PCOS ratio and 2nd PCOS ratio, and we integrated GO functional annotations into the algorithm. More importantly, our method is a supervised machine-learning algorithm with rigorous statistics to evaluate its performance, and each predicted gene is assigned a posterior probability that reflects the reliability of the prediction.

According to PCOSDB and PCOSKB, both IGF1 and IGF2 are PCOS genes, and in the PPI network we found that most of their direct interaction neighbors are also PCOS genes. In addition, gene set enrichment analysis showed that the IGF receptor signaling pathway (GO:0048009) is statistically enriched with PCOS genes: 18 of its 36 annotated genes are PCOS genes, and 12 of the remaining 18 were predicted as PCOS candidates by our algorithm. These results are consistent with recent research suggesting that IGF signaling pathways might play an important role in PCOS regulation26,27,28.

It is known that PCOS is a highly heritable (70%) disease29. However, to date, only one gene, named PCOS1, has been collected in the Online Mendelian Inheritance in Man database30. The PCOS genes analyzed in this work were downloaded from PCOSDB or PCOSKB. They are in fact PCOS-causing or PCOS-associated genes, since the causal relationships may still need confirmation by physiological studies. Here, we call them PCOS genes partly for brevity and partly to highlight the importance of genetic background.

In this work, we defined 306 PCOS genes as positive samples and sampled 306 negative samples from the remaining genes (13,681 = 13,987 − 306). We trained the SVM model and evaluated classification performance with an equal number of positives and negatives, a strategy widely adopted in previous studies16,17,23. However, as noted by Myers et al.31, the results of this strategy should be interpreted carefully, because they are obtained under the assumption that the ratio of positives to negatives is 1:1 in the real application domain. It would also be improper to select all remaining genes as negatives, because not-yet-identified PCOS genes among the negative samples might seriously underestimate the classifier's performance.

Notably, current PPI data are far from perfect; they usually contain a number of false positive interactions and even more false negatives. Thus, some limitations are inevitable. For example, the degree of a protein might be related to how intensively it has been studied, and K-core and the 1st and 2nd PCOS ratios might be indirectly affected by such research bias. We believe that as PPI data quality improves, these problems will be mitigated and our approach will become more effective.

Materials and Methods

Data source

The PPI data were downloaded from the Online Predicted Human Interaction Database (OPHID; http://ophid.utoronto.ca/ophidv2.204/)20. After deleting self-interactions and redundant interactions, the final PPI network covers a total of 16,982 proteins and 193,949 edges. Two lists of PCOS genes were downloaded from the Polycystic Ovary Syndrome Database (PCOSDB; http://www.pcosdb.net/)32 and the KnowledgeBase on Polycystic Ovary Syndrome (PCOSKB; http://pcoskb.bicnirrh.res.in)33, with 208 and 241 genes, respectively. Of these, 185 of the 208 PCOSDB genes and 226 of the 241 PCOSKB genes were covered by the OPHID network. Combining the PCOSDB and PCOSKB genes gave 306 PCOS genes in total. The functional annotations of gene products were obtained from the Gene Ontology (GO; http://www.geneontology.org)34. The source code can be downloaded from GitHub: https://github.com/Heyuanshan/PCOS-genes-prediction.git.
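The sketch below shows one way the data-preparation step could look in R, assuming the OPHID edge list and the two PCOS gene lists have been exported to plain-text files; the file names are placeholders, not files distributed with the cited databases.

```r
library(igraph)

# Placeholder file names for data exported from OPHID, PCOSDB and PCOSKB
ppi          <- read.table("ophid_interactions.txt", header = TRUE,
                           stringsAsFactors = FALSE)   # two columns: protein A, protein B
pcosdb_genes <- readLines("pcosdb_genes.txt")          # 208 genes
pcoskb_genes <- readLines("pcoskb_genes.txt")          # 241 genes

# Build the network and remove self-interactions and redundant interactions
g <- simplify(graph_from_data_frame(ppi[, 1:2], directed = FALSE),
              remove.multiple = TRUE, remove.loops = TRUE)

# Merge the two gene lists and keep only genes covered by the network
pcos_genes     <- intersect(union(pcosdb_genes, pcoskb_genes), V(g)$name)  # 306 in the paper
non_pcos_genes <- setdiff(V(g)$name, pcos_genes)
```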

Network topological features

The network features analyzed in this work, i.e., degree, K-core, betweenness and the PCOS ratios (1st and 2nd), are defined in Table 4. They were computed with the R package igraph35.

Table 4 Formal Representation of Graph Measures.

Log-odds score

We defined the log-odds score to describe the relative frequency with which a GO biological process is used to annotate PCOS versus non-PCOS genes. It is calculated as follows:

$$\text{Log-odds score}=\log\left(\frac{(m+a)/(n+a)}{m_{0}/n_{0}}\right)$$

where m0 is the number of PCOS genes; n0 is the total number of genes in the human genome; m is the number of PCOS genes annotated to a GO term; n is the total number of human genes annotated to that GO term; and a (a = 1) is a correction factor. To avoid bias, we only used GO terms annotated by at least five genes (n ≥ 5). If a gene is annotated to a GO term with a high log-odds score, then the gene is more likely to be a PCOS gene. If a gene is annotated to several GO terms, the log-odds scores of these terms were summed to reflect its total association with PCOS.
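A direct transcription of this formula into R (the natural logarithm is assumed, since the base is not specified), together with the per-gene summing described above; the counts and the small gene-to-term mapping are illustrative.

```r
# Log-odds score of a GO term, following the formula above (natural log assumed)
log_odds <- function(m, n, m0, n0, a = 1) {
  log(((m + a) / (n + a)) / (m0 / n0))
}

# Example counts: 306 PCOS genes among the 13,987 genes considered, and a
# GO term annotating 36 genes, 18 of which are PCOS genes (cf. GO:0048009)
log_odds(m = 18, n = 36, m0 = 306, n0 = 13987)

# Per-gene feature: sum the log-odds scores of all GO terms (n >= 5)
# annotating the gene; the mapping and scores below are hypothetical
gene_terms  <- list(GENE_A = c("GO:0048009", "GO:0042446"),
                    GENE_B = c("GO:0010828"))
term_scores <- c("GO:0048009" = 2.1, "GO:0042446" = 1.4, "GO:0010828" = 0.8)
sapply(gene_terms, function(terms) sum(term_scores[terms]))
```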

Kolmogorov–Smirnov test

The Kolmogorov–Smirnov test is a nonparametric statistical method for comparing two samples by quantifying the distance between their empirical distribution functions. In this work, we used the two-sample KS test to compare the network features and functional annotations of PCOS genes with those of non-PCOS genes.
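In R, this test is available as ks.test(); a minimal sketch with hypothetical degree values for the two gene groups:

```r
# Hypothetical degree values for PCOS and non-PCOS genes
deg_pcos     <- c(52, 41, 38, 77, 23, 60)
deg_non_pcos <- c(12, 30, 8, 25, 19, 44, 5, 16)

# Two-sample Kolmogorov-Smirnov test comparing the two empirical distributions
ks.test(deg_pcos, deg_non_pcos)
```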

Classifiers

We tested different classifiers to predict PCOS genes: K-nearest neighbor (KNN), decision tree, and SVM with different kernel functions. KNN and decision tree were implemented in MATLAB, and SVM was implemented with LIBSVM 3.2236. As shown in the Results, the SVM with a linear kernel achieved the best performance. The parameter c was optimized and set to 1. For each predicted gene, LIBSVM gives a posterior probability to evaluate its reliability37; a gene with a larger posterior probability is more likely to be a PCOS gene.
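The classifier itself was run with LIBSVM; as an equivalent, hedged sketch, the R package e1071 (which also wraps LIBSVM) can train the same linear-kernel SVM with c = 1 and posterior-probability estimates. The feature matrix below is randomly generated for illustration only.

```r
library(e1071)  # R interface to LIBSVM

set.seed(1)
# Hypothetical feature matrix: one row per gene, columns = network and GO features
feat_names <- c("degree", "kcore", "betweenness", "ratio1", "ratio2", "go_logodds")
x <- matrix(rnorm(200 * 6), nrow = 200, dimnames = list(NULL, feat_names))
y <- factor(rep(c("PCOS", "nonPCOS"), each = 100))

# Linear-kernel SVM with cost c = 1 and probability estimates enabled
model <- svm(x, y, kernel = "linear", cost = 1, probability = TRUE)

# Posterior probability of being a PCOS gene for unseen genes
new_x <- matrix(rnorm(5 * 6), nrow = 5, dimnames = list(NULL, feat_names))
pred  <- predict(model, new_x, probability = TRUE)
attr(pred, "probabilities")[, "PCOS"]
```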

Positive and negative samples

The 306 PCOS genes obtained from PCOSDB and PCOSKB were used as positive samples, and 306 genes randomly selected from the rest of the human genome were used as negative samples. This strategy has frequently been used to predict disease genes16,17,23. To avoid sampling bias, we sampled 1001 negative datasets of 306 genes each and combined each negative dataset with the positive dataset to train the classifier.
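A minimal sketch of the repeated sampling, with placeholder gene identifiers standing in for the 306 PCOS genes and the remaining non-PCOS genes:

```r
set.seed(2018)

# Placeholders standing in for the 306 PCOS genes and the remaining genes
pcos_genes     <- paste0("PCOS_", 1:306)
non_pcos_genes <- paste0("GENE_", 1:13681)

# Draw 1001 negative sets of the same size as the positive set; each is
# later combined with the positives to train one classifier
training_sets <- lapply(1:1001, function(i) {
  neg <- sample(non_pcos_genes, length(pcos_genes))
  data.frame(gene  = c(pcos_genes, neg),
             label = factor(rep(c("PCOS", "nonPCOS"), each = length(pcos_genes))))
})
length(training_sets)  # 1001
```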

Classifier evaluation

As in previous works18, we used 5-fold cross-validation to evaluate the classifier, in which 20 percent of the data were held out as the test set and the remaining 80 percent used as the training set. Precision, recall, F1 score, and area under the ROC curve (AUC) were used to evaluate classification performance. For each test dataset, we counted the numbers of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP). The formulas for precision, recall, and F1 score are as follows:

$$\mathrm{Precision}=\frac{TP}{TP+FP},\quad \mathrm{Recall}=\frac{TP}{TP+FN},\quad F1=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$

Because we sampled 1001 negative datasets and combined each negative dataset with the positive dataset to train the classifier, we obtained 1001 sets of results. We used the medians of the 1001 precisions, recalls, F1 scores, and AUCs as the final results.
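A sketch of the evaluation step under simulated data: precision, recall and F1 from the confusion counts defined above, AUC via the rank-based (Mann–Whitney) formula, and the median over the 1001 repetitions. The per-repetition test labels and probabilities below are placeholders, not the study's results.

```r
# Precision, recall and F1 from predicted and true labels (positive class = "PCOS")
prf <- function(truth, pred) {
  tp <- sum(pred == "PCOS"    & truth == "PCOS")
  fp <- sum(pred == "PCOS"    & truth == "nonPCOS")
  fn <- sum(pred == "nonPCOS" & truth == "PCOS")
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(precision = precision, recall = recall,
    F1 = 2 * precision * recall / (precision + recall))
}

# AUC from posterior probabilities via the rank-based (Mann-Whitney) statistic
auc <- function(truth, prob) {
  pos <- prob[truth == "PCOS"]
  neg <- prob[truth == "nonPCOS"]
  u   <- sum(rank(c(pos, neg))[seq_along(pos)]) - length(pos) * (length(pos) + 1) / 2
  u / (length(pos) * length(neg))
}

# Simulated stand-in for the 1001 train/test repetitions
set.seed(3)
results <- replicate(1001, {
  truth <- factor(rep(c("PCOS", "nonPCOS"), each = 61))   # one hypothetical test fold
  prob  <- ifelse(truth == "PCOS", rbeta(122, 3, 1), rbeta(122, 1, 3))
  pred  <- ifelse(prob > 0.5, "PCOS", "nonPCOS")
  c(prf(truth, pred), AUC = auc(truth, prob))
})

apply(results, 1, median)  # final precision, recall, F1 and AUC
```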