Predicting enhancers using a small subset of high confidence examples and co-training
- Subject Areas
- Computational Biology, Genomics
- Keywords
- enhancers, co-training, semi-supervised learning
- Copyright
- © 2016 Huska et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited.
- Cite this article
- 2016. Predicting enhancers using a small subset of high confidence examples and co-training. PeerJ Preprints 4:e2407v1 https://doi.org/10.7287/peerj.preprints.2407v1
Abstract
Enhancers are important regulatory regions located throughout the genome, primarily in non-coding regions. Several experimental methods have been developed in recent years to identify their locations, but the search space is large and the overlap between the putative enhancers identified by these methods tends to be very small. Computational methods for enhancer prediction often use one large set of experimentally identified enhancer regions as input, and therefore rely critically on the correctness of that set. We take a different approach and start with a high-confidence set of 21 enhancers that lie in the intersection of enhancers identified using three completely unrelated experimental approaches: deepCAGE, HiCap and classical enhancer reporter assays. Because this starting set is so small, we use a semi-supervised approach called co-training, rather than a fully supervised approach, to progressively predict enhancers from unlabeled regions. Using this approach, we outperform supervised learning as well as simpler semi-supervised learning methods, and achieve an average area under the ROC curve of 0.84.
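The co-training idea named in the abstract can be illustrated with a small, self-contained sketch. The abstract does not say which features or classifiers the authors use, so everything below is an assumption made for illustration: the two feature "views" are synthetic, the classifier is a toy nearest-centroid model, and the names `co_train`, `make_toy_views` and `NearestCentroid` are hypothetical. The sketch only shows the general loop: a classifier trained on each view labels the unlabeled examples it is most confident about, and those labels enter a shared pool that the other view's classifier trains on next.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Toy stand-in classifier (an assumption, not the authors' model)."""

    def fit(self, X, y):
        self.pos = centroid([x for x, lab in zip(X, y) if lab == 1])
        self.neg = centroid([x for x, lab in zip(X, y) if lab == 0])
        return self

    def score(self, x):
        # Positive when x is closer to the positive centroid.
        return sq_dist(x, self.neg) - sq_dist(x, self.pos)


def co_train(view_a, view_b, seed_labels, rounds=5, per_round=2):
    """Grow a small labeled set using two feature views of the same examples.

    seed_labels maps example index -> 0/1 and must contain both classes.
    Returns a new dict with the seed labels plus the labels added.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            unlabeled = [i for i in range(len(view)) if i not in labels]
            if not unlabeled:
                return labels
            idx = sorted(labels)
            clf = NearestCentroid().fit(
                [view[i] for i in idx], [labels[i] for i in idx]
            )
            scores = {i: clf.score(view[i]) for i in unlabeled}
            ranked = sorted(unlabeled, key=scores.get)
            # Lowest scores are the most confident negatives, highest
            # scores the most confident positives. The sign check keeps
            # an example from being added under both labels.
            for i in ranked[:per_round]:
                if scores[i] < 0:
                    labels[i] = 0
            for i in ranked[-per_round:]:
                if scores[i] > 0:
                    labels[i] = 1
    return labels


def make_toy_views(n_per_class=20, seed=0):
    """Synthetic, well-separated data: two independent 2-D views per example."""
    rng = random.Random(seed)
    truth = [1] * n_per_class + [0] * n_per_class

    def view():
        return [
            [rng.gauss(3.0 if t else -3.0, 0.5) for _ in range(2)]
            for t in truth
        ]

    return truth, view(), view()


truth, view_a, view_b = make_toy_views()
seed_labels = {0: 1, 1: 1, 20: 0, 21: 0}  # tiny high-confidence starting set
result = co_train(view_a, view_b, seed_labels)
n_correct = sum(result[i] == truth[i] for i in result)
print(f"{len(result)} examples labeled, {n_correct} agree with the toy truth")
```

The toy classes are deliberately far apart so that the confident predictions are reliable. On real genomic data the choice of views, the confidence threshold and the number of examples added per round would all need tuning, and none of those values are taken from the article.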
Author Comment
This article has been accepted for the "GCB 2016 Conference".