Accurate prediction of functional effects for variants by combining gradient tree boosting with optimal neighborhood properties
Fig 1
(A) Feature representation. A total of 1521 sequence, Euclidean neighborhood and Voronoi neighborhood features are initially generated. (B) Two-step feature selection. In the first step, stability selection is applied and the top 152 features with scores larger than 0.2 are retained. In the second step, a wrapper-based feature selection is performed, in which features are evaluated by 5-fold cross-validation with the gradient tree boosting (GTB) algorithm. (C) Prediction model. Gradient boosted trees are then built for the final prediction.
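The first step in panel (B) can be illustrated with a minimal sketch of stability selection: features are scored by how often they are picked across random subsamples, and those with a score above 0.2 are kept. The sketch rests on several assumptions that are not taken from the paper. The base selector is a simple ranking by absolute Pearson correlation, standing in for whatever selector the authors used. The function names (`abs_corr`, `stability_scores`, `select_stable`), the toy data, and the parameters `n_rounds` and `top_k` are hypothetical. The wrapper step and the GTB model are not shown.

```python
import random
import statistics


def abs_corr(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((v - my) ** 2 for v in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (v - my) for x, v in zip(xs, ys))
    return abs(sxy) / (sxx * syy) ** 0.5


def stability_scores(X, y, n_rounds=100, top_k=2, seed=0):
    """Fraction of random half-samples in which each feature ranks in the top_k.

    Assumption: correlation ranking is a stand-in base selector.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)  # subsample half of the rows
        ys = [y[i] for i in idx]
        corr = [abs_corr([X[i][j] for i in idx], ys) for j in range(p)]
        ranked = sorted(range(p), key=lambda j: corr[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_rounds for c in counts]


def select_stable(scores, threshold=0.2):
    """Keep the indices of features whose stability score exceeds the threshold."""
    return [j for j, s in enumerate(scores) if s > threshold]


# Toy data (hypothetical): features 0 and 1 depend on the label, 2 to 5 are noise.
data_rng = random.Random(1)
y = [data_rng.randint(0, 1) for _ in range(200)]
X = [
    [yi + data_rng.gauss(0, 0.5), -yi + data_rng.gauss(0, 0.5)]
    + [data_rng.gauss(0, 1) for _ in range(4)]
    for yi in y
]

scores = stability_scores(X, y)
selected = select_stable(scores)
print("stability scores:", scores)
print("selected features:", selected)
```

In a full pipeline, the retained indices would then be passed to the wrapper step, where candidate subsets are compared by 5-fold cross-validation with a gradient tree boosting model.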