Published May 9, 2023 | Version v1
Software
Open
LoGoFunc
Creators
Description
# LoGoFunc

This repository contains the code to train LoGoFunc, the trained LoGoFunc model, and the data and code needed to generate GOF, LOF, and neutral predictions for missense variants genome-wide. The trained LoGoFunc models are included in the `models` directory. Training and testing data, along with functional labels, are included in the `data` directory.

# Requirements

To reproduce the LoGoFunc predictions, you will need the following Python libraries:

- pandas
- joblib
- lightgbm
- scikit-learn
- imbalanced-learn

# Setup

We recommend using a conda environment to manage Python dependencies. Assuming conda is installed and on your path, you can prepare and run LoGoFunc in a conda environment with the following steps.

1. `conda create -n logofunc python=3 pandas=1.5.0 joblib lightgbm=3.2.1 scikit-learn=1.1.2 imbalanced-learn=0.8.0`
2. `conda activate logofunc`
3. `git clone https://gitlab.com/itan-lab/logofunc.git`
4. `cd logofunc`

# Testing

The `test.py` script may be used to generate predictions for the test variants from the LoGoFunc manuscript using the pretrained LoGoFunc model as follows.

1. `python ./scripts/test.py predictions.csv`

Predictions for the testing variants will be written to the `predictions.csv` file.

# Training models

LoGoFunc may be trained from scratch as follows.

1. `python ./scripts/train.py output_dir`

Trained models and the fit data preprocessor will be stored in `output_dir`. To generate predictions with the fit models, you may edit the paths in the `test.py` file to load the models and preprocessor from `output_dir`.

# Generating predictions for all missense variants

To reproduce LoGoFunc's GOF, LOF, and neutral predictions for missense variants, you must first download the annotated missense variants, which are available here: https://zenodo.org/record/7562029/files/data.csv.gz?download=1

1. `wget -O data.csv.gz "https://zenodo.org/record/7562029/files/data.csv.gz?download=1"`
2. `python ./scripts/predict.py -f data.csv.gz -m ./models -o output.csv`

Because LoGoFunc uses a large set of annotations, most users will not be able to load the entire missense variant file into memory. As a result, missense variants are read and predicted 10,000 at a time. This batch size may be adjusted by passing the `-s` flag and an integer of your choosing to the predict script (see `python ./scripts/predict.py -h` for more details). The output file will contain the LoGoFunc neutral, GOF, and LOF predictions.
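The batched reading described above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the implementation in `predict.py`. The helper names `iter_batches` and `predict_in_batches`, the `predict_batch` callable, and the output column names are hypothetical and chosen only for illustration. The point is that only one batch of rows is held in memory at a time.

```python
import csv
import gzip
from itertools import islice
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, str]


def iter_batches(path: str, batch_size: int = 10_000) -> Iterator[List[Row]]:
    """Yield lists of at most `batch_size` rows from a gzip-compressed CSV."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.DictReader(handle)
        while True:
            # islice pulls only the next `batch_size` rows from the reader,
            # so the full file is never loaded into memory at once.
            batch = list(islice(reader, batch_size))
            if not batch:
                return
            yield batch


def predict_in_batches(
    in_path: str,
    out_path: str,
    predict_batch: Callable[[List[Row]], Sequence[Sequence[float]]],
    batch_size: int = 10_000,
) -> int:
    """Score a large variant file batch by batch and return the row count.

    `predict_batch` is a hypothetical stand-in for the trained model: it
    takes a list of rows and returns one (neutral, GOF, LOF) triple per row.
    """
    total = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Assumed header names; the real output columns may differ.
        writer.writerow(["neutral", "GOF", "LOF"])
        for batch in iter_batches(in_path, batch_size):
            writer.writerows(predict_batch(batch))
            total += len(batch)
    return total
```

A smaller `batch_size` lowers peak memory use at the cost of more calls to the model, which is the same trade-off the `-s` flag exposes.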
Files

Name | Size |
---|---|
logofunc.zip (md5:0c06f1888774e9338f767a8ec39b8aaa) | 144.4 MB |