Published March 29, 2022
| Version v2
Software
Open
artifact_detection - A tool for NLP tasks on textual bug reports.
Description
The tool automatically classifies text, on a line-by-line basis, into natural language (e.g. English in the contained datasets) and non-natural language portions (e.g. stack traces, code snippets, log outputs, file listings, URLs).
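To illustrate what line-by-line classification means in practice, here is a minimal sketch using scikit-learn. It is not the model shipped in this record: the character n-gram TF-IDF features, the linear SVM, the toy training lines, and the helper names `train_line_classifier` and `classify_report` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training lines (hypothetical, not taken from the published datasets).
# Label 0 = natural language, 1 = non-natural language artifact.
TRAIN_LINES = [
    ("The app crashes when I open the settings page.", 0),
    ("I expected the export to finish without errors.", 0),
    ("Steps to reproduce are listed below.", 0),
    ("This worked fine in the previous release.", 0),
    ("Traceback (most recent call last):", 1),
    ('  File "main.py", line 12, in <module>', 1),
    ("    at com.example.App.run(App.java:42)", 1),
    ("2022-03-29 10:15:02 ERROR [worker-1] connection reset", 1),
]


def train_line_classifier(samples):
    """Fit a character n-gram TF-IDF + linear SVM pipeline on (line, label) pairs."""
    lines, labels = zip(*samples)
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
        LinearSVC(),
    )
    model.fit(list(lines), list(labels))
    return model


def classify_report(model, report):
    """Return (line, label) pairs for every non-empty line of a report."""
    lines = [line for line in report.splitlines() if line.strip()]
    if not lines:
        return []
    labels = model.predict(lines)
    return [(line, int(label)) for line, label in zip(lines, labels)]


model = train_line_classifier(TRAIN_LINES)
report = "Saving a file fails on Windows.\n\n    at com.example.Save.write(Save.java:7)\n"
for line, label in classify_report(model, report):
    print(label, line)
```

Each non-empty line of the report gets its own label, so downstream steps can drop or keep lines individually. With a training set this small the predictions are not meaningful; the sketch only shows the shape of the task.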
This repo contains the Python implementation of a machine learning classifier model and basic scripts for automated training set creation from GitHub issue tickets. It also provides a scikit-learn transformer implementation that wraps pretrained models and is ready to be used as a preprocessing step. The datasets consist of issue tickets and documentation files mined from C++, Java, JavaScript, PHP, and Python projects hosted on GitHub.
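As a rough idea of how a transformer can wrap a pretrained line classifier for use as a preprocessing step, consider the sketch below. The class name `ArtifactRemover`, its parameters, and the toy stand-in model are hypothetical and are not the API of this repository; only the scikit-learn base classes and estimators are real.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


class ArtifactRemover(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that keeps only lines predicted as natural language."""

    def __init__(self, line_model, natural_label=0):
        self.line_model = line_model
        self.natural_label = natural_label

    def fit(self, X, y=None):
        # The wrapped model is assumed to be pretrained, so there is nothing to fit.
        return self

    def transform(self, X):
        cleaned = []
        for document in X:
            lines = [line for line in document.splitlines() if line.strip()]
            if not lines:
                cleaned.append("")
                continue
            labels = self.line_model.predict(lines)
            kept = [
                line
                for line, label in zip(lines, labels)
                if label == self.natural_label
            ]
            cleaned.append("\n".join(kept))
        return cleaned


# Stand-in for a pretrained model: a tiny classifier fitted on made-up lines
# (0 = natural language, 1 = artifact). A real setup would load a trained model.
toy_lines = [
    "The build fails after the update.",
    "Please let me know if you need more details.",
    "Traceback (most recent call last):",
    "    at com.example.App.run(App.java:42)",
]
toy_labels = [0, 0, 1, 1]
line_model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3), lowercase=False),
    LinearSVC(),
)
line_model.fit(toy_lines, toy_labels)

remover = ArtifactRemover(line_model)
tickets = ["The build fails.\nTraceback (most recent call last):", ""]
print(remover.fit_transform(tickets))
```

Because the class follows the scikit-learn `fit`/`transform` convention, an instance like this could be placed as the first step of a `Pipeline`, ahead of a vectorizer and a downstream bug report classifier.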
A detailed discussion of this model can be found in "Detecting non-natural language artifacts for de-noising bug reports" by Hirsch T. and Hofer B. (in review). This project is also available on GitHub: https://github.com/AmadeusBugProject/artifact_detection
Files
Name | Size |
---|---|
artifact_detection.zip (md5:04ee0b9fbe70ea1b0c6f50f76a270f0b) | 538.9 MB |
Additional details
Funding
- Automated Debugging in Use P 32653
- FWF Austrian Science Fund