Published March 29, 2022 | Version v2
Software Open

artifact_detection - A tool for NLP tasks on textual bug reports.

  • 1. Graz University of Technology

Description

artifact_detection
A tool for NLP tasks on textual bug reports.
Automated classification of text into natural language portions (e.g. English in the contained datasets) and non-natural language portions (e.g. stack traces, code snippets, log outputs, file listings, URLs) on a line-by-line basis.
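
To make the line-by-line idea concrete, here is a minimal sketch of the labelling interface. It is not the published model: the tool uses a trained machine learning classifier, whereas this sketch swaps in a few hand-written regular expressions purely for illustration. The names ARTIFACT_PATTERNS, is_artifact, classify_lines, and remove_artifacts are hypothetical and do not come from the repository.

```python
import re

# Hypothetical heuristic patterns, used here only as a stand-in for the
# trained classifier that the actual tool provides.
ARTIFACT_PATTERNS = [
    re.compile(r"^\s*at\s+[\w.$<>]+\(.*\)\s*$"),          # Java-style stack frame
    re.compile(r"^\s*File \".*\", line \d+"),              # Python traceback frame
    re.compile(r"^\s*https?://\S+\s*$"),                   # line holding only a URL
    re.compile(r"[{};]\s*$"),                              # line ending like source code
    re.compile(r"^\s*\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}"),   # timestamped log line
]


def is_artifact(line):
    """Return True if any heuristic pattern matches the line."""
    return any(pattern.search(line) for pattern in ARTIFACT_PATTERNS)


def classify_lines(text):
    """Label every non-empty line as 'artifact' or 'natural'."""
    return [
        (line, "artifact" if is_artifact(line) else "natural")
        for line in text.splitlines()
        if line.strip()
    ]


def remove_artifacts(text):
    """Keep only the lines labelled as natural language."""
    return "\n".join(
        line for line, label in classify_lines(text) if label == "natural"
    )


report = (
    "The app crashes when I click save.\n"
    "    at com.example.Editor.save(Editor.java:42)\n"
    "https://example.com/log.txt\n"
    "Any ideas?"
)
labels = [label for _, label in classify_lines(report)]
cleaned = remove_artifacts(report)
```

Working per line rather than per document means a single bug report can keep its prose while losing the embedded stack trace or log excerpt.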

This repo contains the Python implementation of a machine learning classifier model and basic scripts for automated training set creation from GitHub issue tickets.
It also provides a scikit-learn transformer implementation that wraps pretrained models and is ready to be used as a preprocessing step.
Datasets consist of issue tickets and documentation files mined from C++, Java, JavaScript, PHP, and Python projects hosted on GitHub.
A detailed discussion of this model can be found in "Detecting non-natural language artifacts for de-noising bug reports" - Hirsch T. and Hofer B. (in review).
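
As a rough illustration of how a transformer of this kind can sit in front of other scikit-learn steps, the sketch below defines a small custom transformer and places it in a Pipeline. It is a sketch under stated assumptions, not the repository's API: the class name ArtifactRemover and the predicate looks_like_stack_frame are hypothetical, and the predicate is a simple regular expression standing in for a pretrained line classifier.

```python
import re

from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline


def looks_like_stack_frame(line):
    """Hypothetical stand-in for a pretrained line classifier."""
    return re.match(r"\s*at\s+[\w.$]+\(.*\)\s*$", line) is not None


class ArtifactRemover(TransformerMixin, BaseEstimator):
    """Drops every line that the wrapped predicate flags as an artifact."""

    def __init__(self, is_artifact=looks_like_stack_frame):
        self.is_artifact = is_artifact

    def fit(self, X, y=None):
        # Nothing is learned here: the wrapped classifier is assumed pretrained.
        return self

    def transform(self, X):
        # Each document is split into lines and flagged lines are removed.
        return [
            "\n".join(
                line for line in doc.splitlines() if not self.is_artifact(line)
            )
            for doc in X
        ]


reports = [
    "Saving fails with an exception\n"
    "    at com.example.Editor.save(Editor.java:42)",
    "Editor freezes on startup",
]

# The de-noising step runs before vectorization, so tokens from the
# stack frame never reach the bag-of-words vocabulary.
pipeline = Pipeline([
    ("denoise", ArtifactRemover()),
    ("bow", CountVectorizer()),
])
features = pipeline.fit_transform(reports)
vocabulary = sorted(pipeline.named_steps["bow"].vocabulary_)
```

Placing the de-noising step inside the pipeline keeps preprocessing and feature extraction in one object, so the same cleaning is applied at training and prediction time.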

This project is also available on GitHub:
https://github.com/AmadeusBugProject/artifact_detection

Files

Files (538.9 MB)

Name: artifact_detection.zip
Size: 538.9 MB
md5:04ee0b9fbe70ea1b0c6f50f76a270f0b

Additional details

Funding

Automated Debugging in Use P 32653
FWF Austrian Science Fund