IR-FL (Information Retrieval-Fault Locator) methods for ranking source files (query results) that may be relevant to a bug report (query).


Information Retrieval for Fault Localization Using Latent Semantic Indexing (LSI) and Other Methods

Table of Contents

  • Introduction
  • Features
  • Launch
  • Screenshots
  • Technologies
  • Contributors

Introduction

A number of approaches to fault localization in potentially buggy source files use Information Retrieval (IR) methods. A common technique is BugLocator, which ranks potential source file fixes using a combination of direct links (textual similarity between a bug report and a source file) and indirect links (similarity to previously fixed bug reports). A well-known technique such as BugLocator is therefore a relevant benchmark for comparison against Latent Semantic Indexing (LSI). By comparing evaluation metrics, we analyzed the performance of these methods. The BugLocator approach was split into two methods (methods 1 and 2) so that a simplified implementation (method 1) could serve as a baseline for both the full implementation of BugLocator (method 2) and LSI (method 3).
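For intuition only, the sketch below shows how a BugLocator-style final score can combine a direct textual-similarity component with an indirect component derived from similar, previously fixed bug reports. The function name, example scores, and weight alpha are illustrative assumptions, not code from this repository.

```python
def combined_score(rvsm_score, simi_score, alpha=0.3):
    """Weighted sum of the direct (textual) and indirect (similar past bugs) components."""
    return (1 - alpha) * rvsm_score + alpha * simi_score

# Hypothetical normalized scores for two candidate source files.
print(combined_score(rvsm_score=0.82, simi_score=0.10))  # ranked mostly on direct evidence
print(combined_score(rvsm_score=0.40, simi_score=0.95))  # boosted by similar past bug reports
```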

All methods were trained and tested on the bug reports and source files of open source Java project packages. Python was used to pre-process the data and to create, train, and test the models.
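As a rough illustration of this kind of pre-processing, the sketch below lower-cases text, splits camelCase identifiers, tokenizes, and removes English stop words. The function name, the stop-word list, and the example string are hypothetical and are not taken from the notebook.

```python
import re
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def preprocess(text):
    """Tokenize a bug report or source file into lower-case terms (illustrative only)."""
    # Split camelCase/PascalCase identifiers, which are common in Java source files.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [t for t in tokens if t not in ENGLISH_STOP_WORDS and len(t) > 2]

print(preprocess("NullPointerException in FileLoader.loadConfig()"))
# -> ['null', 'pointer', 'exception', 'file', 'loader', 'load', 'config']
```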

Overall, three methods were implemented and evaluated:

  • Method 1: Simplified BugLocator
  • Method 2: Full BugLocator
  • Method 3: Latent Semantic Indexing (LSI) with Singular Value Decomposition (SVD)
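To illustrate Method 3, here is a minimal sketch of LSI-style ranking built with scikit-learn, assuming the text has already been pre-processed. The file names, document strings, and number of SVD components are invented for the example and do not reflect the notebook's actual data or configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical pre-processed source-file contents keyed by file name.
source_files = {
    "FileLoader.java": "file loader load config read path parse",
    "Renderer.java":   "render frame draw widget screen refresh",
    "BugTracker.java": "bug report track issue status comment",
}
bug_report = "exception when loading the config file path"

# Build a TF-IDF term-document matrix over the source files.
vectorizer = TfidfVectorizer()
doc_term = vectorizer.fit_transform(source_files.values())

# Project documents and the bug report into a low-rank latent semantic space (SVD).
svd = TruncatedSVD(n_components=2, random_state=0)
doc_topics = svd.fit_transform(doc_term)
query_topics = svd.transform(vectorizer.transform([bug_report]))

# Rank source files by cosine similarity to the bug report in the latent space.
scores = cosine_similarity(query_topics, doc_topics)[0]
for name, score in sorted(zip(source_files, scores), key=lambda p: p[1], reverse=True):
    print(f"{score:.3f}  {name}")
```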

The pre-processing code up to the Markdown heading "More Pre-processing (Team 7)" in the Jupyter notebook was provided by a course instructor.

Overall, method 2 showed the best performance based on the Mean Reciprocal Rank (MRR) and Mean Average Precision (MAP) evaluation metrics. Visualizations of these results are shown in the Screenshots section of this README.
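For reference, the sketch below shows one standard way to compute MRR and MAP from ranked retrieval results. The ranked lists and sets of buggy files are made up for illustration and are not results from this project.

```python
def reciprocal_rank(ranked_files, buggy_files):
    """1 / rank of the first truly buggy file in the ranking, or 0 if none appears."""
    for rank, f in enumerate(ranked_files, start=1):
        if f in buggy_files:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_files, buggy_files):
    """Average of the precision values at each rank where a buggy file is retrieved."""
    hits, precisions = 0, []
    for rank, f in enumerate(ranked_files, start=1):
        if f in buggy_files:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(buggy_files) if buggy_files else 0.0

# Two hypothetical bug reports with their ranked source files and true buggy files.
queries = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["D.java", "E.java", "F.java"], {"D.java", "F.java"}),
]
mrr = sum(reciprocal_rank(r, b) for r, b in queries) / len(queries)
map_ = sum(average_precision(r, b) for r, b in queries) / len(queries)
print(f"MRR = {mrr:.3f}, MAP = {map_:.3f}")
```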

Features

  • Pre-processes bug reports (queries) and source files (query results) to train machine learning algorithms.
  • Ranks source files (query results) related to a bug report (query) to find the location of bugs described in the bug report.
  • NumPy-style documentation for maintainability and clarity of the application.

Launch

Setup

To prepare a dataset for the application to process, follow the "Getting Started" instructions here.

You must use Python 3 to run our notebook once the data has been processed as instructed in the aforementioned "Getting Started" section.

To run the application, first install Jupyter Lab, then open a new console and enter:

jupyter lab

This will open a Jupyter Lab tab in your default browser, in which you can run the application.

Screenshots

MRR Results (Mean Reciprocal Rank)

MRR (Mean Reciprocal Rank) vs. Package.

MRR Difference (method 2 - method 1) vs. Package.

MRR Difference (method 3 - method 1) vs. Package.

MAP Results (Mean Average Precision)

MAP (Mean Average Precision) vs. Package.

MAP (Mean Average Precision) Difference (method 2 - method 1) vs. Package.

MAP (Mean Average Precision) Difference (method 3 - method 1) vs. Package.

Technologies

  • Python 3
  • Jupyter Lab
  • scikit-learn

Contributors

  • Nicolas Mora (NikelausM)
  • Connor Britton (ConnorBritton)
  • Philip Rea
  • Joseph Park
