Skip to content
/ PyMiner Public

This is a tool to mining and extract metrics from Python open source projects from github for an empirical research.

License

Notifications You must be signed in to change notification settings

PAMunb/PyMiner

Repository files navigation

PyMiner - Python Feature Counter

Python Feature Counter is a tool designed to analyze Python repositories and count occurrences of specific Python language features introduced in various versions. The application processes repositories from a given list and generates detailed CSV reports with the analysis results.


Features

  • Analyzes Git repositories for occurrences of modern Python features.
  • Supports multithreaded processing for better performance.
  • Filters commits by date range.
  • Outputs results in CSV format for easy reporting and visualization.

Requirements

  • Python: Version 3.12 or newer.
  • Git: Must be installed and accessible in the system's PATH.
  • Dependencies: Installable via pip.

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/feature-counter.git
    cd feature-counter
    ´´´
    
  2. Install dependencies:

    pip install -r requirements.txt
    ´´´
    

Directory Structure

PyMiner/ ├── visitors/ # Feature-specific visitor modules ├── results/ # Output directory for CSV files ├── main.py # Main script ├── feature_counter.py # Core processing logic ├── requirements.txt # Dependency file └── README.md # Documentation

Usage

  1. Prepare the Input CSV File

The application expects a CSV file with a single column named name, containing the list of repositories to analyze in the format /. Example:

name
owner1/repo1
owner2/repo2

Save the file as python-projects.csv or any name of your choice.

  1. Run the Application

Run the script with the path to your CSV file as a command-line argument:

python3 main.py python-projects.csv
´´´
3. Results

Processed results are saved in the results/ directory as CSV files, named <owner>_<repo>.csv. Each file includes:

 Repository details.
 Date range of commits analyzed.
 Count of specific Python feature occurrences.

## Configuration

The application can be customized directly in the script:

 start_date: Defines the earliest commit date to analyze. Default is 2012-01-01.
 max_threads: Sets the number of threads for parallel processing. Default is 4.
 steps: Specifies the number of days between commit analyses. Default is 30.

About

This is a tool to mining and extract metrics from Python open source projects from github for an empirical research.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •