
Scripts

Introduction

This directory contains scripts that aid in accessing and manipulating the Project CodeNet dataset.

Summary

Each script has the usual -h or --help option, which briefly explains its purpose and lists the accepted command-line arguments along with their defaults.

project_codenet_submissions.sh: selects all submissions for a particular problem, language, and status whose code size lies within a given range, and prints the corresponding source file names, one per line.
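The kind of selection described above can be sketched as a metadata filter. This is a hypothetical illustration, not the actual script: the function name select_submissions is invented here, and it assumes per-problem metadata files metadata/<problem>.csv using the CodeNet column order shown in the comment, with sources stored under data/<problem>/<language>/<submission_id>.<ext>.

```bash
# Hypothetical sketch, NOT the real project_codenet_submissions.sh.
# Assumed layout: metadata/<problem>.csv with the columns
#   submission_id,problem_id,user_id,date,language,original_language,
#   filename_ext,status,cpu_time,memory,code_size,accuracy
# and sources at data/<problem>/<language>/<submission_id>.<ext>.
# Usage: select_submissions ROOT PROBLEM LANGUAGE STATUS MIN_SIZE MAX_SIZE
select_submissions() {
  local root=$1 problem=$2 lang=$3 status=$4 min=$5 max=$6
  awk -F, -v root="$root" -v lang="$lang" -v status="$status" \
      -v min="$min" -v max="$max" '
    NR == 1 { next }                                  # skip the CSV header
    $5 == lang && $8 == status && $11 >= min && $11 <= max {
      # Print one source file path per matching submission.
      printf "%s/data/%s/%s/%s.%s\n", root, $2, $5, $1, $7
    }' "$root/metadata/$problem.csv"
}
```

For example, select_submissions /path/to/Project_CodeNet p00001 C Accepted 0 1000 would list the accepted C submissions of p00001 whose code size is at most 1000 bytes, under the layout assumed above.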

project_codenet_aggregate.sh: calls project_codenet_submissions.sh repeatedly to obtain all submissions for a set of problems, a set of languages, a set of statuses, and a code size range, and applies a user-defined action to each resulting source file (by default, it creates a symbolic link in a user-selected output directory).
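The per-file action step can be sketched as follows. The function names default_action and apply_action are hypothetical and do not come from the real script; the sketch only assumes that file names arrive one per line, the format project_codenet_submissions.sh is documented to produce.

```bash
# Hypothetical sketch of the "act upon each source file" step.
# default_action SRC OUTDIR: create a symbolic link to SRC inside OUTDIR.
default_action() {
  local src=$1 outdir=$2 abs
  # Resolve SRC to an absolute path so the link still works from OUTDIR.
  abs="$(cd "$(dirname "$src")" && pwd)/$(basename "$src")"
  ln -s "$abs" "$outdir/$(basename "$src")"
}

# apply_action OUTDIR [ACTION]: read file names (one per line) from stdin
# and call ACTION FILE OUTDIR for each; ACTION defaults to default_action.
apply_action() {
  local outdir=$1 action=${2:-default_action} f
  mkdir -p "$outdir"
  while IFS= read -r f; do
    [ -n "$f" ] || continue            # ignore empty lines
    "$action" "$f" "$outdir"
  done
}
```

Passing a different shell function name as the second argument swaps the default symlink for any other per-file action, for example copying or compiling each file.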

project_codenet.conf: a sample configuration file for project_codenet_aggregate.sh. It specifies the sets of problems, languages, and statuses of interest, the minimum and maximum code sizes, and the action to take on each submission source file.
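How such settings might drive the repeated selection can be sketched as below. The variable names (PROBLEMS, LANGUAGES, STATUSES, MIN_SIZE, MAX_SIZE) and the function for_each_combination are illustrative assumptions, not the contents of the real project_codenet.conf; the sketch also assumes the configuration is a bash file that can be sourced.

```bash
# Hypothetical settings in the spirit of project_codenet.conf
# (names and values are illustrative assumptions).
PROBLEMS=(p00001 p00002)
LANGUAGES=(C Java)
STATUSES=(Accepted)
MIN_SIZE=0
MAX_SIZE=10000

# for_each_combination SELECTOR: call SELECTOR PROBLEM LANGUAGE STATUS MIN MAX
# once for every combination of the configured sets, concatenating its output.
for_each_combination() {
  local selector=$1 p l s
  for p in "${PROBLEMS[@]}"; do
    for l in "${LANGUAGES[@]}"; do
      for s in "${STATUSES[@]}"; do
        "$selector" "$p" "$l" "$s" "$MIN_SIZE" "$MAX_SIZE"
      done
    done
  done
}
```

With two problems, two languages, and one status, the selector runs four times, once per (problem, language, status) triple.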

dataset_verify.sh: checks the integrity of the dataset in both directions: whether every submission mentioned in the metadata actually exists in the expected location in the file system, and conversely, whether every source file is covered by a metadata record.
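A two-way check of this kind can be sketched as a set comparison with sort and comm. This is not the real dataset_verify.sh: the function name verify_dataset is hypothetical, and it assumes the same metadata/data layout as a CodeNet checkout, with per-problem files named metadata/p<digits>.csv (the glob deliberately skips other CSV files such as a problem list).

```bash
# Hypothetical sketch of a two-way metadata/file-system consistency check.
# Assumes metadata/p<digits>.csv (submission_id,problem_id,user_id,date,
# language,original_language,filename_ext,...) and data/<problem>/<language>/.
# Prints "missing: PATH" and "extra: PATH" lines; returns 1 on any mismatch.
verify_dataset() {
  local root=$1 expected actual report
  expected=$(mktemp)
  actual=$(mktemp)
  # Paths implied by the metadata records (header lines skipped).
  awk -F, 'FNR > 1 { printf "data/%s/%s/%s.%s\n", $2, $5, $1, $7 }' \
    "$root"/metadata/p[0-9]*.csv | LC_ALL=C sort > "$expected"
  # Files actually present on disk, relative to the dataset root.
  (cd "$root" && find data -type f) | LC_ALL=C sort > "$actual"
  report=$(
    LC_ALL=C comm -23 "$expected" "$actual" | sed 's/^/missing: /'
    LC_ALL=C comm -13 "$expected" "$actual" | sed 's/^/extra: /'
  )
  rm -f "$expected" "$actual"
  [ -z "$report" ] && return 0
  printf '%s\n' "$report"
  return 1
}
```

Both lists are sorted under the same C locale because comm requires identically collated input.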

post_fdupes.sh: post-processes a file generated by the fdupes (or jdupes) utility. It collects various statistics about the duplicates, such as the number of duplicate sets, whether each set consists of a single language, and whether duplicates occur among the Accepted submissions.
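Because fdupes and jdupes list each set of duplicates as a block of file names separated by a blank line, awk's paragraph mode is a natural fit for statistics like these. The sketch below is hypothetical (the function name fdupes_stats is invented) and assumes that each file's parent directory names its language, as in the data/<problem>/<language>/ layout; it does not attempt the Accepted-status check, which would need the metadata.

```bash
# Hypothetical sketch: summarise fdupes/jdupes output files given as arguments
# (or stdin). Each duplicate set is a block of paths separated by blank lines.
fdupes_stats() {
  awk '
    BEGIN { RS = ""; FS = "\n" }     # paragraph mode: one duplicate set per record
    {
      sets++; files += NF
      same = 1
      for (i = 1; i <= NF; i++) {
        n = split($i, part, "/")
        lang = part[n - 1]           # parent directory = language (assumed layout)
        if (i == 1) first = lang
        else if (lang != first) same = 0
      }
      if (same) same_lang++
    }
    END { printf "sets=%d files=%d same_language_sets=%d\n", sets, files, same_lang }
  ' "$@"
}
```

For an input with one all-C set and one mixed C/Java set, this prints sets=2 files=4 same_language_sets=1.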

callgraph.sh: explores the call graph of a C, C++, or Java source file. By default, it starts from main and produces a JSON Graph of all reachable functions.
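The reachability part of this exploration amounts to a breadth-first search from the start function. The sketch below is hypothetical: it starts from an already extracted "caller callee" edge list on stdin (the real scripts derive the calls with srcml), the function name callgraph_json is invented, and the JSON it emits is a simplified JSON Graph-style shape, not necessarily the exact schema callgraph.sh writes.

```bash
# Hypothetical sketch: BFS over "caller callee" lines read from stdin,
# starting at START (default main); prints reachable nodes and edges as JSON.
callgraph_json() {
  awk -v start="${1:-main}" '
    NF == 2 && !dup[$1 FS $2]++ { adj[$1] = adj[$1] " " $2 }   # dedup edges
    END {
      head = 1; tail = 1; queue[1] = start; seen[start] = 1; order[++nn] = start
      while (head <= tail) {
        u = queue[head++]
        m = split(adj[u], vs, " ")
        for (k = 1; k <= m; k++) {
          v = vs[k]
          src[++ne] = u; dst[ne] = v           # every edge out of a reachable node
          if (!(v in seen)) { seen[v] = 1; queue[++tail] = v; order[++nn] = v }
        }
      }
      printf "{\"graph\":{\"directed\":true,\"nodes\":["
      for (i = 1; i <= nn; i++) printf "%s{\"id\":\"%s\"}", (i > 1 ? "," : ""), order[i]
      printf "],\"edges\":["
      for (i = 1; i <= ne; i++)
        printf "%s{\"source\":\"%s\",\"target\":\"%s\"}", (i > 1 ? "," : ""), src[i], dst[i]
      printf "]}}\n"
    }'
}
```

Functions that are never reached from the start node (h and i in a graph with edges main→f, f→g, h→i) are left out of the output.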

callees.sh: takes a C, C++, or Java source file as input and extracts all function definitions reachable from a given start function (main by default).

callgraph_aux.sh: provides auxiliary functionality shared by the other call-graph scripts. It uses srcml and xmlstarlet to explore the call graph of a C, C++, or Java source file.