diff --git a/.gitignore b/.gitignore index 894a44c..43681bc 100644 --- a/.gitignore +++ b/.gitignore @@ -102,3 +102,5 @@ venv.bak/ # mypy .mypy_cache/ + +.vscode \ No newline at end of file diff --git a/README.md b/README.md index 3ef6327..fee6526 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,5 @@ # True Git Code Churn + ![GitHub release version](https://img.shields.io/github/v/release/flacle/truegitcodechurn.svg?sort=semver) A Python script to compute "true" code churn of a Git repository. Useful for software teams to openly help manage technical debt. @@ -13,28 +14,36 @@ Solutions that I've found online looked at changes to files irrespective whether *Tested with Python version 3.5.3 and Git version 2.20.1* -# How it works +## How it works + This lightweight script looks at commits per author for a given date range on the **current branch**. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn. Result is a print with aggregated contribution and churn per author for a given period in time. ***Note:*** This includes the `--no-merges` flag as it assumes that merge commits with or without merge conflicts are not indicative of churn. 
-# Usage -Positional (required) arguments: +## Usage + +### Positional (required) arguments + - **after**        after a certain date, in YYYY[-MM[-DD]] format - **before**     before a certain date, in YYYY[-MM[-DD]] format - **author**     author string (not a committer), leave blank to scope all authors - **dir**            include Git repository directory -Optional arguments: +### Optional arguments + - **-h, --h, --help**    show this help message and exit - **-exdir**                   exclude Git repository subdirectory +-- **--show-file-data** display line count changes per file ## Usage Example 1 + ```bash python ./gitcodechurn.py after="2018-11-29" before="2019-03-01" author="an author" dir="/Users/myname/myrepo" -exdir="excluded-directory" ``` + ## Output 1 + ```bash author: an author contribution: 844 @@ -42,15 +51,96 @@ churn: -28 ``` ## Usage Example 2 + ```bash python ./gitcodechurn.py after="2018-11-29" before="2019-03-01" author="" dir="/Users/myname/myrepo" -exdir="excluded-directory" ``` + ## Output 2 + ```bash authors: author1, author2, author3 contribution: 4423 churn: -543 ``` +## Usage Example 3 + +```bash +python ./gitcodechurn.py after="2018-11-29" before="2021-11-05" author="flacle" dir="/Users/myname/myrepo" --show-file-data +``` + +## Output 3 + +```bash +author: flacle +contribution: 337 +churn: -19 +------------------------------------------------------------------------------- + FILE NAME | LINE # | ADDED | REMOVED +------------------------------------------------------------------------------- + gitcodechurn.py | 1 | 190 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 2 | 4 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 37 | 2 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 40 | 0 | 1 +------------------------------------------------------------------------------- 
+ gitcodechurn.py | 42 | 1 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 45 | 0 | 1 +------------------------------------------------------------------------------- + gitcodechurn.py | 47 | 1 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 50 | 0 | 1 +------------------------------------------------------------------------------- + gitcodechurn.py | 52 | 1 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 55 | 0 | 1 +------------------------------------------------------------------------------- + gitcodechurn.py | 57 | 8 | 1 +------------------------------------------------------------------------------- + gitcodechurn.py | 66 | 2 | 0 +------------------------------------------------------------------------------- + gitcodechurn.py | 62 | 0 | 1 +... +------------------------------------------------------------------------------- + gitcodechurn.py | 200 | 1 | 0 +------------------------------------------------------------------------------- + README.md | 12 | 2 | 0 +------------------------------------------------------------------------------- + README.md | 16 | 0 | 1 +------------------------------------------------------------------------------- + README.md | 18 | 1 | 0 +------------------------------------------------------------------------------- + README.md | 21 | 11 | 0 +------------------------------------------------------------------------------- + README.md | 20 | 0 | 1 +------------------------------------------------------------------------------- + README.md | 33 | 1 | 0 +------------------------------------------------------------------------------- + README.md | 22 | 0 | 1 +------------------------------------------------------------------------------- + README.md | 35 | 1 | 0 +------------------------------------------------------------------------------- + README.md | 24 | 0 | 2 
+------------------------------------------------------------------------------- + README.md | 37 | 3 | 0 +------------------------------------------------------------------------------- + README.md | 41 | 12 | 0 +``` + Outputs of Usage Example 1 can be used as part of a pipeline that generates bar charts for reports: ![contribution vs churn example chart](/chart.png) + +## How to contribute + +At this time, the code is organized into a script which can be located at `gitcodechurn.py`. There is [an open issue](https://github.com/flacle/truegitcodechurn/issues/9) to convert this repository +into a more formal object-oriented structure. + +For now the code is located in `gitcodechurn.py` and the tests are located in `test_gitcodechurn.py`. To test, kindly ensure `pytest` and `pytest-mock` are installed. Then, from the project root, run + +```bash +pytest +``` diff --git a/gitcodechurn.py b/gitcodechurn.py index 1d98275..af6a3f6 100644 --- a/gitcodechurn.py +++ b/gitcodechurn.py @@ -26,11 +26,12 @@ ''' -import subprocess -import shlex -import os import argparse import datetime +import os +import shlex +import subprocess + def main(): parser = argparse.ArgumentParser( @@ -66,6 +67,11 @@ def main(): default = '', help = 'the Git repository subdirectory to be excluded' ) + parser.add_argument( + "--show-file-data", + action="store_true", + help="Display line change information for the analyzed file(s)" + ) args = parser.parse_args() after = args.after @@ -84,6 +90,69 @@ def main(): commits = get_commits(before, after, author, dir) + files, contribution, churn = calculate_statistics(commits, dir, exdir) + + # if author is empty then print a unique list of authors + if len(author.strip()) == 0: + authors = set(get_commits(before, after, author, dir, '%an')).__str__() + authors = authors.replace('{', '').replace('}', '').replace("'","") + print('authors: \t', authors) + else: + print('author: \t', author) + print('contribution: \t', contribution) + print('churn: \t\t', -churn) + # 
print files in case more granular results are needed + #print('files: ', files) + + if args.show_file_data: + display_file_metrics(files) + + +def display_file_metrics(files): + display_file_metrics_header() + for file_name, line_change_info in files.items(): + for line_number, line_diff_stats in line_change_info.items(): + display_file_metrics_row(file_name, line_number, line_diff_stats) + + +def display_file_metrics_header(): + print("-" * 79) + print( + "{file}|{line_number}|{lines_added}|{lines_removed}".format( + file=format_column("FILE NAME", 34), + line_number=format_column("LINE #", 10), + lines_added=format_column("ADDED", 10), + lines_removed=format_column("REMOVED", 10), + ) + ) + + +def display_file_metrics_row(file_name, line_number, line_diff_stats): + added = line_diff_stats.get("lines_added") + removed = line_diff_stats.get("lines_removed") + + if added == 0 and removed == 0: + return + print("-" * 79) + print( + "{file}|{ln}|{lines_added}|{lines_removed}".format( + file=format_column(file_name, 34), + ln=format_column(str(line_number), 10), + lines_added=format_column(str(added), 10), + lines_removed=format_column(str(removed), 10), + ) + ) + + +def format_column(text, width): + text_length = len(text) + total_pad = width - text_length + pad_left = total_pad // 2 + pad_right = total_pad - pad_left + return (" " * pad_left) + text + (" " * pad_right) + + +def calculate_statistics(commits, dir, exdir): # structured like this: files -> LOC files = {} @@ -100,17 +169,9 @@ def main(): exdir ) - # if author is empty then print a unique list of authors - if len(author.strip()) == 0: - authors = set(get_commits(before, after, author, dir, '%an')).__str__() - authors = authors.replace('{', '').replace('}', '').replace("'","") - print('authors: \t', authors) - else: - print('author: \t', author) - print('contribution: \t', contribution) - print('churn: \t\t', -churn) - # print files in case more granular results are needed - #print('files: ', files) + 
return files, contribution, churn + + def get_loc(commit, dir, files, contribution, churn, exdir): # git show automatically excludes binary file changes @@ -118,7 +179,7 @@ def get_loc(commit, dir, files, contribution, churn, exdir): if len(exdir) > 1: # https://stackoverflow.com/a/21079437 command += ' -- . ":(exclude,icase)'+exdir+'"' - results = get_proc_out(command, dir).splitlines() + results = get_commit_results(command, dir) file = '' loc_changes = '' @@ -133,18 +194,99 @@ def get_loc(commit, dir, files, contribution, churn, exdir): new_loc_changes = is_loc_change(result, loc_changes) if loc_changes != new_loc_changes: loc_changes = new_loc_changes - locc = get_loc_change(loc_changes) - for loc in locc: - if loc in files[file]: - files[file][loc] += locc[loc] - churn += abs(locc[loc]) - else: - files[file][loc] = locc[loc] - contribution += abs(locc[loc]) + (removal, addition) = get_loc_change(loc_changes) + + files, contribution, churn = merge_operations(removal, addition, files, contribution, churn, file) else: continue return [files, contribution, churn] + +def merge_operations(removal, addition, files, contribution, churn, file): + # Ensure all required data is in place + ensure_file_exists(files, file) + + file_line_churn_dict = files[file] + + if is_noop(removal, addition): + # In the case of a noop, it's not counted in change metrics, but should + # be marked as changed to accurately include future churn metrics + # An example of this is a diff like: + # "diff --git README.md README.md", + # "index bedbc85..bb033cd 100644", + # "--- README.md", + # "+++ README.md", + # "@@ -8 +8 @@ Code churn has several definitions, the one that to me provides the most value a", + # "-*Reference: https://blog.gitprime.com/why-code-churn-matters/*", + # "+*Reference: https://www.pluralsight.com/blog/teams/why-code-churn-matters*", + # In this example, we deleted the line, and then added the line by updating the link + # This repo would consider this a "No-Op" as it 
nets to no change + # However, we want to mark line 8 as changed so that all subsequent + # changes to line 8 are marked as churn + # The thinking behind this is the other updates should have been made + # while this change was being made. + remove_line_number = removal[0] + ensure_line_exists(file_line_churn_dict, remove_line_number) + return files, contribution, churn + + for (line_number, lines_removed, lines_added) in compute_changes(removal, addition): + # Churn check performed before line modification changes + is_churn = is_this_churn(file_line_churn_dict, line_number) + + ensure_line_exists(file_line_churn_dict, line_number) + line_count_change_metrics = file_line_churn_dict[line_number] + + line_count_change_metrics["lines_removed"] += lines_removed + line_count_change_metrics["lines_added"] += lines_added + + if is_churn: + churn += abs(lines_removed) + abs(lines_added) + else: + contribution += abs(lines_removed) + abs(lines_added) + + return files, contribution, churn + + +def compute_changes(removal, addition): + # If both removal and addition affect the same line, net out the change + # Returns a list of tuples of type (line_number, lines_removed, lines_added) + removed_line_number, lines_removed = removal + added_line_number, lines_added = addition + + if removed_line_number == added_line_number: + if lines_added >= lines_removed: + return [(removed_line_number, 0, (lines_added - lines_removed))] + else: + return [(removed_line_number, (lines_removed - lines_added), 0)] + else: + return [ + (removed_line_number, lines_removed, 0), + (added_line_number, 0, lines_added), + ] + + +def is_this_churn(file_line_churn_dict, line_number): + # The definition of churn is any change to a line + # after the first time the line has been changed + # This is detected by a line operation logged in the file_line_churn_dict + return line_number in file_line_churn_dict + + +def ensure_line_exists(file_line_churn_dict, line_number): + if line_number not in 
file_line_churn_dict: + file_line_churn_dict[line_number] = {"lines_removed": 0, "lines_added": 0} + + +def ensure_file_exists(files, file): + if file not in files: + files[file] = {} + + +def is_noop(removal, addition): + # A noop event occurs when a change indicates one delete and one add on the same line + return removal == addition + + # arrives in a format such as -13 +27,5 (no commas mean 1 loc change) # this is the chunk header where '-' is old and '+' is new # it returns a dictionary where left are removals and right are additions @@ -161,6 +303,8 @@ def get_loc_change(loc_changes): left = int(left[1:]) left_dec = 1 + removal = (left, left_dec) + # additions right = loc_changes[loc_changes.find(' ')+1:] right_dec = 0 @@ -172,10 +316,10 @@ def get_loc_change(loc_changes): right = int(right[1:]) right_dec = 1 - if left == right: - return {left: (right_dec - left_dec)} - else: - return {left : left_dec, right: right_dec} + addition = (right, right_dec) + + return (removal, addition) + def is_loc_change(result, loc_changes): # search for loc changes (@@ ) and update loc_changes variable diff --git a/test_gitcodechurn.py b/test_gitcodechurn.py new file mode 100644 index 0000000..20af215 --- /dev/null +++ b/test_gitcodechurn.py @@ -0,0 +1,1041 @@ +from gitcodechurn import ( + calculate_statistics, + get_loc, + get_loc_change, + is_loc_change, + is_new_file, +) + +# All mocks generated from +# `python ./gitcodechurn.py after="2018-11-29" before="2022-03-01" author="flacle" dir="{{local_copy_of_this_repo}}"` +# as of November 4th 2021 + +# The Keys are the responses from `get_commits` +# The values are dictionaries comprised of the command input into `get_proc_out` and `get_proc_out` is the result of that command +# This should allow a mock of this program end to end as well as to test sub-components +MOCK_DATA = [ + ( + "febc94587dd7623832f10259c5dc8177fec50ae7", + { + "command": "git show --format= --unified=0 --no-prefix 
febc94587dd7623832f10259c5dc8177fec50ae7", + "get_proc_out": [ + "diff --git gitcodechurn.py gitcodechurn.py", + "new file mode 100644", + "index 0000000..1d7ff61", + "--- /dev/null", + "+++ gitcodechurn.py", + "@@ -0,0 +1,190 @@", + "+'''", + '+Script to compute "true" code churn of a Git repository.', + "+", + "+Code churn has several definitions, the one that to me provides the", + "+most value as a metric is:", + "+", + '+"Code churn is when an engineer', + '+rewrites their own code in a short period of time."', + "+", + "+Reference: https://blog.gitprime.com/why-code-churn-matters/", + "+", + "+This script looks at a range of commits per author. For each commit it", + "+book-keeps the files that were changed along with the lines of code (LOC)", + "+for each file. LOC are kept in a sparse structure and changes per LOC are taken", + "+into account as the program loops. When a change to the same LOC is detected it", + "+updates this separately to bookkeep the true code churn.", + "+", + "+Result is a print with aggregated contribution and churn per author for a", + "+given time period.", + "+", + "+Tested with Python version 3.5.3 and Git version 2.20.1", + "+", + "+'''", + "+", + "+import subprocess", + "+import shlex", + "+import os", + "+import argparse", + "+import datetime", + "+", + "+def main():", + "+ parser = argparse.ArgumentParser(", + "+ description = 'Compute true git code churn (for project managers)'", + "+ )", + "+ parser.add_argument(", + "+ '--before',", + "+ type = str,", + "+ help = 'before a certain date, in YYYY-MM-DD format'", + "+ )", + "+ parser.add_argument(", + "+ '--after',", + "+ type = str,", + "+ help = 'after a certain date, in YYYY-MM-DD format'", + "+ )", + "+ parser.add_argument(", + "+ '--author',", + "+ type = str,", + "+ help = 'author string (not committer)'", + "+ )", + "+ parser.add_argument(", + "+ '--dir',", + "+ type = str,", + "+ help = 'Git repository directory'", + "+ )", + "+ args = parser.parse_args()", + "+", + "+ 
before = args.before", + "+ after = args.after", + "+ author = args.author", + "+ dir = args.dir", + "+", + "+ commits = get_commits(before, after, author, dir)", + "+", + "+ # structured like this: files -> LOC", + "+ files = {}", + "+", + "+ contribution = 0", + "+ churn = 0", + "+", + "+ for commit in commits:", + "+ [files, contribution, churn] = get_loc(", + "+ commit,", + "+ dir,", + "+ files,", + "+ contribution,", + "+ churn", + "+ )", + "+", + "+ # print files in case more granular results are needed", + "+ print('contribution: ', contribution)", + "+ print('churn: ', -churn)", + "+", + "+def get_loc(commit, dir, files, contribution, churn):", + "+ # git show automatically excludes binary file changes", + "+ command = 'git show --format= --unified=0 --no-prefix ' + commit", + "+ results = get_proc_out(command, dir).splitlines()", + "+ file = ''", + "+ loc_changes = ''", + "+", + "+ # loop through each row of output", + "+ for result in results:", + "+ new_file = is_new_file(result, file)", + "+ if file != new_file:", + "+ file = new_file", + "+ if file not in files:", + "+ files[file] = {}", + "+ else:", + "+ new_loc_changes = is_loc_change(result, loc_changes)", + "+ if loc_changes != new_loc_changes:", + "+ loc_changes = new_loc_changes", + "+ locc = get_loc_change(loc_changes)", + "+ for loc in locc:", + "+ if loc in files[file]:", + "+ files[file][loc] += locc[loc]", + "+ churn += abs(locc[loc])", + "+ else:", + "+ files[file][loc] = locc[loc]", + "+ contribution += abs(locc[loc])", + "+ else:", + "+ continue", + "+ return [files, contribution, churn]", + "+", + "+# arrives in a format such as -13 +27,5 (no decimals == 1 loc change)", + "+# returns a dictionary where left are removals and right are additions", + "+# if the same line got changed we subtract removals from additions", + "+def get_loc_change(loc_changes):", + "+ # removals", + "+ left = loc_changes[:loc_changes.find(' ')]", + "+ left_dec = 0", + "+ if left.find(',') > 0:", + "+ comma = 
left.find(',')", + "+ left_dec = int(left[comma+1:])", + "+ left = int(left[1:comma])", + "+ else:", + "+ left = int(left[1:])", + "+ left_dec = 1", + "+", + "+ # additions", + "+ right = loc_changes[loc_changes.find(' ')+1:]", + "+ right_dec = 0", + "+ if right.find(',') > 0:", + "+ comma = right.find(',')", + "+ right_dec = int(right[comma+1:])", + "+ right = int(right[1:comma])", + "+ else:", + "+ right = int(right[1:])", + "+ right_dec = 1", + "+", + "+ if left == right:", + "+ return {left: (right_dec - left_dec)}", + "+ else:", + "+ return {left : left_dec, right: right_dec}", + "+", + "+", + "+", + "+def is_loc_change(result, loc_changes):", + "+ # search for loc changes (@@ ) and update loc_changes variable", + "+ if result.startswith('@@'):", + "+ loc_change = result[result.find(' ')+1:]", + "+ loc_change = loc_change[:loc_change.find(' @@')]", + "+ return loc_change", + "+ else:", + "+ return loc_changes", + "+", + "+def is_new_file(result, file):", + "+ # search for destination file (+++ ) and update file variable", + "+ if result.startswith('+++'):", + "+ return result[result.rfind(' ')+1:]", + "+ else:", + "+ return file", + "+", + "+def get_commits(before, after, author, dir):", + "+ # note --no-merges flag (usually we coders do not overhaul contributions)", + "+ # note --reverse flag to traverse history from past to present", + "+ command = 'git log --author=\"'+author+'\" --format=\"%h\" --no-abbrev '", + "+ command += '--before=\"'+before+'\" --after=\"'+after+'\" --no-merges --reverse'", + "+ return get_proc_out(command, dir).splitlines()", + "+", + "+# not used but still could be of value in the future", + "+def get_files(commit, dir):", + "+ # this also works in case --no-merges flag is ommitted prior", + "+ command = 'git show --numstat --pretty=\"\" ' + commit", + "+ results = get_proc_out(command, dir).splitlines()", + "+ for i in range(len(results)):", + "+ # remove the tabbed stats from --numstat", + "+ results[i] = 
results[i][results[i].rfind('\\t')+1:]", + "+ return(results)", + "+", + "+def get_proc_out(command, dir):", + "+ process = subprocess.Popen(", + "+ command,", + "+ stdout=subprocess.PIPE,", + "+ stderr=subprocess.PIPE,", + "+ cwd=dir,", + "+ shell=True", + "+ )", + '+ return process.communicate()[0].decode("utf-8")', + "+", + "+if __name__ == '__main__':", + "+ main()", + ], + "files": {"gitcodechurn.py": {0: {"lines_added": 0, "lines_removed": 0}, 1: {"lines_added": 190, "lines_removed": 0}}}, + "contribution": 190, + "churn": 0, + }, + ), + ( + "c8a7da45254e8055016a50371ae5b5e12b978355", + { + "command": "git show --format= --unified=0 --no-prefix c8a7da45254e8055016a50371ae5b5e12b978355", + "get_proc_out": [ + "diff --git chart.png chart.png", + "new file mode 100644", + "index 0000000..2e3b67f", + "Binary files /dev/null and chart.png differ", + ], + "files": {}, + "contribution": 0, + "churn": 0, + }, + ), + ( + "a991baa3799b1efeb0fdb7ad88380fd24a902d96", + { + "command": "git show --format= --unified=0 --no-prefix a991baa3799b1efeb0fdb7ad88380fd24a902d96", + "get_proc_out": [ + "diff --git gitcodechurn.py gitcodechurn.py", + "index 1d7ff61..29d24ea 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -1,0 +2,4 @@", + "+Author: Francis Laclé", + "+License: MIT", + "+Version: 0.1", + "+", + ], + "files": {"gitcodechurn.py": {1: {"lines_added": 0, "lines_removed": 0}, 2: {"lines_added": 4, "lines_removed": 0}}}, + "contribution": 4, + "churn": 0, + }, + ), + ( + "431fb62b22606746f693a38eb308b08559d55232", + { + "command": "git show --format= --unified=0 --no-prefix 431fb62b22606746f693a38eb308b08559d55232", + "get_proc_out": [ + "diff --git README.md README.md", + "index bedbc85..bb033cd 100644", + "--- README.md", + "+++ README.md", + "@@ -8 +8 @@ Code churn has several definitions, the one that to me provides the most value a", + "-*Reference: https://blog.gitprime.com/why-code-churn-matters/*", + "+*Reference: 
https://www.pluralsight.com/blog/teams/why-code-churn-matters*", + ], + "files": {"README.md": {8: {"lines_removed": 0, "lines_added": 0}}}, + "contribution": 0, + "churn": 0, + }, + ), + ( + "fca7078eae1c5724cda4e5fee2526d2313d69bcf", + { + "command": "git show --format= --unified=0 --no-prefix fca7078eae1c5724cda4e5fee2526d2313d69bcf", + "get_proc_out": [ + "diff --git README.md README.md", + "index bb033cd..d76b96b 100644", + "--- README.md", + "+++ README.md", + "@@ -10 +10 @@ Code churn has several definitions, the one that to me provides the most value a", + "-Solutions that I've found online looked at changes to files irrespective whether these are new changes or edits to existing files. Hence this solution that segments code edits (churn) with new code changes (contribution).", + "+Solutions that I've found online looked at changes to files irrespective whether these are new changes or edits to existing lines of code within existing files. Hence this solution that segments line-of-code edits (churn) with new code changes (contribution).", + ], + "files": {"README.md": {10: {"lines_added": 0, "lines_removed": 0}}}, + "contribution": 0, + "churn": 0, + }, + ), + ( + "a5021395ea7099f9689986a3811121e09bbee6f0", + { + "command": "git show --format= --unified=0 --no-prefix a5021395ea7099f9689986a3811121e09bbee6f0", + "get_proc_out": [ + "diff --git README.md README.md", + "index d76b96b..f2afff3 100644", + "--- README.md", + "+++ README.md", + "@@ -2 +2 @@", + '-A Python script to compute "true" code churn of a Git repository. Especially useful for software teams.', + '+A Python script to compute "true" code churn of a Git repository. 
Useful for software teams to openly help manage technical debt.', + ], + "files": {"README.md": {2: {"lines_added": 0, "lines_removed": 0}}}, + "contribution": 0, + "churn": 0, + }, + ), + ( + "667dab70fccb844d65f439149fddfce2702d2e91", + { + "command": "git show --format= --unified=0 --no-prefix 667dab70fccb844d65f439149fddfce2702d2e91", + "get_proc_out": [ + "diff --git gitcodechurn.py gitcodechurn.py", + "index 29d24ea..035dbb8 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -37 +37,3 @@ def main():", + "- description = 'Compute true git code churn (for project managers)'", + "+ description = 'Compute true git code churn to understand tech debt.',", + "+ usage = 'python [*/]gitcodechurn.py before=YYY-MM-DD after=YYYY-MM-DD dir=[*/]path [-exdir=[*/]path] [-h]',", + "+ epilog = 'Feel free to fork at or contribute on: https://github.com/flacle/truegitcodechurn'", + "@@ -40 +42 @@ def main():", + "- '--before',", + "+ 'before',", + "@@ -45 +47 @@ def main():", + "- '--after',", + "+ 'after',", + "@@ -50 +52 @@ def main():", + "- '--author',", + "+ 'author',", + "@@ -55 +57,8 @@ def main():", + "- '--dir',", + "+ 'dir',", + "+ type = dir_path,", + "+ default = '',", + "+ help = 'include Git repository directory'", + "+ )", + "+ parser.add_argument(", + "+ '-exdir',", + "+ metavar='',", + "@@ -57 +66,2 @@ def main():", + "- help = 'Git repository directory'", + "+ default = '',", + "+ help = 'exclude Git repository subdirectory'", + "@@ -62 +72 @@ def main():", + "- after = args.after", + "+ after = args.after", + "@@ -64 +74,10 @@ def main():", + "- dir = args.dir", + "+ dir = args.dir", + "+ # exdir is optional", + "+ exdir = args.exdir", + "+", + "+ # for the positionals we remove the prefixes", + "+ # TODO not sure why this is happening", + "+ before = remove_prefix(before, 'before=')", + "+ after = remove_prefix(after, 'after=')", + "+ author = remove_prefix(author, 'author=')", + "+ # dir is already handled in dir_path()", + "@@ -80 +99,2 @@ def 
main():", + "- churn", + "+ churn,", + "+ exdir", + "@@ -83 +102,0 @@ def main():", + "- # print files in case more granular results are needed", + "@@ -85,0 +105,2 @@ def main():", + "+ # print files in case more granular results are needed", + "+ #print('files: ', files)", + "@@ -87 +108 @@ def main():", + "-def get_loc(commit, dir, files, contribution, churn):", + "+def get_loc(commit, dir, files, contribution, churn, exdir):", + "@@ -89,0 +111,3 @@ def get_loc(commit, dir, files, contribution, churn):", + "+ if len(exdir) > 1:", + "+ # https://stackoverflow.com/a/21079437", + "+ command += ' -- . \":(exclude,icase)'+exdir+'\"'", + "@@ -167 +191 @@ def get_commits(before, after, author, dir):", + "- # note --no-merges flag (usually we coders do not overhaul contributions)", + "+ # note --no-merges flag (usually we coders do not overhaul contrib commits)", + "@@ -192,0 +217,14 @@ def get_proc_out(command, dir):", + "+# https://stackoverflow.com/a/54547257", + "+def dir_path(path):", + "+ path = remove_prefix(path, 'dir=')", + "+ if os.path.isdir(path):", + "+ return path", + "+ else:", + '+ raise argparse.ArgumentTypeError("Directory "+path+" is not a valid path.")', + "+", + "+#https://stackoverflow.com/a/16891418", + "+def remove_prefix(text, prefix):", + "+ if text.startswith(prefix):", + "+ return text[len(prefix):]", + "+ return text # or whatever", + "+", + ], + "files": { + "gitcodechurn.py": { + 37: {'lines_added': 2, 'lines_removed': 0}, + 40: {'lines_added': 0, 'lines_removed': 1}, + 42: {'lines_added': 1, 'lines_removed': 0}, + 45: {'lines_added': 0, 'lines_removed': 1}, + 47: {'lines_added': 1, 'lines_removed': 0}, + 50: {'lines_added': 0, 'lines_removed': 1}, + 52: {'lines_added': 1, 'lines_removed': 0}, + 55: {'lines_added': 0, 'lines_removed': 1}, + 57: {'lines_added': 8, 'lines_removed': 1}, + 62: {'lines_added': 0, 'lines_removed': 1}, + 64: {'lines_added': 0, 'lines_removed': 1}, + 66: {'lines_added': 2, 'lines_removed': 0}, + 72: 
{'lines_added': 1, 'lines_removed': 0}, + 74: {'lines_added': 10, 'lines_removed': 0}, + 80: {'lines_added': 0, 'lines_removed': 1}, + 83: {'lines_added': 0, 'lines_removed': 1}, + 85: {'lines_added': 0, 'lines_removed': 0}, + 87: {'lines_added': 0, 'lines_removed': 1}, + 89: {'lines_added': 0, 'lines_removed': 0}, + 99: {'lines_added': 2, 'lines_removed': 0}, + 102: {'lines_added': 0, 'lines_removed': 0}, + 105: {'lines_added': 2, 'lines_removed': 0}, + 108: {'lines_added': 1, 'lines_removed': 0}, + 111: {'lines_added': 3, 'lines_removed': 0}, + 167: {'lines_added': 0, 'lines_removed': 1}, + 191: {'lines_added': 1, 'lines_removed': 0}, + 192: {'lines_added': 0, 'lines_removed': 0}, + 217: {'lines_added': 14, 'lines_removed': 0}, + } + }, + "churn": 1, + "contribution": 59, + }, + ), + ( + "0a6d4b5f90c08104542318981783826573731c5c", + { + "command": "git show --format= --unified=0 --no-prefix 0a6d4b5f90c08104542318981783826573731c5c", + "get_proc_out": [ + "diff --git gitcodechurn.py gitcodechurn.py", + "index 035dbb8..a0c9851 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -38 +38 @@ def main():", + "- usage = 'python [*/]gitcodechurn.py before=YYY-MM-DD after=YYYY-MM-DD dir=[*/]path [-exdir=[*/]path] [-h]',", + '+ usage = \'python [*/]gitcodechurn.py before="YYYY-MM-DD" after="YYYY-MM-DD" author="flacle" dir="[*/]path" [-exdir="[*/]path]" [-h]\',', + ], + "files": {"gitcodechurn.py": {38: {'lines_added': 0, 'lines_removed': 0}}}, + "churn": 0, + "contribution": 0, + }, + ), + ( + "75e631da2c788d3bac6a70a2458238f911c25bc4", + { + "command": "git show --format= --unified=0 --no-prefix 75e631da2c788d3bac6a70a2458238f911c25bc4", + "get_proc_out": [ + "diff --git gitcodechurn.py gitcodechurn.py", + "index a0c9851..98788eb 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -38 +38 @@ def main():", + '- usage = \'python [*/]gitcodechurn.py before="YYYY-MM-DD" after="YYYY-MM-DD" author="flacle" dir="[*/]path" [-exdir="[*/]path]" [-h]\',', + 
'+ usage = \'python [*/]gitcodechurn.py before="YYYY-MM-DD" after="YYYY-MM-DD" author="flacle" dir="[*/]path" [-exdir="[*/]path"] [-h]\',', + ], + "files": {"gitcodechurn.py": {38: {'lines_added': 0, 'lines_removed': 0}}}, + "contribution": 0, + "churn": 0, + }, + ), + ( + "954bdfc3ecad3346096679d69a77bfa145e297ed", + { + "command": "git show --format= --unified=0 --no-prefix 954bdfc3ecad3346096679d69a77bfa145e297ed", + "get_proc_out": [ + "diff --git README.md README.md", + "index 462b6a5..9b67fc1 100644", + "--- README.md", + "+++ README.md", + "@@ -11,0 +12,2 @@ Solutions that I've found online looked at changes to files irrespective whether", + "+*Tested with Python version 3.5.3 and Git version 2.20.1*", + "+", + "@@ -16 +18 @@ Result is a print with aggregated contribution and churn per author for a given", + "-Tested with Python version 3.5.3 and Git version 2.20.1", + "+***Note:*** This includes the `--no-merges` flag as it assumes that merge commits with or without merge conflicts are not indicative of churn.", + "@@ -18,0 +21,11 @@ Tested with Python version 3.5.3 and Git version 2.20.1", + "+Positional (required) arguments:", + "+- **after** \xa0\xa0\xa0\xa0\xa0\xa0\xa0after a certain date, in YYYY[-MM[-DD]] format", + "+- **before** \xa0\xa0\xa0\xa0before a certain date, in YYYY[-MM[-DD]] format", + "+- **author** \xa0\xa0\xa0\xa0author string (not committer)", + "+- **dir** \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0include Git repository directory", + "+", + "+Optional arguments:", + "+- **-h, --h, --help** \xa0\xa0\xa0show this help message and exit", + "+- **-exdir** \xa0\xa0\xa0\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0exclude Git repository subdirectory", + "+", + "+## Example", + "@@ -20 +33 @@ Tested with Python version 3.5.3 and Git version 2.20.1", + '-python ./gitcodechurn.py before="2019-03-01" after="2018-11-29" author="an author" dir="/Users/myname/myrepo" -exdir="excluded-directory"', + '+python ./gitcodechurn.py 
after="2018-11-29" before="2019-03-01" author="an author" dir="/Users/myname/myrepo" -exdir="excluded-directory"', + '@@ -22 +35 @@ python ./gitcodechurn.py before="2019-03-01" after="2018-11-29" author="an autho', + "-# Output", + "+## Output", + '@@ -24,2 +37,3 @@ python ./gitcodechurn.py before="2019-03-01" after="2018-11-29" author="an autho', + "-contribution: 844", + "-churn: -28", + "+author: an author", + "+contribution: 844", + "+churn: -28", + "diff --git gitcodechurn.py gitcodechurn.py", + "index 98788eb..754bcb4 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -38 +38 @@ def main():", + '- usage = \'python [*/]gitcodechurn.py before="YYYY-MM-DD" after="YYYY-MM-DD" author="flacle" dir="[*/]path" [-exdir="[*/]path"] [-h]\',', + '+ usage = \'python [*/]gitcodechurn.py after="YYYY[-MM[-DD]]" before="YYYY[-MM[-DD]]" author="flacle" dir="[*/]path" [-exdir="[*/]path"]\',', + "@@ -42 +42 @@ def main():", + "- 'before',", + "+ 'after',", + "@@ -44 +44 @@ def main():", + "- help = 'before a certain date, in YYYY-MM-DD format'", + "+ help = 'after a certain date, in YYYY[-MM[-DD]] format'", + "@@ -47 +47 @@ def main():", + "- 'after',", + "+ 'before',", + "@@ -49 +49 @@ def main():", + "- help = 'after a certain date, in YYYY-MM-DD format'", + "+ help = 'before a certain date, in YYYY[-MM[-DD]] format'", + "@@ -71 +70,0 @@ def main():", + "- before = args.before", + "@@ -72,0 +72 @@ def main():", + "+ before = args.before", + "@@ -80 +79,0 @@ def main():", + "- before = remove_prefix(before, 'before=')", + "@@ -81,0 +81 @@ def main():", + "+ before = remove_prefix(before, 'before=')", + "@@ -103,2 +103,3 @@ def main():", + "- print('contribution: ', contribution)", + "- print('churn: ', -churn)", + "+ print('author: \\t', author)", + "+ print('contribution: \\t', contribution)", + "+ print('churn: \\t\\t', -churn)", + "@@ -223 +224 @@ def dir_path(path):", + '- raise argparse.ArgumentTypeError("Directory "+path+" is not a valid path.")', + '+ raise 
argparse.ArgumentTypeError(path + " is not a valid path.")', + ], + "files": { + "README.md": { + 11: {'lines_added': 0, 'lines_removed': 0}, + 12: {'lines_added': 2, 'lines_removed': 0}, + 16: {'lines_added': 0, 'lines_removed': 1}, + 18: {'lines_added': 1, 'lines_removed': 0}, + 20: {'lines_added': 0, 'lines_removed': 1}, + 21: {'lines_added': 11, 'lines_removed': 0}, + 22: {'lines_added': 0, 'lines_removed': 1}, + 24: {'lines_added': 0, 'lines_removed': 2}, + 33: {'lines_added': 1, 'lines_removed': 0}, + 35: {'lines_added': 1, 'lines_removed': 0}, + 37: {'lines_added': 3, 'lines_removed': 0}, + }, + "gitcodechurn.py": { + 38: {'lines_added': 0, 'lines_removed': 0}, + 42: {'lines_added': 0, 'lines_removed': 0}, + 44: {'lines_added': 0, 'lines_removed': 0}, + 47: {'lines_added': 0, 'lines_removed': 0}, + 49: {'lines_added': 0, 'lines_removed': 0}, + 70: {'lines_added': 0, 'lines_removed': 0}, + 71: {'lines_added': 0, 'lines_removed': 1}, + 72: {'lines_added': 1, 'lines_removed': 0}, + 79: {'lines_added': 0, 'lines_removed': 0}, + 80: {'lines_added': 0, 'lines_removed': 1}, + 81: {'lines_added': 1, 'lines_removed': 0}, + 103: {'lines_added': 1, 'lines_removed': 0}, + 223: {'lines_added': 0, 'lines_removed': 1}, + 224: {'lines_added': 1, 'lines_removed': 0}, + }, + }, + "contribution": 31, + "churn": 0, + }, + ), + ( + "2ae043792b6dc567f276fca4ff7587ac00686294", + { + "command": "git show --format= --unified=0 --no-prefix 2ae043792b6dc567f276fca4ff7587ac00686294", + "get_proc_out": [ + "diff --git README.md README.md", + "index 9b67fc1..ca83b1b 100644", + "--- README.md", + "+++ README.md", + "@@ -15 +15 @@ Solutions that I've found online looked at changes to files irrespective whether", + "-This script looks at a range of commits per author. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. 
When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + "+This lightweight script looks at a range of commits per author. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + "diff --git gitcodechurn.py gitcodechurn.py", + "index 754bcb4..523ac2e 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -4 +4 @@ License: MIT", + "-Version: 0.1", + "+Version: 1.0.1", + "@@ -16,2 +16,2 @@ Reference: https://blog.gitprime.com/why-code-churn-matters/", + "-This script looks at a range of commits per author. For each commit it", + "-book-keeps the files that were changed along with the lines of code (LOC)", + "+This lightweight script looks at a range of commits per author. For each commit", + "+it book-keeps the files that were changed along with the lines of code (LOC)", + "@@ -173,2 +172,0 @@ def get_loc_change(loc_changes):", + "-", + "-", + "@@ -193,0 +192,2 @@ def get_commits(before, after, author, dir):", + "+ before = format_date(before)", + "+ after = format_date(after)", + "@@ -197,0 +198,25 @@ def get_commits(before, after, author, dir):", + "+# issue #6: append to date if it's missing month or day values", + "+def format_date(d):", + "+ d = d[:-1] if d.endswith('-') else d", + "+ if len(d) < 6:", + "+ # after is interpreted as 'after the year YYYY'", + "+ return d[0:4]+'-12-31'", + "+ elif len(d) < 8:", + "+ # here we need to check on which day a month ends", + "+ dt = datetime.datetime.strptime(d, '%Y-%m')", + "+ dt_day = get_month_last_day(dt)", + "+ dt_month = '{:02d}'.format(dt.month).__str__()", + "+ return d[0:4]+'-'+dt_month+'-'+dt_day", + "+ else:", + "+ dt = datetime.datetime.strptime(d, '%Y-%m-%d')", + "+ dt_day = '{:02d}'.format(dt.day).__str__()", + 
"+ dt_month = '{:02d}'.format(dt.month).__str__()", + "+ return d[0:4]+'-'+dt_month+'-'+dt_day", + "+", + "+# https://stackoverflow.com/a/43088", + "+def get_month_last_day(date):", + "+ if date.month == 12:", + "+ return date.replace(day=31)", + "+ ld = date.replace(month=date.month+1, day=1)-datetime.timedelta(days=1)", + "+ return ld.day.__str__()", + "+", + ], + "files": { + "README.md": {15: {'lines_added': 0, 'lines_removed': 0}}, + "gitcodechurn.py": { + 4: {'lines_added': 0, 'lines_removed': 0}, + 16: {'lines_added': 0, 'lines_removed': 0}, + 172: {'lines_added': 0, 'lines_removed': 0}, + 173: {'lines_added': 0, 'lines_removed': 2}, + 192: {'lines_added': 2, 'lines_removed': 0}, + 193: {'lines_added': 0, 'lines_removed': 0}, + 197: {'lines_added': 0, 'lines_removed': 0}, + 198: {'lines_added': 25, 'lines_removed': 0}, + }, + }, + "contribution": 29, + "churn": 0, + }, + ), + ( + "617c2c27d5492e7ff782856d244d4d0eb0432a47", + { + "command": "git show --format= --unified=0 --no-prefix 617c2c27d5492e7ff782856d244d4d0eb0432a47", + "get_proc_out": [ + "diff --git README.md README.md", + "index ca83b1b..abc8df6 100644", + "--- README.md", + "+++ README.md", + "@@ -15,2 +15,2 @@ Solutions that I've found online looked at changes to files irrespective whether", + "-This lightweight script looks at a range of commits per author. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + "-Result is a print with aggregated contribution and churn per author for a given time period.", + "+This lightweight script looks at commits per author for a given date range on the **current branch**. For each commit it bookkeeps the files that were changed along with the LOC for each file. 
LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + "+Result is a print with aggregated contribution and churn per author for a given period in time.", + "@@ -24 +24 @@ Positional (required) arguments:", + "-- **author** \xa0\xa0\xa0\xa0author string (not committer)", + "+- **author** \xa0\xa0\xa0\xa0author string (not a committer), leave blank to scope all authors", + "@@ -31 +31 @@ Optional arguments:", + "-## Example", + "+## Usage Example 1", + '@@ -35 +35 @@ python ./gitcodechurn.py after="2018-11-29" before="2019-03-01" author="an autho', + "-## Output", + "+## Output 1", + "@@ -41 +41,13 @@ churn: -28", + "-Outputs can be used as part of a pipeline that generates bar charts for reports:", + "+", + "+## Usage Example 2", + "+```bash", + '+python ./gitcodechurn.py after="2018-11-29" before="2019-03-01" author="" dir="/Users/myname/myrepo" -exdir="excluded-directory"', + "+```", + "+## Output 2", + "+```bash", + "+authors: author1, author2, author3", + "+contribution: 4423", + "+churn: -543", + "+```", + "+", + "+Outputs of Usage Example 1 can be used as part of a pipeline that generates bar charts for reports:", + "diff --git gitcodechurn.py gitcodechurn.py", + "index 523ac2e..1d98275 100644", + "--- gitcodechurn.py", + "+++ gitcodechurn.py", + "@@ -11,2 +11 @@ most value as a metric is:", + '-"Code churn is when an engineer', + '-rewrites their own code in a short period of time."', + '+"Code churn is when an engineer rewrites their own code in a short time period."', + "@@ -16,5 +15,6 @@ Reference: https://blog.gitprime.com/why-code-churn-matters/", + "-This lightweight script looks at a range of commits per author. For each commit", + "-it book-keeps the files that were changed along with the lines of code (LOC)", + "-for each file. 
LOC are kept in a sparse structure and changes per LOC are taken", + "-into account as the program loops. When a change to the same LOC is detected it", + "-updates this separately to bookkeep the true code churn.", + "+This lightweight script looks at commits per author for a given date range on", + "+the default branch. For each commit it bookkeeps the files that were changed", + "+along with the lines of code (LOC) for each file. LOC are kept in a sparse", + "+structure and changes per LOC are taken into account as the program loops. When", + "+a change to the same LOC is detected it updates this separately to bookkeep the", + "+true code churn.", + "@@ -44 +44 @@ def main():", + "- help = 'after a certain date, in YYYY[-MM[-DD]] format'", + "+ help = 'search after a certain date, in YYYY[-MM[-DD]] format'", + "@@ -49 +49 @@ def main():", + "- help = 'before a certain date, in YYYY[-MM[-DD]] format'", + "+ help = 'search before a certain date, in YYYY[-MM[-DD]] format'", + "@@ -54 +54 @@ def main():", + "- help = 'author string (not committer)'", + "+ help = 'an author (non-committer), leave blank to scope all authors'", + "@@ -60 +60 @@ def main():", + "- help = 'include Git repository directory'", + "+ help = 'the Git repository root directory to be included'", + "@@ -67 +67 @@ def main():", + "- help = 'exclude Git repository subdirectory'", + "+ help = 'the Git repository subdirectory to be excluded'", + "@@ -103 +103,7 @@ def main():", + "- print('author: \\t', author)", + "+ # if author is empty then print a unique list of authors", + "+ if len(author.strip()) == 0:", + "+ authors = set(get_commits(before, after, author, dir, '%an')).__str__()", + "+ authors = authors.replace('{', '').replace('}', '').replace(\"'\",\"\")", + "+ print('authors: \\t', authors)", + "+ else:", + "+ print('author: \\t', author)", + "@@ -142,2 +148,3 @@ def get_loc(commit, dir, files, contribution, churn, exdir):", + "-# arrives in a format such as -13 +27,5 (no decimals == 1 
loc change)", + "-# returns a dictionary where left are removals and right are additions", + "+# arrives in a format such as -13 +27,5 (no commas mean 1 loc change)", + "+# this is the chunk header where '-' is old and '+' is new", + "+# it returns a dictionary where left are removals and right are additions", + "@@ -189 +196,2 @@ def is_new_file(result, file):", + "-def get_commits(before, after, author, dir):", + "+# use format='%an' to get a list of author names", + "+def get_commits(before, after, author, dir, format='%h'):", + "@@ -192,3 +200 @@ def get_commits(before, after, author, dir):", + "- before = format_date(before)", + "- after = format_date(after)", + "- command = 'git log --author=\"'+author+'\" --format=\"%h\" --no-abbrev '", + "+ command = 'git log --author=\"'+author+'\" --format=\"'+format+'\" --no-abbrev '", + ], + "files": { + "README.md": { + 15: {'lines_added': 0, 'lines_removed': 0}, + 24: {'lines_added': 0, 'lines_removed': 0}, + 31: {'lines_added': 0, 'lines_removed': 0}, + 35: {'lines_added': 0, 'lines_removed': 0}, + 41: {'lines_added': 12, 'lines_removed': 0} + }, + "gitcodechurn.py": { + 11: {'lines_added': 0, 'lines_removed': 1}, + 15: {'lines_added': 6, 'lines_removed': 0}, + 16: {'lines_added': 0, 'lines_removed': 5}, + 44: {'lines_added': 0, 'lines_removed': 0}, + 49: {'lines_added': 0, 'lines_removed': 0}, + 54: {'lines_added': 0, 'lines_removed': 0}, + 60: {'lines_added': 0, 'lines_removed': 0}, + 67: {'lines_added': 0, 'lines_removed': 0}, + 103: {'lines_added': 6, 'lines_removed': 0}, + 142: {'lines_added': 0, 'lines_removed': 2}, + 148: {'lines_added': 3, 'lines_removed': 0}, + 189: {'lines_added': 0, 'lines_removed': 1}, + 192: {'lines_added': 0, 'lines_removed': 3}, + 196: {'lines_added': 2, 'lines_removed': 0}, + 200: {'lines_added': 1, 'lines_removed': 0}, + }, + }, + "contribution": 42, + "churn": 0, + }, + ), +] + + +def test_is_new_file(): + """Given a line from `get_proc`, identify if line represents new file. 
+ + If the line does not represent a new file, return the file name given. + Otherwise, return the name of the new file. + """ + given_file = "" + test_lines = [ + ("diff --git gitcodechurn.py gitcodechurn.py", given_file), + ("new file mode 100644", given_file), + ("index 0000000..1d7ff61", given_file), + ("--- /dev/null", given_file), + ("+++ gitcodechurn.py", "gitcodechurn.py"), + ("@@ -0,0 +1,190 @@", given_file), + ("+def is_new_file(result, file):", given_file), + ( + "+ # search for destination file (+++ ) and update file variable", + given_file, + ), + ("+ if result.startswith('+++'):", given_file), + ("+ return result[result.rfind(' ')+1:]", given_file), + ("--- README.md", given_file), + ("+++ README.md", "README.md"), + ( + "@@ -15,2 +15,2 @@ Solutions that I've found online looked at changes to files irrespective whether", + given_file, + ), + ( + "-This lightweight script looks at a range of commits per author. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + given_file, + ), + ( + "-Result is a print with aggregated contribution and churn per author for a given time period.", + given_file, + ), + ( + "+This lightweight script looks at commits per author for a given date range on the **current branch**. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. 
When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + given_file, + ), + ( + "+Result is a print with aggregated contribution and churn per author for a given period in time.", + given_file, + ), + ("@@ -24 +24 @@ Positional (required) arguments:", given_file), + ] + for (line, expected) in test_lines: + assert is_new_file(line, given_file) == expected + + +def test_is_loc_change(): + """Given a line from `get_proc`, identify if the line contains loc change data. + + If the line represents loc change data, return the change. + Otherwise, return the loc_changes argument given. + + For example, given a line `@@ -15,2 +15,2 @@ Solutions that I've found online looked at changes to files irrespective whether` + return: `"-15,2 +15,2"` + """ + given_loc_changes = "" + test_lines = [ + ("diff --git gitcodechurn.py gitcodechurn.py", given_loc_changes), + ("new file mode 100644", given_loc_changes), + ("index 0000000..1d7ff61", given_loc_changes), + ("--- /dev/null", given_loc_changes), + ("+++ gitcodechurn.py", given_loc_changes), + ("@@ -0,0 +1,190 @@", "-0,0 +1,190"), + ("+def is_new_file(result, file):", given_loc_changes), + ( + "+ # search for destination file (+++ ) and update file variable", + given_loc_changes, + ), + ("+ if result.startswith('+++'):", given_loc_changes), + ("+ return result[result.rfind(' ')+1:]", given_loc_changes), + ("--- README.md", given_loc_changes), + ("+++ README.md", given_loc_changes), + ( + "@@ -15,2 +15,2 @@ Solutions that I've found online looked at changes to files irrespective whether", + "-15,2 +15,2", + ), + ( + "-This lightweight script looks at a range of commits per author. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. 
When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + given_loc_changes, + ), + ( + "-Result is a print with aggregated contribution and churn per author for a given time period.", + given_loc_changes, + ), + ( + "+This lightweight script looks at commits per author for a given date range on the **current branch**. For each commit it bookkeeps the files that were changed along with the LOC for each file. LOC are kept in a sparse structure and changes per LOC are taken into account as the program loops. When a change to the same LOC is detected it updates this separately to bookkeep the true code churn.", + given_loc_changes, + ), + ( + "+Result is a print with aggregated contribution and churn per author for a given period in time.", + given_loc_changes, + ), + ("@@ -24 +24 @@ Positional (required) arguments:", "-24 +24"), + ] + for (line, expected) in test_lines: + assert is_loc_change(line, given_loc_changes) == expected + + +def test_get_loc_change(): + """Given a result from `is_loc_change`, extract the count of lines changed. 
+ + The function will return a tuple of two line-change tuples. + The schema of each line-change tuple is (line_number, lines_modified). + So "-11,0 +12,2" would become ((11, 0), (12, 2)); a count omitted from the header defaults to 1. + """ + tests = [ + ("-11,0 +12,2", ((11,0), (12, 2))), + ("-16 +18", ((16,1), (18, 1))), + ("-18,0 +21,11", ((18,0),(21,11))), + ("-22 +35", ((22,1), (35,1))), + ("-24,2 +37,3", ((24,2),(37,3))), + ("-38 +38", ((38,1), (38,1))), + ("-42 +42", ((42,1),(42,1))), + ("-71 +70,0", ((71,1),(70,0))), + ("-72,0 +72", ((72,0),(72,1))), + ("-80 +79,0", ((80,1),(79,0))), + ("-81,0 +81", ((81,0),(81,1))), + ("-103,2 +103,3", ((103,2),(103,3))), + ("-223 +224", ((223,1),(224,1))), + ("-103 +103,7", ((103,1),(103,7))), + ("-142,2 +148,3", ((142,2),(148,3))), + ("-189 +196,2", ((189,1),(196,2))), + ("-192,3 +200", ((192,3),(200,1))), + ("-0,0 +1,190", ((0,0),(1,190))), + ("-1,0 +2,4", ((1,0), (2,4))), + ] + for (line, expected) in tests: + assert get_loc_change(line) == expected + + +def test_get_loc(mocker): + for (commit, inner_mocks) in MOCK_DATA: + files = {} + mock_command = inner_mocks.get("command") + commit_results = inner_mocks.get("get_proc_out", []) + expected_files = inner_mocks.get("files") + expected_contribution = inner_mocks.get("contribution") + expected_churn = inner_mocks.get("churn") + + # Mock the return as outlined above + m = mocker.patch("gitcodechurn.get_commit_results", return_value=commit_results) + + # Note: dir is not important here as the response is mocked + [actual_files, actual_contribution, actual_churn] = get_loc( + commit, "/tmp", files, 0, 0, "" + ) + + # Confirm the command was generated as expected + m.assert_called_with(mock_command, "/tmp") + + assert actual_files == expected_files + assert actual_churn == expected_churn + assert actual_contribution == expected_contribution + + +def test_calculate_statistics(mocker): + mock_return_values = [m[1].get("get_proc_out", []) for m in MOCK_DATA] + m = mocker.patch("gitcodechurn.get_commit_results") + + 
m.side_effect = mock_return_values + + commits = [m[0] for m in MOCK_DATA] + + actual_files, actual_contributions, actual_churn = calculate_statistics( + commits, "/tmp", "" + ) + + assert actual_contributions == 337 + assert actual_churn == 19 + + assert actual_files == { + "README.md": { + 2: {'lines_added': 0, 'lines_removed': 0}, + 8: {'lines_added': 0, 'lines_removed': 0}, + 10: {'lines_added': 0, 'lines_removed': 0}, + 11: {'lines_added': 0, 'lines_removed': 0}, + 12: {'lines_added': 2, 'lines_removed': 0}, + 15: {'lines_added': 0, 'lines_removed': 0}, + 16: {'lines_added': 0, 'lines_removed': 1}, + 18: {'lines_added': 1, 'lines_removed': 0}, + 20: {'lines_added': 0, 'lines_removed': 1}, + 21: {'lines_added': 11, 'lines_removed': 0}, + 22: {'lines_added': 0, 'lines_removed': 1}, + 24: {'lines_added': 0, 'lines_removed': 2}, + 31: {'lines_added': 0, 'lines_removed': 0}, + 33: {'lines_added': 1, 'lines_removed': 0}, + 35: {'lines_added': 1, 'lines_removed': 0}, + 37: {'lines_added': 3, 'lines_removed': 0}, + 41: {'lines_added': 12, 'lines_removed': 0}, + }, + "gitcodechurn.py": { + 0: {'lines_added': 0, 'lines_removed': 0}, + 1: {'lines_added': 190, 'lines_removed': 0}, + 2: {'lines_added': 4, 'lines_removed': 0}, + 4: {'lines_added': 0, 'lines_removed': 0}, + 11: {'lines_added': 0, 'lines_removed': 1}, + 15: {'lines_added': 6, 'lines_removed': 0}, + 16: {'lines_added': 0, 'lines_removed': 5}, + 37: {'lines_added': 2, 'lines_removed': 0}, + 38: {'lines_added': 0, 'lines_removed': 0}, + 40: {'lines_added': 0, 'lines_removed': 1}, + 42: {'lines_added': 1, 'lines_removed': 0}, + 44: {'lines_added': 0, 'lines_removed': 0}, + 45: {'lines_added': 0, 'lines_removed': 1}, + 47: {'lines_added': 1, 'lines_removed': 0}, + 49: {'lines_added': 0, 'lines_removed': 0}, + 50: {'lines_added': 0, 'lines_removed': 1}, + 52: {'lines_added': 1, 'lines_removed': 0}, + 54: {'lines_added': 0, 'lines_removed': 0}, + 55: {'lines_added': 0, 'lines_removed': 1}, + 57: {'lines_added': 8, 
'lines_removed': 1}, + 60: {'lines_added': 0, 'lines_removed': 0}, + 62: {'lines_added': 0, 'lines_removed': 1}, + 64: {'lines_added': 0, 'lines_removed': 1}, + 66: {'lines_added': 2, 'lines_removed': 0}, + 67: {'lines_added': 0, 'lines_removed': 0}, + 70: {'lines_added': 0, 'lines_removed': 0}, + 71: {'lines_added': 0, 'lines_removed': 1}, + 72: {'lines_added': 2, 'lines_removed': 0}, + 74: {'lines_added': 10, 'lines_removed': 0}, + 79: {'lines_added': 0, 'lines_removed': 0}, + 80: {'lines_added': 0, 'lines_removed': 2}, + 81: {'lines_added': 1, 'lines_removed': 0}, + 83: {'lines_added': 0, 'lines_removed': 1}, + 85: {'lines_added': 0, 'lines_removed': 0}, + 87: {'lines_added': 0, 'lines_removed': 1}, + 89: {'lines_added': 0, 'lines_removed': 0}, + 99: {'lines_added': 2, 'lines_removed': 0}, + 102: {'lines_added': 0, 'lines_removed': 0}, + 103: {'lines_added': 7, 'lines_removed': 0}, + 105: {'lines_added': 2, 'lines_removed': 0}, + 108: {'lines_added': 1, 'lines_removed': 0}, + 111: {'lines_added': 3, 'lines_removed': 0}, + 142: {'lines_added': 0, 'lines_removed': 2}, + 148: {'lines_added': 3, 'lines_removed': 0}, + 167: {'lines_added': 0, 'lines_removed': 1}, + 172: {'lines_added': 0, 'lines_removed': 0}, + 173: {'lines_added': 0, 'lines_removed': 2}, + 189: {'lines_added': 0, 'lines_removed': 1}, + 191: {'lines_added': 1, 'lines_removed': 0}, + 192: {'lines_added': 2, 'lines_removed': 3}, + 193: {'lines_added': 0, 'lines_removed': 0}, + 196: {'lines_added': 2, 'lines_removed': 0}, + 197: {'lines_added': 0, 'lines_removed': 0}, + 198: {'lines_added': 25, 'lines_removed': 0}, + 200: {'lines_added': 1, 'lines_removed': 0}, + 217: {'lines_added': 14, 'lines_removed': 0}, + 223: {'lines_added': 0, 'lines_removed': 1}, + 224: {'lines_added': 1, 'lines_removed': 0}, + }, + }