Commit
Update README.md with git-filter-repo (#194)
Update README.md with git-filter-repo example

Fixes #193.

---------

Co-authored-by: Florian Rathgeber <florian.rathgeber@gmail.com>
LunarLanding and kynan committed Mar 23, 2024
1 parent 883d871 commit 0ac04d5
Showing 1 changed file with 23 additions and 16 deletions.
39 changes: 23 additions & 16 deletions README.md
@@ -197,23 +197,30 @@ Note that you need to uninstall with the same flags:
### Apply retroactively

`nbstripout` can be used to rewrite an existing Git repository using
`git filter-branch` to strip output from existing notebooks. This invocation
uses `--index-filter` and operates on all ipynb-files in the repo: :

git filter-branch -f --index-filter '
git checkout -- :*.ipynb
find . -name "*.ipynb" -exec nbstripout "{}" +
git add . --ignore-removal
[`git filter-repo`](https://github.com/newren/git-filter-repo) to strip output
from existing notebooks. This invocation operates on all ipynb files in the repo:

```sh
#!/usr/bin/env bash
# get lint-history with callback from https://github.com/newren/git-filter-repo/pull/542
./lint-history.py --relevant 'return filename.endswith(b".ipynb")' --callback '
import json, warnings, nbformat
from nbstripout import strip_output
from nbformat.reader import NotJSONError
try:
  with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=UserWarning)
    notebook = nbformat.reads(blob.data, as_version=nbformat.NO_CONVERT)
  # customize to your needs
  strip_output(notebook, keep_output=False, keep_count=False, keep_id=False, extra_keys=["metadata.widgets","metadata.execution","cell.attachments"], drop_empty_cells=True, drop_tagged_cells=[],strip_init_cells=False, max_size=0)
  old_len = len(blob.data)
  blob.data = (nbformat.writes(notebook) + "\n").encode("utf-8")
  if old_len != len(blob.data):
    print(change.blob_id, change.filename, old_len, len(blob.data))
except NotJSONError as e:
  print("ERROR", type(e), change.blob_id, filename)
'

If the repository is large and the notebooks are in a subdirectory it will run
faster with `git checkout -- :<subdir>/*.ipynb`. You will get a warning for
commits that do not contain any notebooks, which can be suppressed by piping
stderr to `/dev/null`.

This is a potentially slower but simpler invocation using `--tree-filter`:

git filter-branch -f --tree-filter 'find . -name "*.ipynb" -exec nbstripout "{}" +'
```
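
To get a feel for what the callback does to each blob before rewriting history, the flow (parse the bytes, clear outputs, re-serialize with a trailing newline, compare lengths) can be sketched with the standard library alone. This is only an illustration under simplifying assumptions: `strip_notebook_bytes` is a hypothetical helper, not part of `nbstripout` or `git-filter-repo`, and it clears only `outputs` and `execution_count`, ignoring the extra keys and options that `strip_output` handles.

```python
import json


def strip_notebook_bytes(data: bytes) -> bytes:
    """Return notebook bytes with code-cell outputs and execution counts cleared.

    Hypothetical stand-in for the callback body; input that is not a JSON
    object is returned unchanged, much like the NotJSONError branch leaves
    the blob untouched.
    """
    try:
        notebook = json.loads(data.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return data
    if not isinstance(notebook, dict):
        return data

    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None

    # Re-serialize with a trailing newline, as the callback does.
    text = json.dumps(notebook, indent=1, ensure_ascii=False) + "\n"
    return text.encode("utf-8")


if __name__ == "__main__":
    raw = json.dumps({
        "cells": [{
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
            "source": ["print('hi')"],
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 5,
    }).encode("utf-8")
    stripped = strip_notebook_bytes(raw)
    # Report the size change, similar to the callback's print of old and new lengths.
    print(len(raw), len(stripped))
```

Trying the real `strip_output` settings on a copy of a single notebook in the same way is a cheap check before running the rewrite, since a history rewrite changes commit hashes.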

### Removing empty cells
