Fix skipping Jupyter cells with unknown %% magic #4462

AleksMat · 2024-09-18T14:40:41Z

Description

This PR fixes an inconsistency in when Black skips formatting Jupyter cells that contain an unknown cell magic.

Black correctly skips formatting in this case:

%%unknown_custom_magic
<code in another language that black shouldn't format>

However, if the cell started with empty lines, Black still tried to format it and failed:



%%unknown_custom_magic
<code in another language that black shouldn't format>

Also, if the cell started with a comment, Black tried to format it:

# Some comment
%%unknown_custom_magic
<code in another language that black shouldn't format>

(This example isn't possible in normal Jupyter notebooks but it works in Colab.)

This PR ensures that Black skips formatting any cell whose custom magic is preceded by any number of empty lines and comment lines.
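A minimal sketch of the skip rule described above, not the code from this PR. The names `get_code_start`, `should_skip_cell`, and `KNOWN_CELL_MAGICS` are hypothetical, and the set of known magics is a placeholder chosen for illustration rather than Black's actual list.

```python
import re

# Hypothetical placeholder: magics whose cell body is assumed to be Python.
KNOWN_CELL_MAGICS = frozenset({"time", "timeit", "capture"})


def get_code_start(src: str) -> str:
    """Return the first line that is neither blank nor a comment.

    Leading whitespace is stripped. Returns "" if no such line exists.
    """
    for match in re.finditer(".+", src):
        line = match.group(0).lstrip()
        if line and not line.startswith("#"):
            return line
    return ""


def should_skip_cell(src: str) -> bool:
    """Return True if the cell's first code line is an unknown %% magic."""
    code_start = get_code_start(src)
    if not code_start.startswith("%%"):
        return False
    words = code_start[2:].split()
    magic_name = words[0] if words else ""
    return magic_name not in KNOWN_CELL_MAGICS


# The three cells from the description are all skipped under this rule.
print(should_skip_cell("%%unknown_custom_magic\nSELECT 1"))
print(should_skip_cell("\n\n\n%%unknown_custom_magic\nSELECT 1"))
print(should_skip_cell("# Some comment\n%%unknown_custom_magic\nSELECT 1"))
```

The point of the sketch is that the check looks at the first line of real code instead of the first character of the cell, so leading blank lines and comments no longer change the outcome.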

Additional notes:

I moved the validate_cell function from __init__.py to the handle_ipynb_magics.py module because it belongs there. (It also makes maintaining my Pyink fork a bit easier if there are fewer things in the __init__.py module.) If there is a reason to keep it in __init__.py, I can move it back.
The function _get_code_start could also be implemented like this:
```
import re

def _get_code_start(src: str) -> str:
    # Capture the first line whose first non-whitespace character is not "#".
    match = re.search(r"^[\s\n]*([^#\s\n].*)$", src, flags=re.M)
    return match.group(1) if match else ""
```
It might be a bit more efficient, but it is also less readable. Let me know if you would prefer this implementation.

Checklist - did you ...

Add an entry in CHANGES.md if necessary?
Add / update tests if necessary?
~~Add new / update outdated documentation?~~

JelleZijlstra · 2024-09-18T14:45:45Z

src/black/handle_ipynb_magics.py

+    start of the line and returns it. If such line doesn't exist, it returns an
+    empty string.
+    """
+    for match in re.finditer(".+", src):


Suggested change

-    for match in re.finditer(".+", src):
+    for match in src.splitlines():

Any reason not to use this?

AleksMat · 2024-09-18

There might be a slight performance difference, because re.finditer stops scanning the string once the first line with code is found, while src.splitlines() always splits the entire string into lines. But I haven't benchmarked anything, and this probably won't have much effect on the performance of the overall formatting process.

I'm also ok with switching to src.splitlines() if that is the preferred way.

JelleZijlstra · 2024-09-20

Thanks, I guess this is fine.
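The trade-off discussed in this thread was not benchmarked, so the following is only a rough sketch of how one might check it. Both function names and the test cell are hypothetical. The cell is built as an assumed worst case for `splitlines()`: the code starts on the first line and is followed by a long tail.

```python
import re
import timeit


def first_code_line_finditer(src: str) -> str:
    # Lazy: re.finditer yields one line at a time and stops at the first hit.
    for match in re.finditer(".+", src):
        line = match.group(0).lstrip()
        if line and not line.startswith("#"):
            return line
    return ""


def first_code_line_splitlines(src: str) -> str:
    # Eager: str.splitlines() builds the full list of lines before the loop.
    for raw in src.splitlines():
        line = raw.lstrip()
        if line and not line.startswith("#"):
            return line
    return ""


# Hypothetical cell: magic on the first line, then 10,000 more lines.
cell = "%%unknown_custom_magic\n" + "SELECT 1;\n" * 10_000

# Both variants agree on this input.
assert first_code_line_finditer(cell) == first_code_line_splitlines(cell)

if __name__ == "__main__":
    for fn in (first_code_line_finditer, first_code_line_splitlines):
        seconds = timeit.timeit(lambda: fn(cell), number=200)
        print(f"{fn.__name__}: {seconds:.4f}s for 200 calls")
```

Timings depend on the machine and the cell size, so no expected numbers are given here. One behavioral difference is worth noting: `str.splitlines()` also breaks on characters such as `\r` and form feed, whereas `.` in a regular expression only stops at `\n`, so the two variants can disagree on unusual input.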

AleksMat added 2 commits September 18, 2024 14:06

Fix skipping Jupyter cells with unknown %% magic

817e9d4

Updated CHANGES.md with PR number

7da7620

JelleZijlstra reviewed Sep 18, 2024

Remove unused import from __init__ module

bf8817a

JelleZijlstra merged commit 8d9d18c into psf:main Sep 20, 2024
46 checks passed
