Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regular formatting of multiline expressions #317

Closed
e3krisztian opened this issue Jun 8, 2018 · 6 comments
Closed

Regular formatting of multiline expressions #317

e3krisztian opened this issue Jun 8, 2018 · 6 comments
Labels
T: style What do we want Blackened code to look like?

Comments

@e3krisztian
Copy link

Operating system: Arch Linux
Python version: 3.6.5
Black version: 18.6b1
Does also happen on master: yes


Function names should start left of their parameters, to maintain left to right reading.

While this might sound natural, in multiline expressions it is surprisingly frequently violated.
Violations result in irregular/strange looking code, that require non-linear scanning to understand.

The examples below are generated by black -l 40:


# original
if function1(arg1, arg2, arg3) and f2(arg1, arg2, arg3):
    pass

if function1(arg1, arg2, arg3) and function2(arg1, arg2, arg3):
    pass
# expected, regular
if (
    function1(arg1, arg2, arg3)
    and f2(arg1, arg2, arg3)
):
    pass

if (
    function1(arg1, arg2, arg3)
    and function2(arg1, arg2, arg3)
):
    pass
# black, irregular
if function1(arg1, arg2, arg3) and f2(
    arg1, arg2, arg3
):
    pass

if function1(
    arg1, arg2, arg3
) and function2(arg1, arg2, arg3):
    pass

# original
variable = some_long_function(with_long, argument, list)
# expected
variable = (
    some_long_function(
        with_long, argument, list
    )
)
# black
variable = some_long_function(
    with_long, argument, list
)

Is there any difference?

# black
variable = (
    [long, list]
    + [another, long, list]
    + [yet, more]
)

# black
variable = f(
    [long, list]
    + [another, long, list]
    + [yet, more]
)
# expected formatting of the second one
variable = (
    f(
        [long, list]
        + [another, long, list]
        + [yet, more]
    )
)

Method calls, method chaining

# expected
variable = (
    object
    .method1(a, r, g, u, m, e, n, t)
    .method2(argument2)
)

variable = (
    "variable1 = {}"
    .format(variable)
)
# black
variable = object.method1(
    a, r, g, u, m, e, n, t
).method2(argument2)


variable = "variable1 = {}".format(
    variable
)

All the expected examples are wrapped in an extra ()-s and thus the expression's internal ()-s are not needed to be repurposed for implicit line continuations.

@e3krisztian
Copy link
Author

Arguments to functions should be right of the function's name.

I want to move the discussion forward (with myself so far :() with real examples.

Reformatting black's code with itself, after with disabling the can_omit_invisible_parens function (e3krisztian@247fdfa) the diff is this: e3krisztian@657d474

There were few cases when the original was better

but my feeling is, that most of the time the code after switching off the heuristic is superior

or can be made equivalent (if ():, if not (): special cases:

@ambv
Copy link
Collaborator

ambv commented Jun 20, 2018

Function names should start left of their parameters, to maintain left to right reading.

Can you find a reference where this rule comes from? Reading happens left-to-right within a single line. It's not a violation of this rule to use multiple lines.

In general, I looked through the diff on your fork but I disagree with many of your assessments. I do see that you like regularity and don't mind extra parentheses. But a lot of people do, myself included.

People write code from left to right within a single line and most often think about splitting lines when they reach the line length limit. So splitting close to the right edge of the line generally looks more natural than splitting early.

For example, I consider those formattings worse:

# proposed
variable = (
    some_long_function(
        with_long, argument, list
    )
)

# current behavior
variable = some_long_function(
    with_long, argument, list
)
# proposed
variable = (
    f(
        [long, list]
        + [another, long, list]
        + [yet, more]
    )
)

# current behavior
variable = f(
    [long, list]
    + [another, long, list]
    + [yet, more]
)

In the same vein, I consider your proposed change to force call chaining even on the first call rather overzealous. Black is obviously a dumb machine formatter but it does try to format code like real people would. I know nobody who would naturally format code like this:

message = (
    "formatting string with {}"
    .format(variable)
)

but plenty of people format like this (which is the current behavior):

message = "formatting string with {}".format(
    variable
)

I want to move the discussion forward (with myself so far :()

Sorry, I can't respond to every new issue on a daily basis. Design discussions naturally spread longer because they have way more far-reaching consequences for the future of the project.

@ambv ambv added the T: style What do we want Blackened code to look like? label Jun 20, 2018
@e3krisztian
Copy link
Author

I want to move the discussion forward (with myself so far :()

I was passive aggressive with those words. I am really sorry about that!


Can you find a reference where this rule comes from? Reading happens left-to-right within a single line. It's not a violation of this rule to use multiple lines.

Layout with direction and cohesion guarantees enable skipping over parts safely. Skipping over uninteresting parts speeds up reading and understanding.

I do not know about anyone else putting into words this "rule" ("parameters should be right of function name"). So I can give no reference. It is a recent revelation to me, which I see justified again and again since (I see it also in some of black's issues).

At least some people also feel it natural to have the arguments right of the function name even for multi-line layout. The evidence is the tropical layout (function calls are like palm trees of different size):

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

variable1 = function_with_long_name(param1,
                                    param2,
                                    param3)
variable2 = another_function(long_param1,
                             param2)

This layout has obvious drawbacks, but I do see it reappearing/used from time to time - e.g. the first call with the comment is the first "Yes" example from PEP8.


In general, I looked through the diff on your fork but I disagree with many of your assessments. I do see that you like regularity and don't mind extra parentheses. But a lot of people do, myself included.

Actually I do mind extra parentheses, but I would choose them over a worse alternative anytime. When reading complex multiline expressions, I do not mind if there are more parens, and the layout is helping - it is properly indented, regular, uninteresting.

In my "fork" (it is not intended to be a workable alternative) I have just trivially disabled some optimizations, so it is not giving the best possible layout, but I hoped to show, that the smartness in those optimizations have irregular side effects even on black's code, hindering readability - breaking code at unexpected places.


People write code from left to right within a single line and most often think about splitting lines when they reach the line length limit. So splitting close to the right edge of the line generally looks more natural than splitting early.

I do not think it really matters how people wrote the code, once it can be reformatted. The way we write has a factor of lazyness in it, taking different - person dependent - shortcuts.
However, when the code is reformatted, I expect it to have some properties it may not have before: enhanced readability, eliminated bad shortcuts.

Yes, it would make the code look regular/uniform, but is not this the goal?

Black is obviously a dumb machine formatter but it does try to format code like real people would.

Well, maybe its just me, but I expect a code formatter to do a better job than most real people: regular layout without random exceptions ;)


For example, I consider those formattings worse:

# proposed
variable = (
    f(
        [long, list]
        + [another, long, list]
        + [yet, more]
    )
)

# current behavior
variable = f(
    [long, list]
    + [another, long, list]
    + [yet, more]
)

Sorry to disagree, but the current behavior is dangerous, not better, as it lays out a very different construct (function call with single argument expression vs expression) the same way:

# black
variable = (
    [long, list]
    + [another, long, list]
    + [yet, more]
)

Do you also feel black's formatting of the first set of of examples acceptable?

@e3krisztian
Copy link
Author

I was just looking at the diffs again - could you explicitly point to any "assessment" in my diffs to black, with which you disagree?

@kaste
Copy link

kaste commented Sep 26, 2018

Although I'm relatively +1 for what's @e3krisztian is showing here, I think the issue is probably too broad.

From reading the answer on (my) #527, it seems that aligning the boolean operators
(and/or) would be acceptable but hard to do. Do we have consensus on that?
If so, maybe @e3krisztian would be even willing to try to implement it.

Examples from #527

# black

    if virtual_view.select_line(lines - 1).strip() == '' and (
        lines < 2
        or virtual_view.select_line(lines - 2).strip() != ''
    ):
        ...


    elif inspect.isclass(self.mocked_obj) and inspect.isclass(
        original_method
    ):                    
# wanted

    if (
        virtual_view.select_line(lines - 1).strip() == ''
        and (
            lines < 2 
            or virtual_view.select_line(lines - 2).strip() != ''
        )
    ):
        ...


    elif (
        inspect.isclass(self.mocked_obj) 
        and inspect.isclass(original_method)
    ):                    

Why do I want this? Complex conditional expressions are usually hard to read,
and they must be read slowly and carefully. You basically cannot over-structure a complicated nested conditional. And therefore, if I write for example an if statement I always add parens around the whole thing as soon as it doesn't fit on a single line.

When it comes to just function invocation, I'm not convinced that always adding parens can be justified. I often see this:

    a_long_var_name = (
        another_function(luckily, fits, on, this, one, line)
    )

    # Very similar to what people do with \ in such cases
    # If they do, they obviously want to preserve L-to-R reading 
    # of the fn invocation
    a_long_var_name = \
        another_function(luckily, fits, on, this, one, line)
    
    a_long_var_name = another_function(
        unlucky, bc, this, fits, not, on, one, line, anymore
    )

    a_long_var_name = another_function(
        and,
        now,
        were,
        doomed,
        and,
        explode,
        ...
    )

I think the L-to-R reading is important for people, and if so, they stick to the formatting in the above 'lucky' cases, but if it really doesn't fit, they give up and just don't.

P.S. I do

logger.error(
	"This is a long helpful message"
    "with some variables {} in it."
    .format(variable)
)

often enough. It doesn't look unusual at all as soon as the string is long enough.

@ambv
Copy link
Collaborator

ambv commented Sep 26, 2018

OK, I thought about this quite a bit. We won't be adding extra parentheses to disambiguate assignments from calls or force fluent interfaces on first period. Those are overzealous in my view. You are supposed to read the first line of a statement to figure out what is happening. Relying on regularity alone will deceive you. The indented bracketed content is called a "trailer" because it really is just trailing content from the first line. Black's formatting lets you visually navigate from the indented block to the preceding outer line fast.

Also, we won't be adding any form of formatting that involves backslashes since that piece of Python grammar should be dropped (it's the single place besides multiline strings that breaks significant indentation).

All that being said, we will be definitely fixing some of the more egregious examples of irregularity that stem from unnecessary or missing optional parentheses in if and while tests. This is one case where more regularity is definitely helpful.

Thanks @e3krisztian and @kaste for your thoughtful input and sorry it took me so long to get back to you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: style What do we want Blackened code to look like?
Projects
None yet
Development

No branches or pull requests

3 participants