Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fluent interfaces #67

Closed
sfermigier opened this issue Mar 23, 2018 · 48 comments
Closed

Fluent interfaces #67

sfermigier opened this issue Mar 23, 2018 · 48 comments
Labels
help wanted Extra attention is needed T: enhancement New feature or request

Comments

@sfermigier
Copy link

sfermigier commented Mar 23, 2018

Operating system: Mac OS
Python version: 3.6.4
Black version: 18.3a3
Does also happen on master: yes


Black removes trailing backlashes, which can be used, for instance with yapf, to signal "don't join these line".

This is problematic, for instance, if one uses the "fluent interface" idiom, which can lead to long chains of attribute accesses in a given expression.

Here's an example:

-        return sa.sql \
-            .select([sa.sql.func.count(membership.c.user_id)]) \
-            .where(membership.c.group_id == cls.id) \
-            .group_by(membership.c.group_id) \
-            .label('members_count')
+        return sa.sql.select([sa.sql.func.count(membership.c.user_id)]).where(
+            membership.c.group_id == cls.id
+        ).group_by(
+            membership.c.group_id
+        ).label(
+            'members_count'
+        )

The easiest solution would be to keep the line break in the presence of a backslash.

@sethmlarson
Copy link
Member

What is the result if you do this?

return (
    sa.sql
    .select([sa.sql.func.count(membership.c.user_id)])
    .where(membership.c.group_id == cls.id)
    .group_by(membership.c.group_id)
    .label('members_count')
)

@ambv
Copy link
Collaborator

ambv commented Mar 23, 2018

Black doesn't support fluent interfaces or any other hand-formatted idioms. Backslashes are always removed, I responded as to why in #64.

This will be solved by # fmt: off which I am working on now and will be part of the next release on Monday.

@ambv
Copy link
Collaborator

ambv commented Mar 23, 2018

And for the record, I personally find the Black-formatted version easier on the eyes. Leading dots are invalid syntax in Python without backslash magic.

@carljm
Copy link
Collaborator

carljm commented Mar 23, 2018

No plans to change Black's formatting approach here; we already have #5 for # fmt: off escape hatch.

@ambv
Copy link
Collaborator

ambv commented Mar 30, 2018

I will reconsider this. My current idea is as follows:

  • if the programmer explicitly wrapped a chain of at least two calls in parentheses (for return or assignment), then Black will leave those parentheses and do fluent interfaces;
  • regular formatting happens in all other situations.

By fluent interface I understand treating "." as a delimiter that comes first on the line. Is there anything else?

@ambv
Copy link
Collaborator

ambv commented Mar 30, 2018

@dstufft, @wbolster, note this is re-opened now.

@dstufft
Copy link

dstufft commented Mar 31, 2018

@ambv I had never heeard of it called fluent interface before, so I'm not sure :) I do stuff like this a lot in Warehouse though:

        release = (
            request.db.query(Release)
                      .filter(Release.project == project)
                      .order_by(
                          Release.is_prerelease.nullslast(),
                          Release._pypi_ordering.desc())
                      .limit(1)
                      .one()
        )
    project_names = [
        r[0] for r in (
            request.db.query(Project.name)
                   .order_by(Project.zscore.desc().nullslast(),
                             func.random())
                   .limit(5)
                   .all())
    ]
    release_a = aliased(
        Release,
        request.db.query(Release)
                  .distinct(Release.name)
                  .filter(Release.name.in_(project_names))
                  .order_by(Release.name,
                            Release.is_prerelease.nullslast(),
                            Release._pypi_ordering.desc())
                  .subquery(),
    )
    trending_projects = (
        request.db.query(release_a)
               .options(joinedload(release_a.project))
               .order_by(func.array_idx(project_names, release_a.name))
               .all()
    )

    latest_releases = (
        request.db.query(Release)
                  .options(joinedload(Release.project))
                  .order_by(Release.created.desc())
                  .limit(5)
                  .all()
    )

    counts = dict(
        request.db.query(RowCount.table_name, RowCount.count)
                  .filter(
                      RowCount.table_name.in_([
                          Project.__tablename__,
                          Release.__tablename__,
                          File.__tablename__,
                          User.__tablename__,
                      ]))
                  .all()
    )

I pretty much only do it with SQLAlchemy, but that's also the only thing I really use whose API is a long chain of methods like that. The reason I do it that way (and afaik prior to learning others did it, I didn't know it had a name!) was because it matches the way I would format the SQL the most and groups the important aspects together on one line (or a few lines), so like taking a look at:

        release = (
            request.db.query(Release)
                      # The "Filter by Release.project" is one logical "thought"
                      .filter(Release.project == project)
                      # The Order by is another logical though, but is too long
                      # for one line, so it has to get split.
                      .order_by(
                          Release.is_prerelease.nullslast(),
                          Release._pypi_ordering.desc())
                      # again, a single logical though, which isn't really modified at all by
                      # the previous lines, they could be deleted/added and this would
                      # stay the same.
                      .limit(1)
                     # and again, one logical thought.
                      .one()
        )

The black way of formatting that, which is:

if True:
    if True:
        release = (
            request.db.query(Release).filter(Release.project == project).order_by(
                Release.is_prerelease.nullslast(), Release._pypi_ordering.desc()
            ).limit(
                1
            ).one()
        )

The individual lines end up doing more than one logical thing, and you end up having to be much more careful about having to read through it to actually understand what's going on, because it each line ends up doing multiple things (the . are almost acting like ; here, conceptually).

Anyways, I'm glad you're reconsidering this. The conditions as described sounds like it would work fine for me.

@dstufft
Copy link

dstufft commented Mar 31, 2018

Oh just for completness sake, heres the Python code and the SQL that it's mimic'ing:

request.db.query(Release)
          .filter(Release.project == project)
          .order_by(
              Release.is_prerelease.nullslast(),
              Release._pypi_ordering.desc())
          .limit(1)
          .one()
SELECT * FROM releases
WHERE releases.project = 'foo'
ORDER BY
    releases.is_prerelease NULLS LAST,
    releases._pypi_ordering DESC
LIMIT 1

That SQL isn't completely valid, but you can get the idea?

@ambv ambv added T: enhancement New feature or request help wanted Extra attention is needed labels Mar 31, 2018
@sfermigier
Copy link
Author

@ambv Thanks.

@ambv & @dstufft : Fluent interfaces have been originally discussed by Martin Fowler in 2005 in: https://martinfowler.com/bliki/FluentInterface.html and Eric Evans in the "Domain Driven Design" (blue) book.

There are some discussions in https://en.wikipedia.org/wiki/Fluent_interface , more specifically: https://en.wikipedia.org/wiki/Fluent_interface#Python though not very deep.

In the Python context, I have seen them used mostly for creating DSLs (as stated in the top of the Wikipedia article), more specifically for ORM and ORM-like libraries and frameworks (SQLAlchemy, others).

My and @dstufft 's examples give a pretty good idea of what is considered "canonical" code when building queries with the SQLAlchemy DSL.

Regarding the Python standard library, the only cas I can think of strings substitutions. Here's an example where Black reformats Flask's code in a less than savoury way:

-    rv = dumps(obj, **kwargs) \
-        .replace(u'<', u'\\u003c') \
-        .replace(u'>', u'\\u003e') \
-        .replace(u'&', u'\\u0026') \
-        .replace(u"'", u'\\u0027')
+    rv = dumps(obj, **kwargs).replace(u'<', u'\\u003c').replace(
+        u'>', u'\\u003e'
+    ).replace(
+        u'&', u'\\u0026'
+    ).replace(
+        u"'", u'\\u0027'
+    )

Regarding how to deal with it: IMHO the best way would be to consider that code with trailing backslashes has already been crafted manually by the programmer, and don't change it. This is, more or less, the current behaviour of Yapf or PyCharm's fomatter.

@ambv It you still don't agree with this idea, what you propose seems to make sense.

I note, however, that programmers will have to explicitly add parenthesis about long fluent expressions (as well as non-fluent one, like complex boolean expressions with no function calls) in order for Black to split the lines and not create a pep8 violation.

@ambv
Copy link
Collaborator

ambv commented Mar 31, 2018

Backslashes are bad form. Organizational parentheses are much preferred.

@dstufft
Copy link

dstufft commented Mar 31, 2018

I generally don't use backslashes for these, because multiple lines of backslashes always seemed off to me, parentheses seemed to always make the most sense to me.

@wbolster
Copy link
Contributor

some thoughts.

the pattern is a.b.c.d(...).e(..., ...).

breaking it into pieces that should (preferably) end up on "logical lines":

  • a.b.c
  • .d(...)
  • .e(..., ...)

each of these could be one line, or more than one if the line becomes too long. when wrapping, i think normal black rules could apply, with the exception that the closing paren should not go on a line of its own.

re. indenting of the whole statement, multiple-of-four indents should be required, just like elsewhere. that can.be easily achieved like this:

query = (
    a.b.c
    .d()
    .e(
        ...,
        ...))

the final closing paren is debatable, and could go on a new line.

@ambv
Copy link
Collaborator

ambv commented Mar 31, 2018

Final closing parens are always dedented and on a separate line in Black.

@dstufft
Copy link

dstufft commented Mar 31, 2018

For the final closing paren, I've switched back and forth between new line and inline with the final line, I think either works fine.

I think the only tweak I'd make here is I'd align the . withe final . in the first line, so you end up with:

query = (
    a.b.c
       .d()
       .e(
           ...,
           ...
       )
)

This actually ends up having the dot leading lines at not a multiple of 4, but in my experience, these things tend to be "normal" python attribute access, until you get to the chain of methods portion, and I find it easier to read down the order of operations if they're all aligned. I think this is more obvious with the example I posted above:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                Release.is_prerelease.nullslast(),
                Release._pypi_ordering.desc())
              .limit(1)
              .one()
)

Here request.db isn't really part of the fluent interface portion, it's just the leading portion you access to get to where the object implementing the fluent interface lives, but once you get to the fluent interface, you want it aligned so that you can generally read it straight down. I think that reads better than:

release = (
    request.db.query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc())
    .limit(1)
    .one()
)

because it still provides some visual distinction that indicates there is a reason why these lines are starting with . and keeps the flow so you sort of scan it straight down in one logical block.

That being said, something like:

release = (
    request.db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc())
    .limit(1)
    .one()
)

Isn't the worst thing ever, you just basically have to mentally ignore the first line completely when parsing the fluent interface portion. I think it's worse than aligning with the . as in my first "real" code sample in this post.

Although I guess we'd have the middle set of parens on it's own line too, so it would end up looking like:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                Release.is_prerelease.nullslast(),
                Release._pypi_ordering.desc()
              )
              .limit(1)
              .one()
)

If you do the "align with the . and not try to make perfect 4 space indent", you have to decide how you indent sub clauses (the arguments to order_by( ... ) in my example). My editor when I hit tab just aligns them with the next 4 space indent, so you end up with (in my example) 14 spaces prior to the .order_by(, then 18 spaces prior to the Release.is_prerelease.nullslast(),, which "corrects" the indentation so that additional nested function calls end up with the "correct" indentation.

The downside to this, is you end with two "partial" indents until you're back on always multiples of 4. I think this is OK because I think it's far more readable.

Another option is:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                    Release.is_prerelease.nullslast(),
                    Release._pypi_ordering.desc()
              )
              .limit(1)
              .one()
)

Which puts 20 spaces before the Release.is_prerelease.nullslast(),, which means that there is a little extra space between it and the .order_by, but it still reads perfectly fine, and then the only lines that don't have an exact multiple of 4 for indents comes the lines that start with . (assuming you do the align like I suggested).

@wbolster
Copy link
Contributor

indents >4 are avoided by black, and personally i mostly agree with that (wastes space and ugly), but in any case i don't see why fluent apis should be an exception.

@ambv
Copy link
Collaborator

ambv commented Mar 31, 2018

Black never aligns with anything while indenting. It will indent four characters within any block or bracket pair and that's it. This always gives us maximum space available for the line content. Also, I feel that being consistent is more important than hand-optimizing particular edge cases.

The simplest implementation will be to simply treat the dot as a delimiter (like "and", "or", etc.). I don't particularly like hand-optimizing leading attribute accesses because attribute accesses might just as well happen later, eg.:

result = (
    request
    .db
    .query(Release)
    .filter(Release.project == project)
    .some_attribute  # note
    .limit(1)
    .one()
    .release_name  # note
)

A reverse situation might also be possible where the first line is a call, too:

result = (
    Request(with='', some='', data='')
    .ditto
    .ditto()
    .one()
)

I'm inclined to think that if somebody really hates wasting lines on the leading attribute accesses, assigning those to a name is the way to go.

Oh, one more thing. If the bracket contents fit in one line, they will (like everywhere else in Black):

result = (
    request.db.query(Release).limit(1).one().release_name
)

And to be clear, if the entire thing fits in one line, it will.

result = request.db.query(Release).limit(1).one().release_name

Summing up, I want Black's implementation of fluent interfaces to be as trivial and consistent as the rest of the formatting it applies.

@dstufft
Copy link

dstufft commented Mar 31, 2018

I was playing around with it, and I think something like:

release = (
    request.db
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)

You still end up with an extra level of indentation, but I think that ends up more readable, 🤷‍♂️

That being said, I find all of these examples more readable than what black does now so any of them are an improvement. I'm also perfectly fine with fitting things onto one line when possible (which is what I generally do already).

@wbolster
Copy link
Contributor

@ambv the only "special" (quite common in practice) case would be leading attribute accesses. preferring a single line for this seems worth it. when mixing calls and attr access, one per line indeed, as you suggest.

@kalekseev
Copy link
Contributor

I would also vote for one level indented version

release = (
    request
        .db
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)

rather than

release = (
    request
    .db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)

@wbolster
Copy link
Contributor

wbolster commented Apr 3, 2018

anything but a 4-space indent would be inconsistent with how black works elsewhere, and would be in direct conflict with black's consistency goals. @ambv was very clear about that in an earlier comment:

Black never aligns with anything while indenting. It will indent four characters within any block or bracket pair and that's it. This always gives us maximum space available for the line content. Also, I feel that being consistent is more important than hand-optimizing particular edge cases.

that said, i do advocate for a sole exception which is not related to indentation, but to line breaking. prefer single lines for attribute accesses at the start of the chain (see my my earlier comment).

black's (upcoming) support for ‘fluent apis’ seems only about line breaks (not about indentation), and i think it's reasonable to expect black to acknowledge that both ‘continuation lines are special’ and ‘leading attribute accesses are special’ since that is the predominant (only?) style for fluent apis, e.g. sqlalchemy + web frameworks.

the example would then become:

release = (
    request.db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)

@kalekseev
Copy link
Contributor

@wbolster consistency is good but I believe that we should consider user experience too, otherwise we can consistently produce code that majority of people don't like and never get into wide adoption in the community.

@wbolster
Copy link
Contributor

wbolster commented Apr 3, 2018

black always uses 4 spaces, and that's not considered a ‘user experience’ problem for normal code, so what exactly is the ‘user experience’ problem with 4 space indents for fluent apis, as shown here?

@kalekseev
Copy link
Contributor

@wbolster for example I like to see entry level separated from the body because it helps quickly scan the beginning of chain and skip whole body if I'm not interested in it.
Example

Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()
(
    request
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)
Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()

vs

Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()
(
    request
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)
Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()

Although I agree that in python we have parents and it helps to parse the chain as one sentence for me second version looks more readable but maybe I'm polluted by javascript code style :)

@dstufft
Copy link

dstufft commented May 16, 2018

For whatever it's worth, while I have preferences, I think either option is good enough that I'm happy enough with either.

@wbolster
Copy link
Contributor

wbolster commented May 16, 2018

what @dstufft said. i suggested both because both are valid choices, and i do not have a strong preference myself.

anything except this monstrosity, basically: 😉

x = (
    db
    .session
    .query(...)
    .all()
)

@jarshwah
Copy link

hah, I would be ok with that version too @wbolster :) but any of the 3 proposed formats I'd be fine with.

@ambv which one are you going with out of interest? I wasn't able to determine which from comments above.

@dstufft
Copy link

dstufft commented May 16, 2018

Yea to be clear, I'm fine with that too, I'd probably end up writing my code slightly different, but that's about all.

@ambv
Copy link
Collaborator

ambv commented May 16, 2018

FYI when I actually went and implemented it, it turned out that singling out attributes always looks weird. So the new rule is: fluent interfaces only break calls and indexing. Example from the tests:

def example(session):
    result = (
        session.query(models.Customer.id)
        .filter(
            models.Customer.account_id == account_id,
            models.Customer.email == email_address,
        )
        .order_by(models.Customer.id.asc())
        .all()
    )

I think this is what everybody wanted. Note that the first call is hugged to the attribute but this is necessary for all those cases where you just have one call but multiple attributes along the way. Consider:

    traceback.TracebackException.from_exception(
        exc,
        limit=limit,
        lookup_lines=lookup_lines,
        capture_locals=capture_locals,
        # copy the set of _seen exceptions so that duplicates
        # shared between sub-exceptions are not omitted
        _seen=set(_seen),
    )

It would look very weird to break up the call from the attributes.

@dstufft
Copy link

dstufft commented May 16, 2018

That looks excellent :)

@ambv ambv closed this as completed in 8c74d79 May 16, 2018
@wbolster
Copy link
Contributor

thanks @ambv!

@whardier
Copy link

Be cool if black could respect parens when only one dot is present (not minimum of two). I might be missing something in the docs. I respect the non-compromising aspect of the project - I just feel like I'm missing something.

@randolf-scholz
Copy link

Imo, one needs to rethink the hugging of the first attribute. For example out of this

observations = (
    measurements_experiments  # original dataframe
    .join(measurements_bioreactor, on=["measurement_time"])  # add bioreactor data
    .join(setpoints["InducerConcentration"].dropna(), how="outer")  # add setpoints
)

black makes this ugly thing:

observations = measurements_experiments.join(  # original dataframe
    measurements_bioreactor, on=["measurement_time"]
).join(  # add bioreactor data
    setpoints["InducerConcentration"].dropna(), how="outer"
)  # add setpoints

I think it would make sense to special case:

  1. If there is a single .method(..) or .attr[...] call: hug it
  2. If there are more than one .method(..) or .attr[...] call: multi-line it

This way, the example from #67 (comment) would become

def example(session, models):
    result = (
        session
        .query(models.Customer.id)
        .filter(
            models.Customer.account_id == account_id,
            models.Customer.email == email_address,
        )
        .order_by(models.Customer.id.asc())
        .all()
    )

Traceback would stay the same:

traceback.TracebackException.from_exception(
    exc,
    limit=limit,
    lookup_lines=lookup_lines,
    capture_locals=capture_locals,
    # copy the set of _seen exceptions so that duplicates
    # shared between sub-exceptions are not omitted
    _seen=set(_seen),
)

@YodaEmbedding
Copy link

YodaEmbedding commented May 31, 2023

traceback.TracebackException.from_exception(...)

The TracebackException does not have any () attached to it, so it's reasonable to treat it as a part of traceback..

This can be done by having different rules for attributes without a call [such as .TracebackException], and attributes with a call [such as .from_exception()].

@eguiraud
Copy link

Hello, I was surprised to find that this code:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooo")

is formatted to the awkward:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter(
    "oooooooooooooooooooooooooooooo"
)

while if I add a third chained call:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooooooo")

black makes it pretty:

obj = (
    obj.filter("oooooooooooooooooooooooooooooooooooo")
    .filter("oooooooooooooooooooooooooooooo")
    .filter("oooooooooooooooooooooooooooooooooo")
)

I could not find a way to convince black to reformat something like the first snippet to something like the last. What am I missing?

@sfermigier
Copy link
Author

@eguiraud I guess it's the normal current behaviour.

I don't see any occurrence where the result you call "awkward" (because it mixes two diffent styles in the same statement) could be considered the correct answer, but I may missing something.

I would treat it as a feature request or a bug.

@eguiraud
Copy link

A previous comment seems to indicate that one needs at least two dots for black to treat the call chain as a fluent API, although the docs don't mention this.

With the example above I was kind of hoping to show that the current behavior when there is only one dot is not ideal. But ok, I should probably open a new issue at this point :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed T: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests