Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Strict in Plutus Tx #5371

Merged
merged 38 commits into from
Sep 1, 2023
Merged

Use Strict in Plutus Tx #5371

merged 38 commits into from
Sep 1, 2023

Conversation

zliu41
Copy link
Member

@zliu41 zliu41 commented Jun 6, 2023

No description provided.

@zliu41 zliu41 marked this pull request as draft June 6, 2023 02:19
@zliu41 zliu41 marked this pull request as ready for review August 4, 2023 14:22
@zliu41 zliu41 requested review from michaelpj, effectfully and a team August 4, 2023 14:22
Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for review. This led to cost reduction in general, and in some cases the savings are extraordinary (e.g., from 7... to 2...). There are also some cost increases but they are much milder.

plutus-tx-plugin/test/Plugin/Coverage/Spec.hs Outdated Show resolved Hide resolved
Copy link
Contributor

@michaelpj michaelpj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a while looking through the PIR and spotted some oddities, mostly things we want to be sure that optimization is clearing up later.

There are a few things that seem actually wrong, but mostly this looks fine.

(c
True
let
!ls : List Bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now this is strict and not obviously pure, so we can't inline it, alas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not obviously pure

How come? Looks obviously pure to me: a lambda behind a type let.

And even if it wasn't pure, it's evaluated right after the let process, so we could still inline it. I guess teaching that the inliner might be non-trivial though. Particuarlly because the "right after" part may get changed in the course of optimization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little hard to read, but it's actually an application of the lambda inside the let. Now, if we inlined the type binding, we could then reduce the application and we would be in a better place...

plutus-tx-plugin/test/Budget/show.pir-readable.golden Outdated Show resolved Hide resolved
plutus-tx-plugin/test/Budget/show.pir-readable.golden Outdated Show resolved Hide resolved
plutus-tx-plugin/test/Budget/show.pir-readable.golden Outdated Show resolved Hide resolved
plutus-tx-plugin/test/IsData/deconstructData.pir.golden Outdated Show resolved Hide resolved
plutus-tx/plutus-tx.cabal Show resolved Hide resolved
plutus-tx/src/PlutusTx/Builtins.hs Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/Builtins.hs Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/IsData/Class.hs Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/IsData/Instances.hs Outdated Show resolved Hide resolved
Copy link
Contributor

@effectfully effectfully left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my biggest issue with reviewing this PR is that I don't know what to expect when I'm reviewing changes to a golden file. It would be great to have dump a comment in each golden file specifying which optimizations were performed on the code in the file, because otherwise I have no idea if something isn't, say, inlined because of an issue or because the inliner wasn't run on the program in the first place.

120 1402 (8.6%) 8201088120 (82.0%) 510702 (3.6%)
130 1512 (9.2%) 8884555240 (88.8%) 553062 (4.0%)
140 1622 (9.9%) 9568022360 (95.7%) 595422 (4.3%)
150 1732 (10.6%) 10251489480 (102.5%) 637782 (4.6%)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything's slower?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything's slower?

Not everything, but things are a bit less slower now thanks to -fno-full-laziness!

120 1402 (8.6%) 24946946570 (249.5%) 515004 (3.7%)
130 1512 (9.2%) 27026011310 (270.3%) 557724 (4.0%)
140 1622 (9.9%) 29105076050 (291.1%) 600444 (4.3%)
150 1732 (10.6%) 31184140790 (311.8%) 643164 (4.6%)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here.

120 12434 (75.9%) 4374611330 (43.7%) 490704 (3.5%)
130 13464 (82.2%) 4739320550 (47.4%) 531424 (3.8%)
140 14494 (88.5%) 5104029770 (51.0%) 572144 (4.1%)
150 15524 (94.8%) 5468738990 (54.7%) 612864 (4.4%)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And everywhere so far.

120 21265 (129.8%) 8951320962 (89.5%) 6204546 (44.3%)
130 22992 (140.3%) 9696801862 (97.0%) 6719976 (48.0%)
140 24719 (150.9%) 10442282762 (104.4%) 7235406 (51.7%)
150 26446 (161.4%) 11187763662 (111.9%) 7750836 (55.4%)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still.

({cpu: 181259654
| mem: 594532})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still. And this is quite worse! 7.6% worse to be precise, that's quite a lot, imagine an "optimization" of the CEK machine making things so much worse.

@@ -6,6 +6,14 @@ in
let
data (AdditiveMonoid :: * -> *) a | AdditiveMonoid_match where
CConsAdditiveMonoid : (\a -> a -> a -> a) a -> a -> AdditiveMonoid a
!ls : List integer
= (let
a = List integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could make the optimizer look look through type lets. I do agree it's worth inlining this type though.

Comment on lines 125 to 130
force
(force
(foldr
(delay
(\x ->
x))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was puzzled seeing foldr id z fs, but apparently that is the same as foldr (.) id fs z. All those times I wrote the latter and not the former...

plutus-tx/src/PlutusTx/Builtins.hs Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/Builtins.hs Outdated Show resolved Hide resolved
@zliu41
Copy link
Member Author

zliu41 commented Aug 7, 2023

Many tests cases became more expensive with b267848, which stopped exposing the unfolding of plc. Previously I think the unfolding of plc prevented them from being costed properly.

Overall, there are both increased and decreased costs, but the increase is always relatively mild, while the decrease is sometimes extraordinary.

This makes sense: by making things more strict, you sometimes evaluate something once, which does not need to be evaluated. On the other hand, by making things less strict, you sometimes evaluate something many times, which only needs to be evaluated once.

Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that another reason for the higher cost when using Strict comes from the fact that strict bindings are less likely to be inlined, because many of them are not considered "effect safe".

When something is not inlined, it incurs the additional machine costs of VDelay, VLamAbs and VApply, even if it doesn't change the number of times it is evaluated. But again, this is relatively mild compared to what Strict can potentially save us.

plutus-tx/src/PlutusTx/AssocMap.hs Outdated Show resolved Hide resolved
plutus-tx-plugin/test/Plugin/Basic/ifOpt.pir.golden Outdated Show resolved Hide resolved
@michaelpj
Copy link
Contributor

Let me elaborate on my argument for why lazy patterns on function arguments shouldn't be able to do anything (and hence why we need to figure out WTF is actually happening).

AIUI, a bang pattern on a function argument means we get something like this (using some name shadowing because that's what we end up with also):

f !x = t
---> 
f x = case x@x of { _ -> t }
---> (PIR)
let f = \x -> let !x = x in t 
in ...
---> (inlining: the outer x is evaluated strictly, so the inliner should inline the let binding since it's a trivial binding to a strict variable)
let f = \x -> t
in ...

Presumably with Strict, a lazy pattern means that we don't do this, so we get

f x = t
---> (PIR)
let f = \x -> t

i.e. the same thing.

It cannot matter whether we inline f. We don't even need an example: inlining a function cannot make a difference to how strictly an argument is evaluated otherwise that is an unsoundness in the inliner!

So it seems to me that lazy function arguments cannot do anything... except possibly make GHC do something different.

This seems pretty important to me - we need to understand what the effect of these changes are in general.

@zliu41
Copy link
Member Author

zliu41 commented Aug 8, 2023

i.e. the same thing.

Ok, now I remember that in Plutus Tx, let x = rhs in body and (\x -> body) rhs isn't the same thing. The former is non-strict while the latter is strict. If this is what the implementation actually does, then you are right, but I'll double check. Perhaps it isn't what it does.

@michaelpj
Copy link
Contributor

Ok, now I remember that in Plutus Tx, let x = rhs in body and (\x -> body) rhs isn't the same thing. The former is non-strict while the latter is strict.

I don't think that matters here? The only let bindings were in the PIR-psuedocode.

Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased and responded to some comments.

plutus-tx-plugin/test/Budget/show.pir-readable.golden Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/AssocMap.hs Outdated Show resolved Hide resolved
plutus-tx/src/PlutusTx/Builtins/Internal.hs Show resolved Hide resolved
plutus-tx-plugin/plutus-tx-plugin.cabal Show resolved Hide resolved
plutus-tx-plugin/plutus-tx-plugin.cabal Show resolved Hide resolved
@zliu41
Copy link
Member Author

zliu41 commented Aug 23, 2023

@michaelpj @effectfully One of the main mysteries solved. Previously unless I did all of these three things: link1, link2, link3, I got the strange "no unfolding" error for PlutusTx.Ratio.$fFromDataRational_$s$fFromData(,)_$cfromBuiltinData, when compiling fromBuiltinData @Rational. I was looking for the unfolding in PlutusTx.IsData.Instances, but turns out it is in PlutusTx.Ratio. With Strict on, the unfolding of this function becomes a bit larger, and exceeds GHC's threshold. This is a function created by GHC as a result of -fspecialise (as hinted by the $s), so it doesn't have the INLINABLE pragma.

The fix is simple - adding any one of the following to PlutusTx.Ratio works: -fno-specialise, or -fexpose-all-unfoldings, or bumping -funfolding-creation-threshold.

Next up: figure out why some things don't work in GHC 9.6 🤷‍♂️

@zliu41
Copy link
Member Author

zliu41 commented Aug 30, 2023

Added

{- Note [Lazy patterns on function parameters]
Lazy patterns (~) on function parameters don't do anything if the function isn't inlined.
This is because function applications in Plutus Tx are strict, so when passing an argument
to a function, the argument will be evaluated immediately, regardless of whether the
corresponding function parameter is lazy or strict. Specifically, both f !x = body and
f ~x = body in Plutus Tx are compiled into let f = \x -> body in PIR.

It doesn't matter whether or not the function is inlined by the PIR inliner, since the PIR
inliner does not change the semantics of the program.

However, it does make a difference if the function is unconditionally inlined by GHC.
Consider the Plutus Tx code let f !x = t in f arg. Here arg will be immediately when
being passed to f (as it should). But now consider let f ~x = t in f arg. Here GHC
may unconditionally inline f, leading to t [x := arg] (recognize that the GHC inliner
also performs beta reduction, in addition to replacing f with its definition), where arg
may not be evaluated. GHC does so because it is a Haskell compiler and does not know that
Plutus Tx is a strict language.

Therefore, lazy patterns should generally be avoided, since its semantics depend on
what the GHC inliner does. That said, we can use it in some cases to our advantage,
such as in (&&) and (||). These logical operators would be much less useful if
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve
the desired effect, namely, the second parameter is only evaluated if needed.
-}

Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've addressed all the concerns now (except doing profiling).

By the way, for some of the cost increases, their costs should have been higher before. Consider Nothing <*> _ = Nothing. The _ should always have been evaluated strictly, but, just like the || and && situation, when we didn't have Strict, GHC may unconditionally inline <*>, thereby avoiding evaluating _. This may very well be responsible for most of the cost increases, since it leads to lots of strict, dead bindings, e.g., #5371 (comment).

We can potentially do the same thing to <*> as what we did to || and &&, but since there is no dramatic cost increases (and there shouldn't be), I'd rather leave this PR as it is and do that separately.

plutus-benchmark/nofib/test/formulaBudget.budget.golden Outdated Show resolved Hide resolved
(nonrec)
(termbind
(strict)
(vardecl ds [ Maybe b ])
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is from Nothing <*> _ = Nothing, and it is the same situation as || and &&. The second argument should always be strictly evaluated, but when we didn't have Strict, GHC may unconditionally inline <*> which avoided evaluating the second argument.

This can be quite bad, and may be a main contributor of some of the increased budgets 😐

[y, z, zz, x, t]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added:

-- TODO: the Cek log of this test case is currently incorrect as it doesn't preserve
-- the order of logging, which it should do given the conservative-optimisation flag.
-- Both GHC and ourselves are culprits in this instance, see
-- #5371 (comment)

plutus-tx/src/PlutusTx/AssocMap.hs Outdated Show resolved Hide resolved
, all
, mapThese
) where
Map,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really not a fan of the ad-hoc reformatting. It makes it much harder to review diffs. If we're going to switch to using a formatter globally, then let's agree that and apply it one go. (and if we haven't agreed and "reformatting in PRs" is acceptable, then there is nothing stopping someone just blindly formatting it back later!)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to trigger CI. I should have used a different module.

We've agreed on using 2-space indentation, and our style guide has been updated, so the reformatting I did makes it conform to the style guide.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that we agreed to that :| can we revisit just agreeing to fourmolu everything?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to agree to fourmolu. Fourmolu is one way to convert 4-space to 2-space. It's not a problem for me if someone prefers hand formatting the code.

I'm not aware of any other tool that people use to auto-format code, so it's unlikely that someone will blindly re-format it in a very different format.

This is because function applications in Plutus Tx are strict, so when passing an argument
to a function, the argument will be evaluated immediately, regardless of whether the
corresponding function parameter is lazy or strict. Specifically, both `f !x = body` and
`f ~x = body` in Plutus Tx are compiled into `let f = \x -> body` in PIR.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is wrong. They are compiled differently.

f ~x = body
-->
f = \x -> body

f !x = body
-->
f = \x -> let !x = x in body

That's why there is a difference if GHC inlines them, otherwise there would be literally no difference!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But aren't these two equivalent? And the second term becomes the first term after inlining. That's the point of this paragraph - it should (although it sometimes doesn't because of GHC inlining) produce the same PIR eventually, whether function parameters are strict or lazy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they do become the same thing eventually, but the relevant thing is what they become initially, since that's what lets us observe the difference!

Okay, maybe let me phrase this differently.

f ~x = body
---> (GHC Core)
f x = body
---> (PIR)
f = \x -> body

f !x = body
---> (GHC Core)
f x = case x@x of { _ -> body }
---> (PIR)
f = \x -> let x = x in body

The two PIR bindings have the same semantics. But if GHC inlines the function at the core stage, then we get the body of the second PIR binding at the call site, which does behave differently.

(I think we both agree with what's going on, I just think the explanation isn't quite clear for future people)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added the steps you laid out above in the Note.

plutus-tx/src/PlutusTx/Bool.hs Outdated Show resolved Hide resolved
Consider the Plutus Tx code `let f !x = t in f arg`. Here `arg` will be immediately when
being passed to `f` (as it should). But now consider `let f ~x = t in f arg`. Here GHC
may unconditionally inline `f`, leading to `t [x := arg]` (recognize that the GHC inliner
also performs beta reduction, in addition to replacing `f` with its definition), where `arg`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessary for GHC to perform beta-reduction, although perhaps it does, because of the difference in how lazy/strict patterns get compiled above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is! I keep forgetting this! Maybe needs a more worked out example since apparently it's not obvious lol

Therefore, lazy patterns should generally be avoided, since its semantics depend on
what the GHC inliner does. That said, we can use it in some cases to our advantage,
such as in `(&&)` and `(||)`. These logical operators would be much less useful if
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve
they can't short-circuit. By using a lazy pattern on the second parameter and forcing GHC to inline, we can achieve

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this. The INLINE pragma does not force GHC to inline. We run the GHC simplifier with sm_inline turned off. It does force the PIR inliner to inline, but the PIR inliner only replaces the variable with the definition, which isn't sufficient to make it non-strict. We should make it so that the INLINE pragma forces the PIR simplifier to beta-reduce. We had a length discussion in the ADR PR and IIRC this was the conclusion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right yes, we need to both force inlining and beta reduction. I keep forgetting that!

We had a length discussion in the ADR PR and IIRC this was the conclusion.

I do not think we agreed to that :p #5270 (comment)

I'm overall then confused by this line: since we don't have such behaviour at the moment, we can't achieve what's written here? So the note is talking about some hypothetical future behaviour. I think we should be clear that it's not something we can do now!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I rephrase it in an attempt to make it more clear:

Using a lazy pattern on the second parameter helps achieve
the desirable effect, namely, the second parameter is only evaluated if needed, although
this is currently not guaranteed, since (&&) and (||) are not necessarily
unconditionally inlined by GHC. To guarantee that the second parameter is only evaluated
when needed, we can potentially use a particular source annotation to tell the PIR
inliner to inline and beta-reduce a function, which would achieve the same effect as if
it is unconditionally inlined by GHC. This is yet to be implemented.

plutus-tx/src/PlutusTx/Builtins.hs Outdated Show resolved Hide resolved
plutus-benchmark/nofib/test/formulaBudget.budget.golden Outdated Show resolved Hide resolved
Copy link
Member Author

@zliu41 zliu41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion re: -fno-full-laziness. I added a Note [-fno-full-laziness in Plutus Tx].
This should now be mergeable.

120 1402 (8.6%) 8201088120 (82.0%) 510702 (3.6%)
130 1512 (9.2%) 8884555240 (88.8%) 553062 (4.0%)
140 1622 (9.9%) 9568022360 (95.7%) 595422 (4.3%)
150 1732 (10.6%) 10251489480 (102.5%) 637782 (4.6%)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything's slower?

Not everything, but things are a bit less slower now thanks to -fno-full-laziness!

, all
, mapThese
) where
Map,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to agree to fourmolu. Fourmolu is one way to convert 4-space to 2-space. It's not a problem for me if someone prefers hand formatting the code.

I'm not aware of any other tool that people use to auto-format code, so it's unlikely that someone will blindly re-format it in a very different format.

Therefore, lazy patterns should generally be avoided, since its semantics depend on
what the GHC inliner does. That said, we can use it in some cases to our advantage,
such as in `(&&)` and `(||)`. These logical operators would be much less useful if
they can't short-circuit. By using a lazy pattern on the second parameter, we can achieve
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I rephrase it in an attempt to make it more clear:

Using a lazy pattern on the second parameter helps achieve
the desirable effect, namely, the second parameter is only evaluated if needed, although
this is currently not guaranteed, since (&&) and (||) are not necessarily
unconditionally inlined by GHC. To guarantee that the second parameter is only evaluated
when needed, we can potentially use a particular source annotation to tell the PIR
inliner to inline and beta-reduce a function, which would achieve the same effect as if
it is unconditionally inlined by GHC. This is yet to be implemented.

This is because function applications in Plutus Tx are strict, so when passing an argument
to a function, the argument will be evaluated immediately, regardless of whether the
corresponding function parameter is lazy or strict. Specifically, both `f !x = body` and
`f ~x = body` in Plutus Tx are compiled into `let f = \x -> body` in PIR.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I added the steps you laid out above in the Note.

@zliu41 zliu41 merged commit bbc4a0d into master Sep 1, 2023
7 checks passed
@zliu41 zliu41 deleted the zliu41/strict branch September 1, 2023 20:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants