Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make type checking true read-only visitor #4829

Merged
merged 18 commits into from
Sep 4, 2024
Merged

Conversation

asl
Copy link
Contributor

@asl asl commented Jul 23, 2024

TypeChecking / TypeInference is the pass that is executed all the time during the compilation:

  • After each program tranformation to update type map
  • As nested pass on subtrees during MethodInstance evaluation
  • In some other cases, e.g. during static asserts evaluations

I counted on some downstream program that top-level (on P4Program node) TypeInference is called > 70 times. The nested invocations are much more numerous. Only in few cases (~2 out of these 70) is does some transformations, e.g. insert casts.

The real issue is that TypeInference is a Transform: so it does cloning of the IR even in so-called "read-only mode" and nothing really changes.

This PR restructures the code in such a way, that base functionality could be called from both Inspector and Transform. The only downside is that for Transform case we might have double-cloning, but this seems to cause negligible impact.

As a result, the functionality is split into:

  • ReadOnlyTypeInference which is an Inspector. It is used for TypeCheck, inside MethodInstance::resolve and read-write TypeInference learner
  • TypeInference that is a Transform. For debugging purposes I left the readOnly interface here as the cost to support it is essentially free.

Overall we are saving:

  • no clone for each node during type checking and use of TypeInference in read-only matter
  • as a result, much less malloc traffic
  • saves on internal Transform structures and much less complicated logic

Fixes #4815

@asl asl added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Jul 23, 2024
@asl asl requested review from vlstill and ChrisDodd July 23, 2024 21:46
@asl
Copy link
Contributor Author

asl commented Jul 23, 2024

Removing of overheads seem to yield 5-10% improvements in runtime (I have not measured memory pressure, but it is clear it would be less):

Command Mean [s] Min [s] Max [s] Relative
test/gtestp4c-main --gtest_filter=P4CParserUnroll.switch_20160512 4.860 ± 0.115 4.653 5.064 1.07 ± 0.04
test/gtestp4c --gtest_filter=P4CParserUnroll.switch_20160512 4.534 ± 0.108 4.291 4.667 1.00

And similar times for downstream projects:

Command Mean [s] Min [s] Max [s] Relative
main 58.350 ± 1.484 57.078 61.983 1.07 ± 0.03
this PR 54.343 ± 0.723 53.538 56.008 1.00

@asl
Copy link
Contributor Author

asl commented Jul 23, 2024

Certainly the better solution would be making Transform do lazy cloning, but it is a terribly invasive change effecting everything :)

@asl
Copy link
Contributor Author

asl commented Jul 23, 2024

I have not yet checked why, but it seems we used to emit errors twice. E.g. testdata/p4_16_errors_outputs/issue3727.p4-stderr contains:

issue3727.p4(7): [--Werror=type-error] error: f2(1) is not invoking an action
    actions = {f2(1);}
               ^^^^
issue3727.p4(7): [--Werror=type-error] error: f2(1) is not invoking an action
    actions = {f2(1);}
               ^^^^
[--Werror=type-error] error: Error while analyzing t

now we produce:

p4c/testdata/p4_16_errors/issue3727.p4(7): [--Werror=type-error] error: f2(1) is not invoking an action
    actions = {f2(1);}
               ^^^^

@asl
Copy link
Contributor Author

asl commented Jul 24, 2024

I have not yet checked why, but it seems we used to emit errors twice. E.g. testdata/p4_16_errors_outputs/issue3727.p4-stderr contains:

Ok, looks like there was a subtle bug within error reporter that is now simply not being triggered. The example code in question is:

        if (getContext()->node->is<IR::ActionListElement>()) {
            typeError("%1% is not invoking an action", expression);
            return expression;
        }

Before expression was IR::MethodCallExpression * and now it is const IR::MethodCallExpression *. Eventually typeError call ends in diagnose within error_reporter.h that has the following set of overloads:

    template <class T, typename = std::enable_if_t<Util::has_SourceInfo_v<T>>, typename... Args>
    void diagnose(DiagnosticAction action, const int errorCode, const char *format,
                  const char *suffix, const T *node, Args &&...args) {
       ...
    }

    template <class T, typename = std::enable_if_t<Util::has_SourceInfo_v<T>>, typename... Args>
    void diagnose(DiagnosticAction action, const int errorCode, const char *format,
                  const char *suffix, const T &node, Args &&...args) {
     ...
    }

    template <typename... Args>
    void diagnose(DiagnosticAction action, const char *diagnosticName, const char *format,
                  const char *suffix, Args &&...args) {
     ...
    }

When we pass non-const Node* pointer to these set of overloads, the third one is selected (as among first two the second one is selected essentially accepting a Node* by const reference, but then having it SFINA-disabled as Node* cannot have that has_SourceInfo() check succeeded with T = Node*), therefore all checks for duplicate emission that is based on SourceInfo are bypassed.

So, in this PR we started to pass const IR::MethodCallExpression * and therefore the different overload was selected, as a result, no duplicate diagnostics was emitted.

Frankly speaking, I do not understand why we're having that accept-by-reference overload at all, as only nodes are expected to have source info and all nodes are heap-allocated objects. I think we'd just remove it.

@asl asl added the run-sanitizer Use this tag to run a Clang+Sanitzers CI run. label Jul 24, 2024
@fruffy fruffy requested a review from grg July 24, 2024 13:27
@fruffy
Copy link
Collaborator

fruffy commented Jul 24, 2024

Certainly the better solution would be making Transform do lazy cloning, but it is a terribly invasive change effecting everything :)

Why not duplicate Transform, modify the duplicated version, then use it only for TypeInference? If that works we can also use the modified version in other places.

A further complication is that, by design, many IR objects have const pointers which requires a copy.

@asl
Copy link
Contributor Author

asl commented Jul 24, 2024

Why not duplicate Transform, modify the duplicated version, then use it only for TypeInference? If that works we can also use the modified version in other places.

Sorry, I do not follow. what do you mean as "duplicate"? Are you suggesting making a copy of 180k SLOC of implementation?

@fruffy
Copy link
Collaborator

fruffy commented Jul 24, 2024

Why not duplicate Transform, modify the duplicated version, then use it only for TypeInference? If that works we can also use the modified version in other places.

Sorry, I do not follow. what do you mean as "duplicate"? Are you suggesting making a copy of 180k SLOC of implementation?

On the note of

The real issue is that TypeInference is a Transform: so it does cloning of the IR even in so-called "read-only mode" and nothing really changes.

Can you implement another version of Transform that does lazy cloning which is then used by TypeInference?

Do you also need to change the IR implementation or many parts of TypeInference?

@grg
Copy link
Contributor

grg commented Jul 24, 2024

Can you implement another version of Transform that does lazy cloning which is then used by TypeInference?

LazyTransform! 🙂

@asl
Copy link
Contributor Author

asl commented Jul 24, 2024

Do you also need to change the IR implementation or many parts of TypeInference?

The IR is the same. Few things of TypeInference were changed not to modify things in-place, but rather return them as a result of preorder / postorder. See the commits in this PR for each of these cases, these changes were made prior to refactoring.

Can you implement another version of Transform that does lazy cloning which is then used by TypeInference?

Well, this would require some rethought on how IR is created / modified. Note that for lazy cloning we'd need to:

  • "Inform" transform that change was made
  • Propagate changes up to the context (might be tricky, especially in postorder)

These changes drastically the overall visitor concept, currently lots of things expect that everything is "fresh clone" and "there is also original node somewhere". LazyTransform would change this semantics heavily. I have some thoughts how this might be implemented, but it is much larger and complicated change rather than factoring out the generic functionality into a common base class and inherit the implementation depends on the intent.

DEFINE_PREORDER(IR::EntriesList)
DEFINE_PREORDER(IR::Type_SerEnum)

#undef DEFINE_PREORDER
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated on line 162

Comment on lines +424 to +407
/*
The type of the member is initially set in the Type_SerEnum preorder visitor.
Here we check additional constraints and we may correct the member.
if (done()) return member;
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems inconsistent with other comment styles in the file -- should it be double-slash?

Copy link
Contributor Author

@asl asl Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be. However, this was the original comment. I just moved the function to a different file (see https://github.com/p4lang/p4c/pull/4829/files#diff-316382055471bab8faf77716e614a5508e70e44c56e52096f08ff5ebe6ee7e9dL355). Also note that the original code had if (done()) return member shortcut commented out. I have not dig into the logic why it was so required.

@grg
Copy link
Contributor

grg commented Jul 24, 2024

The only downside is that for Transform case we might have double-cloning, but this seems to cause negligible impact.

Verifying that I correctly understand the reason for this:

  • To support the inspect read-only (Inspector) pass, the pointers that are passed to the TypeInferenceBase class are const IR::<Type> * instead of IR::<Type> *. In order to modify the object being pointed at, we need to clone it first.
  • Now when running the non-read-only (Transform) pass, Transform::apply_visitor clones each node before visiting, and TypeInferenceBase then creates a second clone.

To avoid the second copy, would either of these be a viable solution:

  1. Have both const and non-const pointer versions of all TypeInferenceBase preorder/postorder functions that modify the object being passed in. Then ReadOnlyTypeInference can invoke the const pointer version and TypeInference can invoke the non-const pointer version. My big concern with this approach is that it would introduce lots of code duplication. (If it was possible to identify early if cloning is required, the const version could do an up-front clone and invoke the non-const version of each function.)
  2. Add a boolean read_only field to TypeInferenceBase, and a "clone-or-unconst" function that clones the node when the read_only flag is true, and does a const_cast on the node if read-only is false to remove the constness. I avoid const_cast, but maybe this is one place where it is appropriate?

@grg
Copy link
Contributor

grg commented Jul 24, 2024

I should add that since you said that the double-clone seems to have negligible impact, it's probably not worth attempting either of the suggestions in my previous comment.

But can extend the TypeInferenceBase comment in typeChecker.h to explain that a double-clone is occurring in case someone is looking at this later on please?

@asl
Copy link
Contributor Author

asl commented Jul 24, 2024

I should add that since you said that the double-clone seems to have negligible impact, it's probably not worth attempting either of the suggestions in my previous comment.

The reason why I said the impact is negligible is the following:

  • Normal Transform also does double clone. First clone is done before preorder / postorder (so they receive pointer to non-const Node), second clone is done if preorder / postorder returns something that is is not the node that they received in
  • It turns out that when casts are inserted (non-read-only mode) only very few places (I count 5 or 6 node kinds) that modified thing in-place. All other cases created new nodes and returned them from preorder / postorder resulting in double-cloning. Essentially, I made it so no method changed things in-place. And these cases are rare (see first commits in this PR for this)
  • Few places that did manual visiting of their children in a specific order now do explicit clone when necessary.

And yes, the additional double-cloning only happens in very few TypeInference invocation where it inserts casts (~1 or 2 from 70+ total).

Hope this makes the things a bit more clear :)

@asl asl force-pushed the type-inference-visitor branch 2 times, most recently from 74172b3 to 1e3daeb Compare July 29, 2024 16:32
@ChrisDodd
Copy link
Contributor

Certainly the better solution would be making Transform do lazy cloning, but it is a terribly invasive change effecting everything :)

I've been trying to think of a way of doing this. One possibility would be to change the preorder/postorder methods to take and return a "smart" pointer type that holds both a const pointer to the original node and non-cost pointer to the clone that would initially be nullptr. The first modification of the node would clone it. Template magic would be involved to make this all work cleanly.

For incremental improvement, the way to do this sort of thing is to start by writing a new Visitor subclass replacement for Transform (maybe called LazyTransform?) and then experiment with rewriting existing Transform subclasses to use it. A big complication is how it interacts with Visitor::Context -- we need the smart pointers in there too, but perhaps that could be managed by careful updating.

@ChrisDodd
Copy link
Contributor

Frankly speaking, I do not understand why we're having that accept-by-reference overload at all, as only nodes are expected to have source info and all nodes are heap-allocated objects. I think we'd just remove it.

It's for dealing with inline fields -- where a IR::Node subclass appears directly as a field within another IR::Node subclass, rather than as a pointer. While we could require that using such a field in an error message requires an explicit & to get a pointer to it, that would be error prone.

@asl
Copy link
Contributor Author

asl commented Jul 30, 2024

For incremental improvement, the way to do this sort of thing is to start by writing a new Visitor subclass replacement for Transform (maybe called LazyTransform?) and then experiment with rewriting existing Transform subclasses to use it. A big complication is how it interacts with Visitor::Context -- we need the smart pointers in there too, but perhaps that could be managed by careful updating.

Another tricky part is various side maps: they may capture original const-node (e.g. in preorder), but in the case of any children change, the will need to update to cloned object....

@asl asl added the run-validation Use this tag to trigger a Validation CI run. label Sep 2, 2024
decl = cloned;
}
}
if (decl->initializer != nullptr) visit(decl->initializer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to check for nullptr here, as calling visit on a nullptr is a noop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... but this was the original code :)

@ChrisDodd
Copy link
Contributor

For incremental improvement, the way to do this sort of thing is to start by writing a new Visitor subclass replacement for Transform (maybe called LazyTransform?) and then experiment with rewriting existing Transform subclasses to use it. A big complication is how it interacts with Visitor::Context -- we need the smart pointers in there too, but perhaps that could be managed by careful updating.

Another tricky part is various side maps: they may capture original const-node (e.g. in preorder), but in the case of any children change, the will need to update to cloned object....

That's already tricky -- any Modifier?Transform that runs currently will invalidate any data structure outside the IR DAG that refers to IR nodes. Perhaps we need a way of registering such a data structure with the visitor infrastructure so that it can be automatically updated to reflect any clones that occur, but I'm not sure how that would best work. Some sort of callback to be called whenever a node is cloned (or replaced? Transform may completely replace a node with a newly created node that might not even be the same class)

Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
…pechecker as a Visitor with reduced overhead

Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
Signed-off-by: Anton Korobeynikov <anton@korobeynikov.info>
@asl asl added this pull request to the merge queue Sep 4, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 4, 2024
@asl asl added this pull request to the merge queue Sep 4, 2024
Merged via the queue into p4lang:main with commit 83fa5f3 Sep 4, 2024
18 checks passed
@asl asl deleted the type-inference-visitor branch September 4, 2024 07:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Topics concerning the core segments of the compiler (frontend, midend, parser) run-sanitizer Use this tag to run a Clang+Sanitzers CI run. run-validation Use this tag to trigger a Validation CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce type checking overhead
4 participants