Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lazy Diagnostic storage allocation #10852

Merged
merged 22 commits into from
Aug 23, 2024

Conversation

4e6
Copy link
Contributor

@4e6 4e6 commented Aug 20, 2024

Pull Request Description

close #10727

Changelog:

  • add: implement lazy allocation of DiagnosticStorage objects
  • update: compiler phases to make sure they don't allocate redundant DiagnosticStorage objects
  • update: cleanup DiagnosticStorage class

Important Notes

As a result, all 10MB of redundant DiagnosticStorage allocations are gone

Before

2024-08-21-170148_2092x171_scrot

After

2024-08-21-170058_1693x131_scrot

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

  • The documentation has been updated, if necessary.
  • All code follows the
    Scala,
    Java,
    TypeScript,
    and
    Rust
    style guides. In case you are using a language not listed above, follow the Rust style guide.
  • Unit tests have been written where possible.

@4e6 4e6 added the CI: No changelog needed Do not require a changelog entry for this PR. label Aug 20, 2024
@4e6 4e6 self-assigned this Aug 20, 2024
@4e6 4e6 force-pushed the wip/db/10727-diagnosticstorage-allocated branch from 444312f to 76da204 Compare August 21, 2024 13:57
@4e6 4e6 marked this pull request as ready for review August 22, 2024 10:08
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really happy to see four diagnostics related methods in IR. I'd like to hear some justification before I even try to look at the massive changes around. Can you somehow update Javadoc of the diagnostics related methods in IR to describe why there have to be four and what's different among each other?

*
* @return the diagnostic storage of this node
*/
DiagnosticStorage getDiagnostics();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already is diagnostics(). Why do we need another getDiagnostics()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can return null. I updated the docs

*
* @return the copy of diagnostic storage
*/
DiagnosticStorage diagnosticsCopy();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just call diagnostics().copy()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes more sense if you put it into the context of LazyDiagnosticStorage, which deal with the possibility of null.
Although I would probably already provide a default implementation here:

DiagnosticStorage diagnosticCopy() {
  if (diagnostics == null) null else diagnostics.copy();
}

Also I would maybe rename it to diagnosticsLazyCopy

*
* @return the list of diagnostics
*/
List<Diagnostic> diagnosticsList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we call diagnostics().toList()?

_diagnostics = diagnostics
}

override def diagnostics: DiagnosticStorage = {
Copy link
Member

@JaroslavTulach JaroslavTulach Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not really sure what _diagnostics: DiagnosticStorage = _, but if I parse the ideas in this file, I would really prefer to get only:

interface IR  {
  DiagnosticsStorage diagnostics(boolean canReturnNull);
}

the implementation in LazyDiagnosticsStorage would be:

trait LazyDiagnosticStorage { self: IR =>
  private[this] var d: DiagnosticsStorage = null;
  override synchronized def diagnostics(boolean canReturnNull) = {
    if (!canReturnNull && d == null) {
      d = DiagnosticsStorage();
   }
   d
  }
}

That's all we need to defer allocation of DiagnosticsStorage until needed, in my opinion. The boilerplate of diagnosticsList, getDiagnostics, diagnosticsCopy can be moved elsewhere or preferably disappear.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_diagnostics: DiagnosticStorage = _

represents the actual allocated DiagnosticStorage and _ means null in this context. You call it d he calls it _diagnostics which is more in line with Scala's way of defining custom setters/getters. I think this is fine.
Also you would have to canReturnNull all over the place and I don't think this is any better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this approach

ir.diagnostics(canReturnNull = true).copy()

can throw NPE, that's why there is a to DRY the null checking logic

ir.diagnosticsCopy()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can remove the extra methods from the IR interface and make them more implementation-specific.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this approach

ir.diagnostics(canReturnNull = true).copy()

can throw NPE, that's why there is a to DRY the null checking logic

In order to be DRY for internal stuff you duplicate a single publicly visible method four times! That's not DRY! To be DRY you'd write a helper method somewhere (hidden enough from regular users of the IR) and used it in all the internal implementations to avoid NPE:

public static DiagnosticsStorage copyOrNull(DiagnosticsStorage sOrNull) {
  return sOrNull == null ? null : sOrNull.copy();
}

and then called it as DiagnosticsStorage.copyOrNull(ir.diagnostics(true)).

You call it d he calls it

I don't care about any of the proposed names. I just believe four methods related to diagnostics in IR is 75% too much. I believe single DiagnosticsStorage diagnotics(boolean canReturnNull) would be enough to satisfy all use-cases and keep conceptual surface (some may call it DRY) smaller.

@@ -42,115 +41,5 @@ class DiagnosticStorageTest extends CompilerTest {
diagnostics.toList should contain(mkDiagnostic("b"))
diagnostics.toList should contain(mkDiagnostic("c"))
}

"mapping across the diagnostics to produce a new sequence" in {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for removing all these tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They were testing the logic that was merely delegating to the underlying list of DiagnosticStorage. This logic is no longer used

@@ -151,23 +151,23 @@ trait CompilerRunner {
*/
def asMethod: definition.Method = {
new definition.Method.Explicit(
Name.MethodReference(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this change related to diagnostic storage?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the compilation, the Explicit node is always created from the Binding copying the accompanying diagnostic, location, etc. The constructor was updated to DRY it.

*
* @return the copy of diagnostic storage
*/
DiagnosticStorage diagnosticsCopy();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes more sense if you put it into the context of LazyDiagnosticStorage, which deal with the possibility of null.
Although I would probably already provide a default implementation here:

DiagnosticStorage diagnosticCopy() {
  if (diagnostics == null) null else diagnostics.copy();
}

Also I would maybe rename it to diagnosticsLazyCopy

_diagnostics = diagnostics
}

override def diagnostics: DiagnosticStorage = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_diagnostics: DiagnosticStorage = _

represents the actual allocated DiagnosticStorage and _ means null in this context. You call it d he calls it _diagnostics which is more in line with Scala's way of defining custom setters/getters. I think this is fine.
Also you would have to canReturnNull all over the place and I don't think this is any better.

_diagnostics
}

override def diagnosticsList: List[Diagnostic] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I get it. You essentially have all those methods to deal with the possibility of null and not repeat the logic all over the place.

Copy link
Member

@JaroslavTulach JaroslavTulach Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe so. Thus in order to simplify SPI (e.g. implementations re-written by Dmitry in this PR), we have complicated Client API. I see it as a completely opposite approach than the right one - the usage of IR should be simple. The implementation needn't be - API authors have to live thru their pain.

@@ -356,31 +356,31 @@ class DataflowAnalysisTest extends CompilerTest {
val plusSymbol = mkDynamicDep("+")

// The Identifiers
val methodId = mkStaticDep(method.getId())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave those with brackets. getId may involve side-effects (as in, actually inferring the new id) and as such should require brackets.

Copy link
Member

@JaroslavTulach JaroslavTulach Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When talking about getId() - there used to be a PR that saved some memory by not allocating UUIDs. Have that PR need for different versions of getId(), @hubertp? If not, then it might be an example to follow when making DiagnosticStorage lazy.

JaroslavTulach
JaroslavTulach previously approved these changes Aug 23, 2024
Copy link
Member

@JaroslavTulach JaroslavTulach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like diagnosticsCopy and diagnosticsList are gone from IR and are only in LazyDiagnosticsStorage - more hidden, but still visible. I believe those methods shouldn't be visible at all. Moreover the initialization of _diagnostics field isn't thread safe. So I don't believe I can approve this change - but I have "requested changes"...

Approval just to _un-"request changes". Dmitry please find approval from someone else, don't count mine as +1.

_diagnostics
}

def diagnosticsList: List[Diagnostic] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make protected

if (_diagnostics eq null) Nil else _diagnostics.toList
}

def diagnosticsCopy: DiagnosticStorage = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it also be protected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is used in compiler passes.


private[this] var _diagnostics: DiagnosticStorage = _

protected def diagnostics_=(diagnostics: DiagnosticStorage): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who's calling setter (this is setter, right)? There was no setter before, so I don't think we need a method for it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a setter when creating an IR node and want to pass some existing diagnostic storage.


override def getDiagnostics: DiagnosticStorage = {
if (_diagnostics eq null) {
_diagnostics = new DiagnosticStorage()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not ready for multi threaded access. Are we OK with that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire compilation logic is not thread-safe. Compilation jobs ensure that the process is single-threaded

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we then using immutable IR and not modifying IR in place? Ó the wonders of like to be functional programming!

@JaroslavTulach JaroslavTulach self-requested a review August 23, 2024 14:25
@4e6
Copy link
Contributor Author

4e6 commented Aug 23, 2024

@JaroslavTulach GitHub allows reviewer to dismiss their review
2024-08-23-164035_1421x575_scrot

4e6 and others added 2 commits August 23, 2024 16:43
…java

Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>
@JaroslavTulach JaroslavTulach dismissed their stale review August 23, 2024 17:01

I don't want to block the performance improvement

@4e6 4e6 added the CI: Ready to merge This PR is eligible for automatic merge label Aug 23, 2024
@mergify mergify bot merged commit 3bfe963 into develop Aug 23, 2024
42 checks passed
@mergify mergify bot deleted the wip/db/10727-diagnosticstorage-allocated branch August 23, 2024 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CI: No changelog needed Do not require a changelog entry for this PR. CI: Ready to merge This PR is eligible for automatic merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DiagnosticStorage is being allocated eagerly for every IR
4 participants