Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent handling of non-ending trace events #10616

Closed
paulirish opened this issue Apr 21, 2020 · 5 comments
Closed

Inconsistent handling of non-ending trace events #10616

paulirish opened this issue Apr 21, 2020 · 5 comments
Assignees

Comments

@paulirish
Copy link
Member

paulirish commented Apr 21, 2020

It's very common (increasingly common, even?) to have a trace where some Begin (phase B) events do not have their partnered End (phase E) events. We'll call these "non-ending trace events".

Here's one such trace: Profile-20200420T122902.json.zip. Captured in Chrome DevTools UI. The "Rasterize Paint" aka RasterTask events are complete events (phase X), normally, except for when they don't end. Then, they are just B events.

Here's a more complex trace with plenty of the same behavior. trace_Mon_Apr_20_2020_3.09.45_PM.json.zip It was captured using the new perfetto pipeline.

That same trace rendered in both traceviewer and devtools:
image
The new perfetto viewer also doesn't elongate the pending events:
image
In fact, you can zoom in and see the green parent (which is non-ending) is visually shown as an instant event, even when it has a child that starts after and has a real duration:
image


Here's the current state of how our tools handle this situation:

  • Perfetto parsing: knows the "slice" isn't ended, but not much special handling. Doesn't set an end time, but they're thinking about it.
  • Perfetto UI: renders non-ended events like an instant event (with a 0 duration)
  • traceviewer parsing: sets the duration to be the maxTrackTimestamp, and keeps a didNotFinish flag on it. maxTrackTimestamp is the last known end time for that track (where "track" ~= thread)
  • traceviewer UI: adds "(Did not finish)" in the flamechart UI, but not in the details.
  • devtools parsing: sets all non-ended events to end at the end of the trace.
  • devtools UI: makes it look like REALLY long events, but yeah they're almost certainly not.
  • lighthouse/tracehouse parsing: we consider it unbounded and end the event at trace end. Update: i was wrong, see patrick's comment

I'd like to suggest a few actions:

  1. Trace capturing is broken somehow. Honestly feels like the last trace event (per thread) isn't being captured. (off by one error?) Report this to the relevant chrome/perfetto folks
  2. Lighthouse should adopt the same end/duration handling as traceviewer
  3. DevTools should also the same thing (though due to crbug.com/1030639, perhaps devtools will defer completely to perfetto's parsing)
  4. Perfetto/Traceviewer folks should pick which of their 2 ways they want to handle this.

ref GoogleChrome/lighthouse-ci#276 (comment)
ref #9491

@patrickhulce
Copy link
Collaborator

lighthouse/tracehouse parsing: we consider it unbounded and end the event at trace end.

Only kind of, sorry if I confused things with jokes :) Since #9785 we will always truncate the event at the end of the parent task and be very forgiving.

} else if (nextTask.unbounded) {
// It's ending at traceEndTs, it means we were missing the end event. We'll truncate it to the parent.
nextTask.endTime = currentTask.endTime;
nextTask.duration = nextTask.endTime - nextTask.startTime;

Because we only look at events inside toplevel tasks (which are "complete" X events that don't have this problem), trace end is effectively never used as the end of a task in practice in the context of lighthouse (but other folks using tracehouse for arbitrary traces might need to care).

Lighthouse should adopt the same end/duration handling as traceviewer
traceviewer parsing: sets the duration to be the maxTrackTimestamp, and keeps a didNotFinish flag on it. maxTrackTimestamp is the last known end time for that track (where "track" ~= thread)

This sounds less forgiving than where we stand right now. unbounded is effectively our didNotFinish flag. Seems like we might not need to take any action?

@paulirish
Copy link
Member Author

paulirish commented Apr 21, 2020

  1. Trace capturing is broken somehow. Honestly feels like the last trace event (per thread) isn't being captured. (off by one error?) Report this to the relevant chrome/perfetto folks

I'm working on a repro/bisect here. The most reproducible case I have is that on a CompositorTileWorker, the last "RasterTask" is often missing its END. Debugging right now, but it might be correlated to if the toplevel category is on or not.


Update! eseckler chimed in:

Regarding the last event missing, I almost certainly know what the cause of that is (especially on thread pool threads, like compositor tile workers)
We can't flush these threads on demand (no way to post a task to individual threads in a thread pool), and when we scrape their data on ending the trace, we can't be sure that the last event has been fully written into their tracing buffer (e.g. might be written in parallel). So we discard the last event - which is probably often an END.
But it's hard to fix, at least so far we weren't really considering fixing this

upstream bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1021571

@paulirish
Copy link
Member Author

paulirish commented Apr 28, 2020

This sounds less forgiving than where we stand right now. unbounded is effectively our didNotFinish flag. Seems like we might not need to take any action?

Oh fantastic. Okay so,

  1. Lighthouse is fine right now. No action needed
  2. DevTools must change their handling of these. @paulirish to file a bug.
  3. Ideally, tracing folks fix the core issue of lost events in thread pool threads

@paulirish
Copy link
Member Author

DevTools must change their handling of these. @paulirish to file a bug.

captured the full story for devtools in https://bugs.chromium.org/p/chromium/issues/detail?id=982252#c11

@paulirish
Copy link
Member Author

It seems like this may finally be fixed in Chromium. https://bugs.chromium.org/p/chromium/issues/detail?id=1021571#c25

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants