
misc(scripts): add lantern evaluation scripts #5257

Merged — 6 commits merged into master on May 23, 2018
Conversation

@patrickhulce (Collaborator) commented May 18, 2018

Phase 2 of running lantern accuracy checks per commit.

Related: #5237

This adds a set of scripts that downloads a set of 100 minified traces, runs lantern over them, and prints out some summary statistics.

The trace set is mostly representative, skewing slightly negative so there's more identifiable room for improvement. The MAPE/Spearman's rho stats on this set are also slightly lower than you might have seen in previous PRs, both because of that skew and because this compares 1 trace to the median of WPT runs instead of the median of 9 traces to WPT.
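For reference, the MAPE column in the output below is mean absolute percentage error over the (observed, estimated) metric pairs. A minimal sketch (illustrative helper and sample values, not the PR's actual script code):

```javascript
// Mean absolute percentage error: average of |estimate - actual| / actual.
// Illustrative sketch of the statistic, not the PR's implementation.
function mape(pairs) {
  const errors = pairs.map(({actual, estimate}) => Math.abs(estimate - actual) / actual);
  return errors.reduce((sum, err) => sum + err, 0) / errors.length;
}

// Example: one perfect estimate and one 50% underestimate -> 25% MAPE.
const sample = [
  {actual: 4000, estimate: 4000},
  {actual: 8000, estimate: 4000},
];
console.log(mape(sample)); // 0.25
```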

The Good/OK/Bad thresholds are very open to discussion; right now:

  • Good
    <20% absolute error OR <10% percentile difference (predicting 60s when the real value is 80s is good IMO).
    Sites here are roughly indistinguishable from WPT results on single runs, since the error is within the normal variance level of WPT.
  • OK
    <50% absolute error. Sites here are roughly as inaccurate as DevTools throttling on its edge cases.
  • Bad
    >50% absolute error. Sites here are usually inaccurate and should be dug into; they mostly fall into a few categories:
    1. There's a large difference between optimistic/pessimistic and the estimate ends up somewhere in the middle
    2. The page is much slower on a real phone than when throttled (in some cases never reaching TTI)
    3. The underlying runs were very different (should probably exclude these from the dataset eventually)
    4. Other unclear reasons (prime investigation targets!)
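The threshold scheme above could be sketched roughly like this (hypothetical helper; for brevity this only implements the absolute-error branch, omitting the <10% percentile-difference condition for Good):

```javascript
// Classify a single estimate by absolute percentage error, per the proposed
// thresholds: <20% good, <50% ok, otherwise bad. Sketch only; the real
// evaluation also counts a <10% percentile difference as good.
function classify(actual, estimate) {
  const pctError = Math.abs(estimate - actual) / actual;
  if (pctError < 0.2) return 'good';
  if (pctError < 0.5) return 'ok';
  return 'bad';
}

console.log(classify(10000, 8500));  // 15% error -> 'good'
console.log(classify(10000, 14000)); // 40% error -> 'ok'
console.log(classify(10000, 16000)); // 60% error -> 'bad'
```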
See sample output:
----    Metric Stats    ----
metric                         estimate                  rank error   MAPE       Good/OK/Bad
firstContentfulPaint           optimisticFCP             15.6%        32.4%      62/21/16
firstContentfulPaint           pessimisticFCP            15.6%        29.9%      57/29/13
firstContentfulPaint           roughEstimateOfFCP        15.4%        26.9%      59/26/14
firstMeaningfulPaint           optimisticFMP             16.9%        36.1%      61/19/19
firstMeaningfulPaint           pessimisticFMP            16.1%        32.1%      57/29/13
firstMeaningfulPaint           roughEstimateOfFMP        16.4%        28.1%      61/20/18
timeToFirstInteractive         optimisticTTFCPUI         11.7%        33.7%      62/21/16
timeToFirstInteractive         pessimisticTTFCPUI        14.2%        36.3%      60/15/24
timeToFirstInteractive         roughEstimateOfTTFCPUI    11.7%        33.3%      67/19/13
timeToConsistentlyInteractive  optimisticTTI             10.8%        36.1%      71/5/22
timeToConsistentlyInteractive  pessimisticTTI            11.4%        38.9%      61/18/19
timeToConsistentlyInteractive  roughEstimateOfTTI        11%          34.4%      62/21/15
speedIndex                     optimisticSI              19.8%        60%        45/22/32
speedIndex                     pessimisticSI             27.1%        54.8%      37/19/43
speedIndex                     roughEstimateOfSI         20.7%        39.6%      45/28/26

----    Summary Stats    ----
Good: 60%
OK: 23%
Bad: 17%

----    Worst10 Sites    ----
underestimated firstMeaningfulPaint by 23199 on http://www.thefreedictionary.com/
underestimated firstContentfulPaint by 23444 on http://www.thefreedictionary.com/
underestimated speedIndex by 22445 on http://www.thefreedictionary.com/
underestimated firstMeaningfulPaint by 14593 on http://www.rakuten.ne.jp/
overestimated speedIndex by 7867 on http://www.foxnews.com/
underestimated speedIndex by 23633 on http://www.huffingtonpost.com/
overestimated firstContentfulPaint by 3129 on http://www.7k7k.com/
underestimated speedIndex by 24705 on http://www.cnet.com/
underestimated speedIndex by 7606 on http://www.metacafe.com/
overestimated timeToFirstInteractive by 10463 on http://www.hatena.ne.jp/

@brendankenny (Member) left a comment:

Evaluating firstContentfulPaint vs. optimisticFCP: 15.6% 32.4% - 62/21/16
Evaluating firstContentfulPaint vs. pessimisticFCP: 15.6% 29.9% - 57/29/13

what do these numbers refer to? Maybe add a header or something to the print out?

@patrickhulce (Collaborator, Author) replied:

what do these numbers refer to? Maybe add a header or something to the print out?

headers added 👍

@@ -0,0 +1,10 @@
#!/bin/bash

# THIS SCRIPT ASSUMES CWD IS ROOT PROJECT
Member commented:

if you want,

pwd="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
lhroot_path="$pwd/../.."

@patrickhulce (Collaborator, Author) replied:

Anything else here, folks?

LH_ROOT_PATH="$DIRNAME/../../.."
cd $LH_ROOT_PATH

TAR_URL="https://drive.google.com/a/chromium.org/uc?id=1_w2g6fQVLgHI62FApsyUDejZyHNXMLm0&export=download"
Member commented:

Please add a comment on what this is and whether it's ever updated, e.g. "frozen snapshots from XXX date".

@brendankenny (Member) left a comment:

I'm good with this too.

I mentioned this (jokingly) in another PR, but the residual distribution is really one of the key signals we should be looking at for evaluating the regression and, in turn, the generation of the pessimistic/optimistic signals and whether we're failing to incorporate influential variables that we should be.

We would need a random ("random") sample for that, though, and the scripts themselves are good regardless of the data run through them, so I'm 👍 👍
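The residual-distribution check described above could be sketched like this (hypothetical helper and sample values, not part of the PR's scripts):

```javascript
// Residual = estimate - actual. A well-behaved regression has residuals
// centered near zero with no heavy tail; a skewed summary suggests a
// missing influential variable. Sketch over (actual, estimate) pairs.
function residualSummary(pairs) {
  const residuals = pairs
    .map(({actual, estimate}) => estimate - actual)
    .sort((a, b) => a - b);
  const mean = residuals.reduce((sum, r) => sum + r, 0) / residuals.length;
  const median = residuals[Math.floor(residuals.length / 2)];
  return {mean, median, min: residuals[0], max: residuals[residuals.length - 1]};
}

const summary = residualSummary([
  {actual: 4000, estimate: 3500},
  {actual: 6000, estimate: 6500},
  {actual: 8000, estimate: 9000},
]);
console.log(summary); // median 500, min -500, max 1000
```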

const totalBad = [];

/**
* @param {string} metric
Member commented:

keyof whatever might let you skip some of the ts-ignores below

@patrickhulce (Collaborator, Author) replied:

didn't remove ts-ignores but helps elsewhere 👍

* @property {string} url
* @property {string} tracePath
* @property {string} devtoolsLogPath
* @property {*} lantern
Member commented:

What's the def for this? Is it just that it's long?

@patrickhulce (Collaborator, Author) replied:

fixed

@patrickhulce patrickhulce merged commit e523e2d into master May 23, 2018
@patrickhulce patrickhulce deleted the run_lantern branch May 23, 2018 23:26