-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data rereads #1529
Fix data rereads #1529
Conversation
windowPoints = previewSeconds / data.step | ||
deviation = TimeSeries(data.name, data.start + previewSeconds, data.end, data.step, data[windowPoints:]) | ||
deviation.pathExpression = data.pathExpression | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes Travis lint check fail:
graphite-web/webapp/graphite/render/functions.py:2331:1: W293 blank line contains whitespace
Current coverage is 49.40%@@ master #1529 diff @@
==========================================
Files 52 52
Lines 5790 5779 -11
Methods 0 0
Messages 0 0
Branches 1116 1111 -5
==========================================
+ Hits 2656 2855 +199
+ Misses 2927 2713 -214
- Partials 207 211 +4
|
@cbowman0 How well has this been working for you in production (against master)? |
I will attempt to get test coverage on these once I work out how to simulate the request object. If someone has any ideas, please let me know. |
@obfuscurity I'll let you know early next week. I deployed the changes to the primary cluster today. |
No complaints from my users. |
I discovered that movingAverage() on data that was already retrieved by a non-summary function like perSecond() was going back to retrieve earlier data, which it needs to do to get the first points in the graph correct. However in retrieving this data it was not using the original wildcard but polling each datapoint individually. On queries with over a thousand datapoints in a clustered environment this was incredibly resource intensive and slow. This change passes the original tokens to each function as an element 'args' in the resourceContext. So if a function needs to retrieve the original data with a time offset it can do so, and retrieve the data in the same way as it was originally retrieved with wildcards intact. This is effecient. Four functions used _fetchWithBootstrap, so I updated all of them to use the new mechanism. There are a bunch of other function that also re-pull data but do not use _fetchWithBootstrap. I believe they could also benefit from this, but I have not addressed them. I verified the output against the original function results and did find some discrepancies, but I believe these are bugs in the old code, so the results should now be better as well as faster. These discrepancies were: In movingMedian _fetchWithBootsstap sometimes inserted an incorrect Null before the time window, changing the result for the first few points holtWinters functions old version had a discontinuity around a 7 day time selection. 167 hours significantly different from 169 hours. This discontinuity is now gone, but the results for shorter time windows are now significantly changed. I'm not a statistician but I suspect these are intended for long time windows and the discontinuity was a bug brought on by mixing two whisper ranges together. Conflicts: .gitignore webapp/graphite/render/evaluator.py
I discovered that movingAverage() on data that was already retrieved by a non-summary function like perSecond() was going back to retrieve earlier data, which it needs to do to get the first points in the graph correct. However in retrieving this data it was not using the original wildcard but polling each datapoint individually. On queries with over a thousand datapoints in a clustered environment this was incredibly resource intensive and slow. This change passes the original tokens to each function as an element 'args' in the resourceContext. So if a function needs to retrieve the original data with a time offset it can do so, and retrieve the data in the same way as it was originally retrieved with wildcards intact. This is effecient. Four functions used _fetchWithBootstrap, so I updated all of them to use the new mechanism. There are a bunch of other function that also re-pull data but do not use _fetchWithBootstrap. I believe they could also benefit from this, but I have not addressed them. I verified the output against the original function results and did find some discrepancies, but I believe these are bugs in the old code, so the results should now be better as well as faster. These discrepancies were: In movingMedian _fetchWithBootsstap sometimes inserted an incorrect Null before the time window, changing the result for the first few points holtWinters functions old version had a discontinuity around a 7 day time selection. 167 hours significantly different from 169 hours. This discontinuity is now gone, but the results for shorter time windows are now significantly changed. I'm not a statistician but I suspect these are intended for long time windows and the discontinuity was a bug brought on by mixing two whisper ranges together. Conflicts: webapp/graphite/render/functions.py
Conflicts: .gitignore
* Remove extraneous whitespace on a blank line in graphite/render/functions.py * Add tests for movingMedian() * Add tests for movingAverage() * First cut at test coverage of holtWinters* functions.
Any objections to merging this? |
@cbowman0 I haven't had a time to review this yet. Sorry, been hands-full with $newjob and Monitorama. Will try to look closer while I'm traveling tomorrow. |
@cbowman0 Sorry for the delay. Been HAM on Monitorama lately. This looks sane and we already merged a version of this in 0.9.x, so there's no reason not to get this in master and let people hammer away at it. Thanks for this fix. |
Cherry-picked the commits to fix issue #1521 against the master branch.
My peers were experiencing this exact problem yesterday. We are testing this fix now.