Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data rereads #1529

Merged
merged 5 commits into from
Jul 1, 2016
Merged

Conversation

cbowman0
Copy link
Member

@cbowman0 cbowman0 commented Jun 1, 2016

Cherry-picked the commits to fix issue #1521 against the master branch.

My peers were experiencing this exact problem yesterday. We are testing this fix now.

windowPoints = previewSeconds / data.step
deviation = TimeSeries(data.name, data.start + previewSeconds, data.end, data.step, data[windowPoints:])
deviation.pathExpression = data.pathExpression

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes Travis lint check fail:

graphite-web/webapp/graphite/render/functions.py:2331:1: W293 blank line contains whitespace

@codecov-io
Copy link

codecov-io commented Jun 1, 2016

Current coverage is 49.40%

Merging #1529 into master will increase coverage by 3.53%

@@             master      #1529   diff @@
==========================================
  Files            52         52          
  Lines          5790       5779    -11   
  Methods           0          0          
  Messages          0          0          
  Branches       1116       1111     -5   
==========================================
+ Hits           2656       2855   +199   
+ Misses         2927       2713   -214   
- Partials        207        211     +4   

Powered by Codecov. Last updated by c44feea...b4fe3b2

@obfuscurity
Copy link
Member

@cbowman0 How well has this been working for you in production (against master)?

@cbowman0
Copy link
Member Author

cbowman0 commented Jun 9, 2016

I will attempt to get test coverage on these once I work out how to simulate the request object. If someone has any ideas, please let me know.

@cbowman0
Copy link
Member Author

cbowman0 commented Jun 9, 2016

@obfuscurity I'll let you know early next week. I deployed the changes to the primary cluster today.

@cbowman0
Copy link
Member Author

No complaints from my users.

I discovered that movingAverage() on data that was already retrieved by
a non-summary function like perSecond() was going back to retrieve
earlier data, which it needs to do to get the first points in the graph
correct. However in retrieving this data it was not using the original
wildcard but polling each datapoint individually. On queries with over a
thousand datapoints in a clustered environment this was incredibly
resource intensive and slow.

This change passes the original tokens to each function as an element
'args' in the resourceContext. So if a function needs to retrieve the
original data with a time offset it can do so, and retrieve the data in
the same way as it was originally retrieved with wildcards intact. This
is effecient.

Four functions used _fetchWithBootstrap, so I updated all of them to use
the new mechanism. There are a bunch of other function that also re-pull
data but do not use _fetchWithBootstrap. I believe they could also
benefit from this, but I have not addressed them.

I verified the output against the original function results and did find
some discrepancies, but I believe these are bugs in the old code, so the
results should now be better as well as faster. These discrepancies
were:

In movingMedian _fetchWithBootsstap sometimes inserted an incorrect Null
before the time window, changing the result for the first few points

holtWinters functions old version had a discontinuity around a 7 day
time selection. 167 hours significantly different from 169 hours. This
discontinuity is now gone, but the results for shorter time windows are
now significantly changed. I'm not a statistician but I suspect these
are intended for long time windows and the discontinuity was a bug
brought on by mixing two whisper ranges together.

Conflicts:
	.gitignore
	webapp/graphite/render/evaluator.py
I discovered that movingAverage() on data that was already retrieved by
a non-summary function like perSecond() was going back to retrieve
earlier data, which it needs to do to get the first points in the graph
correct. However in retrieving this data it was not using the original
wildcard but polling each datapoint individually. On queries with over a
thousand datapoints in a clustered environment this was incredibly
resource intensive and slow.

This change passes the original tokens to each function as an element
'args' in the resourceContext. So if a function needs to retrieve the
original data with a time offset it can do so, and retrieve the data in
the same way as it was originally retrieved with wildcards intact. This
is effecient.

Four functions used _fetchWithBootstrap, so I updated all of them to use
the new mechanism. There are a bunch of other function that also re-pull
data but do not use _fetchWithBootstrap. I believe they could also
benefit from this, but I have not addressed them.

I verified the output against the original function results and did find
some discrepancies, but I believe these are bugs in the old code, so the
results should now be better as well as faster. These discrepancies
were:

In movingMedian _fetchWithBootsstap sometimes inserted an incorrect Null
before the time window, changing the result for the first few points

holtWinters functions old version had a discontinuity around a 7 day
time selection. 167 hours significantly different from 169 hours. This
discontinuity is now gone, but the results for shorter time windows are
now significantly changed. I'm not a statistician but I suspect these
are intended for long time windows and the discontinuity was a bug
brought on by mixing two whisper ranges together.

Conflicts:
	webapp/graphite/render/functions.py
Conflicts:
	.gitignore
* Remove extraneous whitespace on a blank line in graphite/render/functions.py
* Add tests for movingMedian()
* Add tests for movingAverage()
* First cut at test coverage of holtWinters* functions.
@cbowman0
Copy link
Member Author

Any objections to merging this?

@obfuscurity
Copy link
Member

@cbowman0 I haven't had a time to review this yet. Sorry, been hands-full with $newjob and Monitorama. Will try to look closer while I'm traveling tomorrow.

@obfuscurity obfuscurity merged commit 69f8814 into graphite-project:master Jul 1, 2016
@obfuscurity
Copy link
Member

@cbowman0 Sorry for the delay. Been HAM on Monitorama lately. This looks sane and we already merged a version of this in 0.9.x, so there's no reason not to get this in master and let people hammer away at it. Thanks for this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants