Add Web Metrics reports hourly and every minute #16174

Merged (1 commit) on Oct 30, 2017

xeviknal (Member) commented Oct 11, 2017

This PR adds the two reports requested in JMAN4-120 / HAWKULAR-1162.

Example of the minute-ranged report:
[screenshot: reports-1-3]

Example of the hour-ranged report:
[screenshot: reports-1-4]

Reports listed in the menu:
[screenshot: reports-2]

The reports in this PR are based on metrics that are not enabled by default. To enable them, open an EAP console and run the following command:

/subsystem=undertow/:write-attribute(name=statistics-enabled,value=true)

or for EAP6:

/subsystem=web/:write-attribute(name=statistics-enabled,value=true)
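
For scripted test setups, the choice between the two operations can be sketched as below. This is only an illustration: the helper names and the `jboss-cli.sh` path are assumptions, and it presumes the standard `--connect` and `--command` options of the JBoss CLI. The operation strings are the ones quoted above.

```python
# Hypothetical helpers (not part of this PR): build the management
# operation quoted above for EAP 6 or EAP 7.
def statistics_enable_op(eap_major: int) -> str:
    # EAP 6 keeps web statistics under "web"; EAP 7 uses "undertow".
    subsystem = "web" if eap_major == 6 else "undertow"
    return (
        f"/subsystem={subsystem}/"
        ":write-attribute(name=statistics-enabled,value=true)"
    )


def jboss_cli_argv(eap_major: int, cli: str = "jboss-cli.sh") -> list:
    # The script path is an assumption; it depends on the installation.
    return [cli, "--connect", f"--command={statistics_enable_op(eap_major)}"]


if __name__ == "__main__":
    for major in (6, 7):
        print(" ".join(jboss_cli_argv(major)))
```

The resulting argument list could be passed to `subprocess.run` on a host where the CLI can reach the server.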

[screenshot: reports-instruction-1]

Once statistics are enabled, the metrics for each deployment start counting:
[screenshot: reports-instruction-2]

Since this report tracks session-related metrics, it is necessary to deploy apps with session management. Here are a couple of demo apps:

https://drive.google.com/drive/u/0/folders/0B8K8kT5CYn9MYUZfLWVDdTllcWM

To get numbers for the session metrics, it is enough to browse the home page of any of those demo apps. To make it simple, here is a JMeter test plan:

https://drive.google.com/a/redhat.com/file/d/0BxCIJ_AbAJBeM2tCOV9jSTJidkk/view?usp=sharing

miq-bot (Member) commented Oct 13, 2017

This pull request is not mergeable. Please rebase and repush.

chessbyte (Member) commented

@abonas will review after your 👍

@chessbyte chessbyte requested a review from abonas October 13, 2017 13:47
---
where_clause:
dims:
created_on: 2016-10-09 14:13:30.822035 Z
Member

The created_on and updated_on fields are set to 2016. Are they necessary in the first place?
Please check. If they are, they should at least carry an up-to-date date :)

dims:
created_on: 2016-10-09 14:13:30.822035 Z
reserved:
title: "Web Metrics - hourly over the last day"
Member

is the title specific enough? it is not clear solely from it that it's regarding web metrics

Member Author

I don't have the big picture of the whole system yet. Maybe I could add that these metrics come from EAP, e.g. "EAP Web Metrics - XXX"? Any tips?

dims:
created_on: 2016-10-09 14:13:30.822035 Z
reserved:
title: "Web Metrics - every minute over the last hour"
Member

is the title specific enough? it is not clear solely from it that it's regarding web metrics

reserved:
title: "Web Metrics - every minute over the last hour"
conditions:
updated_on: 2016-10-09 17:23:33.732482 Z
Member

2016? please see my other comment about the dates

Member Author

They are not necessary, so I removed them from both reports.

abonas (Member) commented Oct 15, 2017

Please attach a screenshot of how this looks in the UI.
Also, are there instructions on how to enable those metrics in EAP (for QE testing)?
AFAIK they are not enabled out of the box.

xeviknal (Member Author) commented

Here I got pretty confused. Talking to mtho11, we found out that metrics are already enabled here.
I have now realized that they are enabled in the Hawkular agent, but not on the actual EAP server. I'm digging a bit more into that.

@xeviknal xeviknal force-pushed the HAWKULAR-1162 branch 2 times, most recently from 2365514 to 5e31a5f Compare October 25, 2017 12:42
abonas (Member) commented Oct 25, 2017

@aljesusg please review

abonas (Member) commented Oct 26, 2017

@israel-hdez please review

aljesusg (Member) left a comment

I see that only this column, mw_agregated_servlet_time_avg, is spelled with one 'g' in both files. Is there a problem with the column name, @xeviknal?

- mw_aggregated_max_active_web_sessions_max
- mw_aggregated_expired_web_sessions_max
- mw_aggregated_rejected_web_sessions_max
- mw_agregated_servlet_time_avg
Member

mw_aggregated_servlet_time_avg , there is only one 'g'

Member Author

I should add that there is a bug here: in the Hawkular agent, it is written with one 'g'.
Check it out here: ManageIQ/manageiq/product/live_metrics/middleware_server.yaml

Member Author

@abonas how should I proceed to open a discussion on this bug?

- mw_aggregated_max_active_web_sessions_max
- mw_aggregated_expired_web_sessions_max
- mw_aggregated_rejected_web_sessions_max
- mw_agregated_servlet_time_avg
Member

The same here: only one 'g'.
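
A spelling mismatch like this one can be caught mechanically by comparing the report's columns against the defined metric names. The sketch below is only illustrative: `unknown_columns` is a hypothetical helper, and the `defined` list is an assumed sample that follows the author's statement that the metric definition uses the one-'g' spelling. The real names live in the live-metrics YAML referenced above.

```python
import difflib


def unknown_columns(report_cols, defined_cols):
    """Map each report column missing from defined_cols to the closest
    defined name, or None when nothing is similar enough."""
    result = {}
    for col in report_cols:
        if col not in defined_cols:
            close = difflib.get_close_matches(col, defined_cols, n=1)
            result[col] = close[0] if close else None
    return result


# Assumed sample of defined metric names (one-'g' spelling included).
defined = [
    "mw_aggregated_expired_web_sessions_max",
    "mw_aggregated_rejected_web_sessions_max",
    "mw_agregated_servlet_time_avg",
]

# A report that "corrects" the spelling would no longer match.
report = [
    "mw_aggregated_expired_web_sessions_max",
    "mw_aggregated_servlet_time_avg",
]

print(unknown_columns(report, defined))
```

The point of the sketch is that the report column has to match the defined name exactly, so the spelling needs to be fixed in both places at once.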

aljesusg (Member) left a comment

LGTM

- name
- start_time
- mw_aggregated_active_web_sessions_max
- mw_aggregated_max_active_web_sessions_max
Member

Looking at your screenshot and googling a little, max_active_web_sessions is probably not a metric but a configuration attribute. In your screenshot it's always -1, probably meaning that the number of active sessions is not limited. Maybe @jmazzitelli can answer better.

If it's a config value, it shouldn't be in the report nor in the agent config.

xeviknal (Member Author) commented Oct 27, 2017

Looking at the hawkular agent configurations, that is an actual metric. Check the config files for both EAP6 and EAP7.

Also, looking at the aggregated values in the WildFly client, you can see that this is a read-only attribute, like the other metrics in the subsystem:
[screenshot: reports-metric]

As you said, the -1 is there because max active sessions was set to unlimited. Once the config is changed, the report prints the same value as the config:
[screenshot: reports-metric-2]

In my opinion, I would leave this metric since it could be useful for an admin to see max-active-sessions as a threshold for the active sessions. Also, it could be useful to explain an increase/decrease of active sessions due to a change of the configuration attribute.

Member

Ok! If it's going to stay, the label should reflect that it's a config value. Otherwise, users will be confused by two consecutive columns whose labels are very, very similar.

xeviknal (Member Author) commented Oct 30, 2017

@israel-hdez I just saw more info from John Doyle about the required columns for that report. Here are the ones that must be in:

  • Active Web Sessions
  • Expired Web Sessions
  • Rejected Web Sessions

I think it is clearer now.
Regarding the hourly report, it makes sense to add min/max/avg columns. In the minute-ranged one, it doesn't. As you said before, these metrics are collected every minute, so the avg, min and max values would be exactly the same.
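
The min/max/avg argument can be illustrated with a small rollup over made-up data. This is a sketch under assumptions (one sample per minute, a hypothetical `rollup` helper), not how the report engine actually aggregates.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, bucket_seconds):
    """Group (timestamp_seconds, value) samples into fixed-size buckets
    and return {bucket_start: (min, max, avg)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return {
        start: (min(vals), max(vals), mean(vals))
        for start, vals in sorted(buckets.items())
    }


# Made-up data: one sample per minute for one hour.
samples = [(minute * 60, minute % 7) for minute in range(60)]

per_minute = rollup(samples, 60)    # one sample per bucket
per_hour = rollup(samples, 3600)    # sixty samples in one bucket

# With a single sample per bucket, min, max and avg coincide.
print(all(lo == hi == avg for lo, hi, avg in per_minute.values()))
print(per_hour[0])
```

With one sample per bucket the three statistics carry no extra information, which is why the extra columns only pay off in the hourly report.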

- Aggregated Max Active Web Sessions
- Aggregated Expired Web Sessions
- Aggregated Rejected Web Sessions
- Aggregated Servlet Request Time
Member

Probably, Request Time should be labeled as Average while the others as Maximum values?
Also, the word Aggregated doesn't make sense to me in the columns.

Member Author

+1 to removing the word Aggregated from the labels.
+1 to adding Average to Servlet Request Time.
For the other labels, I do have doubts about adding a Maximum label. Those values are counters of sessions (created, active, rejected, and so on), so Maximum doesn't add meaning.

Member

Probably, it's only worth it for the Active Web Sessions column, since its data doesn't look like a proper counter. Maybe this one is the observed highest number of active web sessions? If so, it's a gauge rather than a counter. (If it were a counter, the label should instead be Handled Web Sessions.)

For all the other columns (excluding Servlet Request Time), I cannot understand why there are fractional numbers in your updated image. These counters shouldn't have fractional numbers. This alone may make the user unsure about what kind of numbers the report is outputting.

where_clause:
dims:
reserved:
title: "EAP Web Metrics - every minute over the last hour"
Member

I'm wondering if this is a detailed report. The agent is, by default, configured to make measurements each minute or more (depends on the metric). Unless the end user changes the agent config, this report won't be doing any aggregation, but showing every single datapoint (or, probably, repeating the same measurement on more than one row).

The Request Count is collected once every 5 minutes, for example.
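
As a rough illustration of that concern, and taking the 5-minute interval mentioned above purely as an assumption, the sketch below counts how many per-minute rows would receive a fresh datapoint in an hour. The helper name is hypothetical and the first measurement is assumed to land at minute 0.

```python
def rows_with_fresh_data(report_minutes, collect_every_minutes):
    """Return the minute offsets at which a new measurement arrives,
    assuming the first one lands at minute 0."""
    return [
        minute
        for minute in range(report_minutes)
        if minute % collect_every_minutes == 0
    ]


fresh = rows_with_fresh_data(60, 5)
print(len(fresh))  # 12 of the 60 per-minute rows get a new datapoint
```

The remaining rows would either be empty or repeat the previous measurement, depending on how the report fills gaps.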

Member Author

You are right, this is not an aggregation report, even though from my point of view it gives meaningful information to the admin.

Regarding the Request Count metric, it is taken from here and is collected every minute.

- mw_aggregated_expired_web_sessions_max
- mw_aggregated_rejected_web_sessions_max
- mw_aggregated_servlet_time_avg
- mw_aggregated_servlet_request_count_max
Member

I see this is always zero in your screenshot. Probably more time was needed to let the agent make measurements.

Could you confirm that this isn't an always increasing number? If it is, you will need to do some extra processing to show the true hourly request count rather than an accumulated count.
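
If the counter did turn out to be cumulative, the extra processing could look roughly like the sketch below. The helper name and readings are made up, and treating a drop as a counter reset (for example after a server restart) is an assumption.

```python
def per_interval_counts(cumulative):
    """Turn readings of an always-increasing counter into per-interval
    counts. A drop between readings is treated as a counter reset."""
    counts = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        counts.append(curr - prev if curr >= prev else curr)
    return counts


hourly_readings = [100, 160, 250, 30, 90]  # made-up cumulative values
print(per_interval_counts(hourly_readings))  # [60, 90, 30, 60]
```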

xeviknal (Member Author) commented Oct 27, 2017

I understand that mw_aggregated_servlet_time and mw_aggregated_servlet_request_count are zeros because the tracked app doesn't have any servlet defined.

Member Author

Regarding the request count, it is not an absolute count. It only counts the requests per hour; the current count is not added to the previous hour's count.
Is that what you mean?

israel-hdez (Member) commented Oct 27, 2017

Yes, my reasoning was that counters... well... they usually never reset to zero to let you query them and see the total count at the "present" time. So, I was expecting the request count to be an always increasing number. But from your updated image, I see that's not the case.

Probably, the server (or the agent?) forces a reset each time the measurement is done?

xeviknal (Member Author) commented Oct 27, 2017

I now have a problem with 2 empty rows at the beginning. Does anybody know what it could be?

[screenshot: reports-1 2]

[Update] I see this is happening on all every-minute reports on MIQ.

miq-bot (Member) commented Oct 30, 2017

Checked commit xeviknal@015581f with ruby 2.3.3, rubocop 0.47.1, and haml-lint 0.20.0
1 file checked, 0 offenses detected
Everything looks fine. 🏆

abonas (Member) commented Oct 30, 2017

@miq-bot assign @agrare

@agrare agrare merged commit fa32954 into ManageIQ:master Oct 30, 2017
@agrare agrare added this to the Sprint 72 Ending Oct 30, 2017 milestone Oct 30, 2017
xeviknal (Member Author) commented

Good, thanks!

7 participants