Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/vcenter] Construct MetricsReceiver Component #9224

Merged

Conversation

schmikei
Copy link
Contributor

@schmikei schmikei commented Apr 12, 2022

Description: This spikes out the work in progress metrics receiver component of a vcenterreceiver component I drafted an issue for in #8377

This component uses govmomi to do the portions of metric collection currently using 3 separate APIs the PropertyCollector, the PerformanceManager, and the VSANPerformanceManager. This receiver is fairly complex since it has to read from so many APIs, so I am still contemplating levers for how much control of queries and enabling/disabling users can have. Happy to split up as neeeded.

My planned config structure is something that looks sort of like this:

receivers:
  vcenter:
      endpoint: https://<esxi-node or vcenter server>
      username: user@with-cluster-monitor permissions
      password: <password to authenticate>
      collection_interval: 40s
      tls:
          insecure: false
          insecure_skip_verify: true
exporters:
  file:
    path: "./content.json"

service:
  pipelines:
    metrics:
      receivers: [vcenter]
      exporters: [file]

Link to tracking Issue: #8377

Testing: Unit testing with the simulators found in govmomi which emulate a live ESXi and VPX environment.

Documentation: Documentation on how to use the metrics component of the receiver and auto-generated metrics via mdatagen

@schmikei
Copy link
Contributor Author

@djaglowski would you mind taking another look at this PR? I ended up including a bunch of recorded soap responses that should get us more coverage of the the earlier review!

@djaglowski
Copy link
Member

@schmikei, taking a look now, but please resolve the lint error:

Error: factory_test.go:68:29: SA1019: componenterror.ErrNilNextConsumer is deprecated: use component.ErrNilNextConsumer (staticcheck)
				require.ErrorIs(t, err, componenterror.ErrNilNextConsumer)
				                        ^
make[2]: *** [../../Makefile.Common:105: lint] Error 1
make[2]: Leaving directory '/home/runner/work/opentelemetry-collector-contrib/opentelemetry-collector-contrib/receiver/vcenterreceiver'

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking quite good. A couple minor things noted.

One important item - there's an empty file called cmd/otelcontribcol/content.json that should be removed.

Comment on lines 13 to 15
- 7.0
- 6.7
- 7.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sort()

attributes: []
vcenter.host.disk.throughput:
enabled: true
description: The throughput to the host system's disk.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmikei PTAL

Comment on lines 293 to 294
v.mb.RecordVcenterClusterVMCountDataPoint(colTime, int64(len(vms))-int64(poweredOffVMs), metadata.AttributeVMCountPowerStateOn)
v.mb.RecordVcenterClusterVMCountDataPoint(colTime, int64(poweredOffVMs), metadata.AttributeVMCountPowerStateOff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about returning these counts from this function, and passing them into collectCluster? The way this is currently done is a bit too tricky because the next call to EmitForResource is in another function. It would be very easy to change up the order of functions without realizing that these metrics are then attached to the wrong resource.

@schmikei schmikei requested a review from djaglowski May 25, 2022 15:56
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM.

Will merge after v0.52.0 is released.

@schmikei, please make sure any dependencies on the core collector are updated to v0.52.0. See #10351

@schmikei
Copy link
Contributor Author

@djaglowski are things good now post release?

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @schmikei!

@djaglowski djaglowski merged commit 6bdc5e5 into open-telemetry:main May 26, 2022
kentquirk pushed a commit to McSick/opentelemetry-collector-contrib that referenced this pull request Jun 14, 2022
…y#9224)

* add vcenter vSAN collection

* checkpoint on getting property collection working

* checkpoint before integration test

* dual receivers under root receiver pointer

* checkpoint before updated mdatagen

* use syslog receiver rather than tcplogreceiver

* getting more performance counter refinements

* remove unneccessary component addition

* try to fix go.mod resolution issues

* try to fix go.mod resolution issues pt 2

* addlicense

* fix go.mod by fixing require directive

* add readme for metrics

* update readme

* fix go.mod referring nonexistent version

* add performance manager tests

* more tests

* add more attributes to virtual machines and host systems

* add more attributes to virtual machines and host systems

* spike changelog entry

* fix go.mod in both places

* fix go.mod in configschema

* add // import github.com/open-telemetry/opentelemetry-collector-contrib/receiver/vmwarevcenterreceiver to imports

* add quotations

* add to receiver lifecycle

* remove extra go generate direction

* fix typo of utilizaiton in metric description

* small changes to interval id in performance queries to be more consistent

* PR feedback including omitting company name prefix

* PR feedback to not fail starting the component on potential network failures

* minor grammar correction in vcenter readme

* update expected metrics

* update host_effective attribute value

* remove PerformanceInterval customizability

* add to codeowners

* fix indentation on merge conflict

* fix changelog entry place so its in the new components section

* update to be on 0.49.0 of the collector'

* add PR number to changelog

* regenerate with newer version of mdatagen

* move error log if unable to connect on start to receiver.Start() rather than scraper.Start()

* fix test cases from last commit

* minor update to config with tests

* fix metric description

* use utc for host vsan collection as well

* update comments of public facing methods

* return errors on getting clusters to the scraper errors

* PR feedback #1

* instantiate new client if client is nil

* update all descriptions to have punctuation

* three more descs

* move ensureReceiver up to once we validated as a config

* some more PR feedback

* looking into race conditions

* run go tidy

* fix import order and remove unneccessary mutex

* remove mutex from struct

* refactor client to responsible for knowing if the vsan endpoints are reachable

* fix integration test referencing old var

* change metrics.metrics => metrics.settings, update client pr feedback

* remove vSAN collection temporarily

* remove extra metric attributes for vSAN

* remove vsan specific variables

* clean up host PerfCounter disk latency metrics and fix some descriptions to better reflect interval

* add 20s interval to extended documentation as needed

* mdatagen fixes

* add integration test metric scrape

* fix import order

* go up to 0.49.1

* gotidy

* add replace directive for semconv

* gotidy fixes

* fix component not being on 0.50.0

* update to v0.50.1-0.20220429151328-041f39835df7

* use newer mdatagen

* remove any logging functionality change && update documentation

* fix integration test from flattening of config

* fix scraper start not erroring if connection cannot be established

* make scrapertest less flaky

* format test json

* Apply suggestions from code review

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>

* adjust metric definition for vcenter.host.disk.throughput

* remove comment and move pm level 2 metrics to appropriate section

* try to be respective of datacenters

* fix only vCenter server functionality

* try building out a mock server for test coverage

* make goporto

* fix build issues

* use latest mdatagen

* add newlines to ends of xml recordings

* fix integration test

* moved around scrapererrors because now the receiver is datacenter dependent

* try and do an audit of performance metrics and requests/responses

* update testdata with correct units

* make tidy

* make tidy

* update collector version

* fix local testing code including modules

* remove deprecated use of commonponenterror

* pr feedback; add method of collection recording, return poweredOn/poweredOff VMs

* remove content.json

* fix description change in scraper_test.go

* update collector version

* bump replaced module; rebuild load tests

* fix alibaba version auto localizing

Co-authored-by: Daniel Jaglowski <jaglows3@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants