New component: vcenterreceiver #8377

Closed
schmikei opened this issue Mar 10, 2022 · 12 comments
Labels
Accepted Component New component has been sponsored

Comments

@schmikei
Contributor

schmikei commented Mar 10, 2022

The purpose and use-cases of the new component

I'd like to contribute a vCenter Server receiver that monitors resources at several levels and collects logs. The receiver should be able to collect metrics at the ESXi host, virtual machine, cluster, resource pool, and datacenter levels. It should also support both SAN and vSAN storage metrics.

In addition, I'd like to use opentelemetry-log-collection to process raw TCP input and parse it as syslog messages, after filtering out improperly formatted entries forwarded from the vCenter server and ESXi hosts.

Collection Methods

Supported Versions of vCenter (derived from govmomi)

  • 6.5
  • 6.7
  • 7.0

Example configuration for the component

  vcenter:
    endpoint: <scheme(http|https)>://vcsa.somehost.localnet # address of the vCenter server or SDK enabled ESXi host
    username: username@somehost.localnet # username of a read-only user with minimal privileges
    password: password # password of the user
    collection_interval: 1m

    syslog_listen_addr: 0.0.0.0:5140 # address and port to listen on for syslog messages
    max_log_size: 1MB # max log size to be ingested over syslog
    encoding: utf-8 # encoding
    attributes: {} # attributes to attach to entries
    resource: {} # resource info to attach to entries
    tls: # configtls settings
      insecure: false
      insecure_skip_verify: true
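
The endpoint placeholder above only allows an http or https scheme. A minimal Go sketch of how that constraint could be checked at config validation time follows. The function name validateEndpoint is hypothetical and is not taken from the receiver; it uses only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateEndpoint is a hypothetical helper: it accepts only endpoints of the
// form http://host or https://host, matching the placeholder in the config.
func validateEndpoint(endpoint string) error {
	u, err := url.Parse(endpoint)
	if err != nil {
		return fmt.Errorf("unable to parse endpoint %q: %w", endpoint, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("endpoint %q must use the http or https scheme", endpoint)
	}
	if u.Host == "" {
		return errors.New("endpoint must include a host")
	}
	return nil
}

func main() {
	// The host name is the example value from the proposed config.
	fmt.Println(validateEndpoint("https://vcsa.somehost.localnet"))
	fmt.Println(validateEndpoint("vcsa.somehost.localnet"))
}
```

Rejecting a missing scheme early avoids guessing whether the user meant http or https.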

Telemetry data types supported

  • metrics
  • logs

Sponsor (Optional)

  • Needs a sponsor

mx-psi added the "Sponsor Needed" label (New component seeking sponsor) on Mar 10, 2022

@djaglowski
Member

Is there anything special about the syslogs e.g. additional content that can be parsed because we know it is coming from vmware? If not, I don't think this receiver needs to support logs. (syslogreceiver can be used directly)

I'm curious to hear more about that, but if this component were to support logs, I think that instead of using opentelemetry-log-collection, it would be more appropriate to wrap syslogreceiver. (See conversation over here about wrapping components.) Ultimately, this would use the log collection library, but I don't think we should have any more components than necessary interacting with it directly.

In any case, it's not clear to me how the proposed config would support syslog anyways. Is there a missing endpoint? This maybe brings up interesting questions about whether metrics and logs parameters should be mixed together or split into sub-sections of the config.

@schmikei
Contributor Author

schmikei commented Mar 10, 2022

> Is there anything special about the syslogs e.g. additional content that can be parsed because we know it is coming from vmware? If not, I don't think this receiver needs to support logs. (syslogreceiver can be used directly)
>
> I'm curious to hear more about that, but if this component were to support logs, I think that instead of using opentelemetry-log-collection, it would be more appropriate to wrap syslogreceiver. (See conversation over here about wrapping components.) Ultimately, this would use the log collection library, but I don't think we should have any more components than necessary interacting with it directly.

From what we've seen in the past with logs forwarded from vCenter, not all messages received as raw TCP input are well-formed syslog, e.g.:

23 <134>1 2022-03-01T19:34:33.160936+00:00 vcsa vapi-endpoint-access - - - 2022-03-01T19:34:33.160Z | jetty-default-29          | 127.0.0.1 - - [01/Mar/2022:19:34:29 +0000] "GET /rest/com/vmware/vapi/metadata/metamodel/component?~method=OPTIONS HTTP/1.1" 200 2534 "-" "python-requests/2.19.1" 3558

Note the extra leading ID (23) prefixed to the message, which breaks the RFC 5424 format. I can look more into when these show up to see whether it is something configurable in vCenter logging. Definitely something that may need some

As far as what actually will be used, yes I think a more accurate proposal of what I want to do is to wrap the tcplogreceiver and then use the operators to do the syslog parsing once input has been partially sanitized. @djaglowski does that seem like the approach you would take here as well?
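
A minimal Go sketch of what the partial sanitizing step could look like for the sample line above, using only the standard library. The function name stripLeadingToken is hypothetical, and the rule it applies (drop a leading all-digit token when it is directly followed by a `<PRI>` header) is an assumption based on that single sample, not on documented vCenter behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// stripLeadingToken removes a leading all-digit token (such as the "23" in the
// sample line) when it is followed by a space and a "<" that starts the syslog
// priority header. Any line that does not match this shape is returned unchanged.
func stripLeadingToken(line string) string {
	i := strings.IndexByte(line, ' ')
	if i <= 0 {
		return line
	}
	prefix, rest := line[:i], line[i+1:]
	if !strings.HasPrefix(rest, "<") {
		return line
	}
	for _, r := range prefix {
		if r < '0' || r > '9' {
			return line
		}
	}
	return rest
}

func main() {
	// Shortened version of the sample line from this thread.
	raw := "23 <134>1 2022-03-01T19:34:33.160936+00:00 vcsa vapi-endpoint-access - - - message"
	fmt.Println(stripLeadingToken(raw))
}
```

The cleaned line could then be handed to a syslog parser; lines that already start with `<PRI>` pass through untouched.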

> In any case, it's not clear to me how the proposed config would support syslog anyways. Is there a missing endpoint? This maybe brings up interesting questions about whether metrics and logs parameters should be mixed together or split into sub-sections of the config.

I am also somewhat curious about this one. As far as I can tell, not many components share config across logging and metrics, and it makes me wonder about config conflicts such as tls.

receivers:
  vcenter:
    ### Metrics Config ###
    endpoint: <scheme(http|https)>://vcsa.somehost.localnet
    username: username@somehost.localnet
    password: password 
    collection_interval: 1m
    ### Logging Config ###
    syslog_listen_addr: 0.0.0.0:5140
    max_log_size: 1MB
    encoding: utf-8
    attributes: {}
    resource: {}
    ### Shared(? or perhaps a logging_tls) ###
    tls:
      insecure: false
      insecure_skip_verify: true

exporters:
  logging:
    loglevel: debug

service:
  pipelines:
    metrics:
      receivers: [vcenter]
      exporters: [logging]
    logs:
      receivers: [vcenter]
      exporters: [logging]

I can imagine something like this, but I can also imagine it getting a tad hairy when the config for the receiver is combined. Do you have any thoughts on the correct way to solve this, given that the input is not always just raw syslog and may need some custom processing?

@djaglowski
Member

> As far as what actually will be used, yes I think a more accurate proposal of what I want to do is to wrap the tcplogreceiver and then use the operators to do the syslog parsing once input has been partially sanitized. @djaglowski does that seem like the approach you would take here as well?

I see. You want to receive the logs, possibly clean them up a bit, and then feed them through the syslog parser. Roughly the same consideration would apply. I think you should use the tcplog receiver, rather than using the log-collection library's tcp_input directly.

Does vcenter necessarily send logs via tcp though? Could it be udp?

I think the mixed config is messy and should be split up. Roughly:

vcenter:
  metrics:
    endpoint: <scheme(http|https)>://vcsa.somehost.localnet
    username: username@somehost.localnet
    password: password 
    collection_interval: 1m
    tls:
      insecure: false
      insecure_skip_verify: true
  logs:
    listen_addr: 0.0.0.0:5140
    max_log_size: 1MB
    encoding: utf-8
    tls:
      insecure: false
      insecure_skip_verify: true

You can then easily determine whether or not a metrics or logs receiver should be created.

One issue with this is that it collides with a well-established pattern in scrapers, where metrics refers to an embedded metadata.MetricSettings struct, which allows the user to turn individual metrics on or off. We probably need to target something like:

vcenter:
  metrics:
    endpoint: <scheme(http|https)>://vcsa.somehost.localnet
    username: username@somehost.localnet
    password: password 
    collection_interval: 1m
    tls:
      insecure: false
      insecure_skip_verify: true
    settings:
      ...
  logs:
    listen_addr: 0.0.0.0:5140
    max_log_size: 1MB
    encoding: utf-8
    tls:
      insecure: false
      insecure_skip_verify: true
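
A minimal Go sketch of how a split config like the one above could drive the decision about which receivers to create. All type, field, and method names here (Config, MetricsConfig, LogsConfig, enabledSignals) are hypothetical and not taken from the collector codebase. The only idea illustrated is that an omitted sub-section leaves its pointer nil.

```go
package main

import "fmt"

// MetricsConfig is a hypothetical stand-in for the "metrics" sub-section.
type MetricsConfig struct {
	Endpoint string
	Username string
	Password string
}

// LogsConfig is a hypothetical stand-in for the "logs" sub-section.
type LogsConfig struct {
	ListenAddr string
}

// Config holds both sub-sections as pointers so that an omitted section
// stays nil and can be told apart from an empty one.
type Config struct {
	Metrics *MetricsConfig
	Logs    *LogsConfig
}

// enabledSignals reports which receivers should be created for this config.
func (c Config) enabledSignals() []string {
	var signals []string
	if c.Metrics != nil {
		signals = append(signals, "metrics")
	}
	if c.Logs != nil {
		signals = append(signals, "logs")
	}
	return signals
}

func main() {
	// Only the logs sub-section is configured here.
	cfg := Config{Logs: &LogsConfig{ListenAddr: "0.0.0.0:5140"}}
	fmt.Println(cfg.enabledSignals())
}
```

Keeping tls inside each sub-section, as in the YAML above, avoids the question of whether one shared tls block applies to the scraper client, the log listener, or both.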

@djaglowski
Member

How will integration testing work? It looks like govmomi provides a simulated API. Can we run that in a container and run meaningful tests against it?

@schmikei
Contributor Author

schmikei commented Apr 7, 2022

> How will integration testing work? It looks like govmomi provides a simulated API. Can we run that in a container and run meaningful tests against it?

Sorry for the delayed response. I was researching this further, and it looks like the simulator package that you linked works great for doing at least property collection.

Example client invocations:

package vmwarevcenterreceiver

import (
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/performance"
	"github.com/vmware/govmomi/simulator"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/mo"
	"github.com/vmware/govmomi/vim25/types"
)

func TestSimulatorCluster(t *testing.T) {
	simulator.Test(func(ctx context.Context, c *vim25.Client) {
		finder := find.NewFinder(c)
		pm := performance.NewManager(c)
		// vsManager, err := vsan.NewClient(ctx, c)
		// require.NoError(t, err)

		clusters, err := finder.ClusterComputeResourceList(ctx, "*")
		require.NoError(t, err)
		require.NotEmpty(t, clusters)

		for _, c := range clusters {
			// cluster properties
			var objC mo.ClusterComputeResource
			// fail the test if the property retrieval itself errors
			require.NoError(t, c.Properties(ctx, c.Reference(), []string{"summary"}, &objC))
			totalMem := objC.Summary.GetComputeResourceSummary().TotalMemory
			require.Greater(t, totalMem, int64(0))

			// host collection
			hosts, err := c.Hosts(ctx)
			require.NoError(t, err)
			require.NotEmpty(t, hosts)

			// vsan Collection appeared to not work out of the box
			// startTime := time.Now().Add(-10 * time.Minute)
			// endTime := time.Now().Add(-1 * time.Minute)
			// querySpec := []vsanTypes.VsanPerfQuerySpec{
			// 	{
			// 		EntityRefId: "host-domclient:*",
			// 		StartTime:   &startTime,
			// 		EndTime:     &endTime,
			// 	},
			// }
			// cRef := c.Reference()
			// results, err := vsManager.VsanPerfQueryPerf(ctx, &cRef, querySpec)
			// require.NoError(t, err)
			// require.NotEmpty(t, results)

			// datastores
			dss, err := c.Datastores(ctx)
			require.NoError(t, err)
			for _, ds := range dss {
				var moDS mo.Datastore
				// fail the test if the property retrieval itself errors
				require.NoError(t, ds.Properties(ctx, ds.Reference(), []string{"summary"}, &moDS))
				capacity := moDS.Summary.Capacity
				require.Greater(t, capacity, int64(0))

				qs := []types.PerfQuerySpec{}
				startTime := time.Now().Add(-10 * time.Minute)
				endTime := time.Now().Add(-1 * time.Minute)
				qs = append(qs, types.PerfQuerySpec{
					Entity:    ds.Reference(),
					StartTime: &startTime,
					EndTime:   &endTime,
					MetricId:  []types.PerfMetricId{},
				})
				results, err := pm.Query(ctx, qs)
				require.NoError(t, err)
				require.NotEmpty(t, results)
			}
		}
	})
}

Feel free to run this code. The only thing I am proposing that I think cannot be tested using the simulator package is the vSAN metric collection (I will look more into it, but we at least have a good base for simulated testing). Otherwise, I can start looking into a more dockerized solution.

@djaglowski
Member

> works great for doing at least property collection

I think you are talking about static configuration values such as total memory, right? What about dynamic metric values like used memory?

@schmikei
Contributor Author

schmikei commented Apr 7, 2022

Properties are not necessarily static; they are updated periodically. For example, the properties in the Virtual Machine documentation, which you retrieve via the Property Collector, are in the cases I am thinking of updated in near real time as of when you retrieve them.

However, that is only one source of potential metrics. There is also the Performance Manager API, as well as other APIs for more real-time performance data, like the vSAN performance manager API.

On a side note, I was able to find the vSAN equivalent simulator, and I should be able to use that to write more tests.

@djaglowski
Member

> Properties are not necessarily static; they are updated periodically. For example, the properties in the Virtual Machine documentation, which you retrieve via the Property Collector, are in the cases I am thinking of updated in near real time as of when you retrieve them.

Thanks for clarifying @schmikei.

> However, that is only one source of potential metrics. There is also the Performance Manager API, as well as other APIs for more real-time performance data, like the vSAN performance manager API.

Do these APIs share an endpoint & authentication?

> On a side note, I was able to find the vSAN equivalent simulator, and I should be able to use that to write more tests.

That's great. Seems it is sufficiently testable then.

@schmikei
Contributor Author

schmikei commented Apr 7, 2022

> Do these APIs share an endpoint & authentication?

Most of it is handled via user session authentication in govmomi, which provides a single authenticated client from which to make the API calls.

As far as endpoints are concerned, govmomi reaches out to multiple routes when making the underlying calls, but all communication goes to a single ESXi host or vCenter server running the appropriate management SDK.

@djaglowski
Member

Ok, sounds like the metric receiver can operate on one endpoint and one set of credentials then.

@djaglowski
Member

This looks like it will be a somewhat complicated receiver, but it seems fully testable. VMware support in the collector would be a great addition. I'm happy to sponsor this.

Will you submit the metrics and logs parts separately?

djaglowski added the "Accepted Component" label (New component has been sponsored) and removed the "Sponsor Needed" label (New component seeking sponsor) on Apr 7, 2022

@djaglowski
Member

Closed by #9224
