Skip to content

Fix Route Registry Metrics to Report Correct Result #468

Closed
@hoffmaen

Description

@hoffmaen

Proposed Change

We see an issue with the Route Registry code.

Register

Currently, Register() calls register(), which calls Pool.Put()

Put() updates the pool if required, and returns the PoolPutResult to register():

  • UPDATED, whenever the endpoint has previously been registered in the pool (even if unmodified)
    • Exception: UNMODIFIED, if the message is a duplicate (based on ModificationTag)
  • ADDED, if the endpoint doesn't exist in the pool yet

register() creates the pool if required, sets the time of update and forwards the PoolPutResult to Register(). Register() receives and logs the PoolPutResult, and ingests a generic route registry metric.

Unregister

On the other hand, we have a similar functionality with Unregister(), which calls unregister(), which calls pool.Remove().

However, the structure is quite different here:
Remove() returns a boolean value reflecting whether the route has been removed from the pool, or not. unregister() however does some logging based on the result of Remove(), and removes the pool if required. Unregister() finally unconditionally ingests a generic route unregistry metric.

Issues

There are some problems with the current code:

  • We have a very generic metric for registering and unregistering endpoints - it does not reveal the result of the (un)registration (endpoint added, modified, unchanged, etc). However, tests have shown that the cost for each case differs, and that it is beneficial for operations to have more detailed metrics.
  • pool.Put() always returns status UPDATED when the route exists in the pool. Hence, the metric doesn't reflect whether a route has received an update, or if not (in case its attributes haven't changed).
  • Unregister() always ingests an UnregistryMessage, independently of the result of pool.Remove(). Hence, the metric doesn't reflect whether a route has been removed or not.
  • We should introduce an additional metric that reflects whether a whole route has been created or removed upon Register() / Unregister().
  • The structure of the Register() and Unregister() functions and its related private functions should be refactored, e.g.
    • Code / function structure should be the same
    • Logging should be done in the public function functionality
    • Improve code structure in general, make the long if-clauses in registry.go less complicated

Acceptance criteria

  • Have better metrics that reflect the actual result of registering and unregistering an endpoint, based on the PoolPutResult and a PoolRemoveResult.
    • Avoid breaking changes by using tags for the actual result in CaptureRegistryMessage and CaptureUnregistryMessage.
  • Fix the wrong UPDATED result in case of no-op registry update.
  • Refactor the code to mitigate the issues described above.

Related links

  • Draft Pull-Request Improve gorouter Route Registry Message Metrics gorouter#456: This was created before being able to use metric tags (introduced with support for Prometheus based metrics). This Pull-Request already addresses the issues described here, but needs a final refactoring to have no breaking change due to metrics renaming.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions