
Metrics

Exporters

Graphite

Our own implementation, which uses the plaintext Graphite protocol.

counter - cumulative value, always increments
gauge - reports the last set value
histogram (since metrics 0.13.0; timing was used before) - sends the average of all values collected since the previous metrics send
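
For reference, the plaintext Graphite protocol is a stream of `<metric path> <value> <timestamp>` lines. Below is a minimal sketch (not the actual exporter code) of how the three types above could be flushed; the metric paths such as bob.client.put_count and the histogram averaging shown here are illustrative assumptions.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Format one plaintext Graphite line: "<path> <value> <timestamp>\n".
fn graphite_line(path: &str, value: f64, ts: u64) -> String {
    format!("{} {} {}\n", path, value, ts)
}

fn main() {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_secs();

    // counter: cumulative value, always increments
    let put_count: u64 = 100;
    // gauge: last set value
    let nodes_number: u64 = 2;
    // histogram: average of all samples collected since the previous send
    let put_timer_samples = [1.2_f64, 0.8, 1.0];
    let put_timer_avg =
        put_timer_samples.iter().sum::<f64>() / put_timer_samples.len() as f64;

    // Metric paths below are examples, not the exporter's exact naming scheme.
    let mut payload = String::new();
    payload.push_str(&graphite_line("bob.client.put_count", put_count as f64, ts));
    payload.push_str(&graphite_line("bob.link_manager.nodes_number", nodes_number as f64, ts));
    payload.push_str(&graphite_line("bob.client.put_timer", put_timer_avg, ts));

    // In the exporter this payload would be written to a TCP connection
    // to the Graphite server; here we just print it.
    print!("{}", payload);
}
```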

Prometheus

Uses this implementation of the exporter.

Data

Client

client - metrics related to FORCE operations (a FORCE operation is an operation requested by one node to send data to another node; such operations are performed during data replication)

Metrics

  • put_count
  • put_error_count
  • put_timer
  • get_count
  • get_error_count
  • get_timer
  • exist_count
  • exist_error_count
  • exist_timer

Example

Imagine that you have a cluster with 2 nodes:

  • node1
  • node2

Data is not replicated (quorum = 1) but sharded (the shard is determined by the key % 2 operation). You use only node1 to put data.
You put 100 records (keys from 1 to 100 inclusive), so node1 puts 50 records locally and sends 50 requests to put data on node2. The client put_count on node2 will therefore be 50 (if errors occur, put_count will be lower, and put_error_count = 50 - put_count for this example).
All client metrics for node1 will be equal to 0.
For get/exist operations, metrics are counted in the same way.
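
A minimal sketch of the counting in this example; the mapping of even keys to node1 and odd keys to node2, as well as the variable names, are assumptions made for illustration, not bob internals.

```rust
fn main() {
    // node1 receives all 100 puts; the shard is defined by key % 2.
    // Assumption: even keys stay on node1, odd keys are forced to node2.
    let mut node1_local_puts = 0u64;
    let mut node2_client_put_count = 0u64; // FORCE puts performed on node2

    for key in 1..=100u64 {
        if key % 2 == 0 {
            node1_local_puts += 1;
        } else {
            // node1 sends a FORCE put request to node2
            node2_client_put_count += 1;
        }
    }

    assert_eq!(node1_local_puts, 50);
    assert_eq!(node2_client_put_count, 50);
    // node1's own client metrics stay at 0: it received no FORCE requests.
    println!("node2 client.put_count = {}", node2_client_put_count);
}
```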

Grinder

grinder - cluster request metrics related to the current node (note: a put operation fails (put_error_count increments) only if the remote alien put operation failed AND the local alien put operation also failed)

Metrics

  • put_count
  • put_error_count
  • put_timer
  • get_count
  • get_error_count
  • get_timer
  • exist_count
  • exist_error_count
  • exist_timer

Example

Imagine that you have a cluster with 2 nodes:

  • node1
  • node2

Data is not replicated (quorum = 1) but sharded (the shard is determined by the key % 2 operation). You use only node1 to put data.
You put 100 records (keys from 1 to 100 inclusive), so node1 handles 100 cluster put operations and grinder.put_count will be equal to 100 (in case of errors it will be 100 - [number of errors]).
node2 will have grinder.put_count equal to 0.
For get/exist operations, metrics are counted in the same way.
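
The same workload seen from the grinder side, as a sketch under the same illustrative assumptions as the client example above: every cluster put handled by node1 is counted there, regardless of which node ends up storing the record.

```rust
fn main() {
    let mut node1_grinder_put_count = 0u64;
    let node2_grinder_put_count = 0u64; // node2 handles no client requests

    for _key in 1..=100u64 {
        // Every put request arrives at node1, so it is counted there,
        // no matter which node finally stores the record.
        node1_grinder_put_count += 1;
    }

    assert_eq!(node1_grinder_put_count, 100);
    assert_eq!(node2_grinder_put_count, 0);
}
```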

Pearl

pearl - counts ALL disk operations related to the current node

Metrics

  • put_count
  • put_error_count
  • put_timer
  • get_count
  • get_error_count
  • get_timer

Example

Imagine that you have a cluster with 2 nodes:

  • node1
  • node2

Data is not replicated (quorum = 1) but sharded (the shard is determined by the key % 2 operation), BUT node2 is not accessible. You use only node1 to put data.
You put 100 records (keys from 1 to 100 inclusive), so node1 puts 50 keys locally and 50 keys into the local alien (due to node2's inaccessibility). That makes 50 + 50 = 100 put operations on disk, so pearl.put_count = 100 (in case of errors it will be 100 - [number of errors]).
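
A sketch of how the disk puts add up when node2 is unreachable; the shard mapping (odd keys to node2) and the fallback logic shown here are illustrative assumptions, not bob internals.

```rust
fn main() {
    let node2_is_up = false;
    let mut local_puts = 0u64;
    let mut alien_puts = 0u64;

    for key in 1..=100u64 {
        // Assumption: odd keys shard to node2, even keys to node1.
        if key % 2 == 1 && !node2_is_up {
            alien_puts += 1; // node2 unreachable: record goes to the local alien
        } else if key % 2 == 0 {
            local_puts += 1; // record belongs to node1 and is stored locally
        }
        // (if node2 were up, the remote put would not touch node1's disks)
    }

    // pearl counts every disk put on node1, local and alien alike
    let pearl_put_count = local_puts + alien_puts;
    assert_eq!(pearl_put_count, 100); // 50 local + 50 alien
}
```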

For get operations these metrics are more complicated.

A holder in bob is an instance of Pearl storage that is responsible for one concrete timestamp. If you already have holders for 10 timestamps (10 holders), then one get operation (for any key) will return:

  • get_count = 0, get_error_count = 10, if the data doesn't exist in bob;
  • get_count = 1, get_error_count = e (e in [0, 9], depending on how many holders are scanned before the record is found), if the data exists in storage.
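
A sketch of the get bookkeeping across holders; how bob actually scans holders is simplified here, but the counting matches the two cases above.

```rust
/// Returns (get_count, get_error_count) for a single get over `holders`
/// timestamp-ordered Pearl holders, assuming the record, if present,
/// lives in the holder at index `found_at`.
fn get_metrics(holders: usize, found_at: Option<usize>) -> (u64, u64) {
    match found_at {
        // record found after `i` unsuccessful holder lookups
        Some(i) => (1, i as u64),
        // record missing: every holder reports an error
        None => (0, holders as u64),
    }
}

fn main() {
    assert_eq!(get_metrics(10, None), (0, 10));   // data doesn't exist in bob
    assert_eq!(get_metrics(10, Some(3)), (1, 3)); // found in the 4th scanned holder
}
```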

link_manager

A single gauge value (nodes_number), which at any moment counts the number of nodes accessible from the current node.

backend

backend - metrics that describe the node backend state and stats

Metrics

| Metric | Description |
| --- | --- |
| backend_state | 0 - starting, 1 - started |
| blob_count | blob file count (doesn't include aliens) |
| alien_count | alien blob file count |
| index_memory | RAM occupied by indexes |
| active_disks | number of active disks |
| disks.diskX | state of diskX (0 - not ready, 2 - initialized, 3 - works) |