# Metrics

Bindplane can be configured to expose metrics using the OpenTelemetry Protocol (OTLP) or Prometheus.\
See [Monitoring](https://docs.bindplane.com/configuration/bindplane/monitoring) for configuration details.

### Key Performance Indicators

Metrics denoted with "KPI" are key performance indicators. Bindplane administrators should pay close attention to KPIs to ensure Bindplane is operating normally.

### Event Bus

The Bindplane Event Bus is responsible for publishing and consuming messages between Bindplane components. When operating Bindplane in [High Availability](https://docs.bindplane.com/production-checklist/bindplane/high-availability), the Event Bus is responsible for sharing messages between Bindplane servers. Event Bus metrics can be used to gain visibility into the health of the Event Bus. A misbehaving Event Bus will cause issues with configuration rollouts, Live Preview, and Recent Telemetry Snapshots.

#### NATS Client

The NATS client is responsible for connecting to the NATS server and sending and receiving messages.

**nats.clientProducer (KPI)**

Number of times the NATS client producer handler was called. It is important to watch for the `error`\
attribute. Consistent producer errors indicate that the event bus is not functioning normally.

* Type: Counter
* Attributes
  * error: The error returned by the producer handler. One of "none", "max\_payload", "unknown".
  * client\_name: NATS client name.
  * cluster\_name: NATS cluster name.
  * subject: NATS subject name.

Example:

```metric
bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
```

**nats.clientConsumer (KPI)**

Number of times the NATS client consumer handler was called. It is important to watch for the `error`\
attribute. Consistent consumer errors indicate that the event bus is not functioning normally.

* Type: Counter
* Attributes
  * error: The error returned by the consumer handler. One of "none", "max\_payload", "unknown".
  * client\_name: NATS client name.
  * cluster\_name: NATS cluster name.
  * subject: NATS subject name.

Example:

```metric
bindplane_nats_clientConsumer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 41
```
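
Both counters are cumulative, so the share of handler calls with a non-`none` `error` is a simple health signal. The following is a minimal Python sketch, assuming the metrics are scraped in the Prometheus text format shown above. The function `error_ratio` is a hypothetical helper, the parsing handles only simple label values (no escaped quotes, no timestamps), and the `max_payload` sample line is made up for illustration.

```python
import re

# One sample line, e.g. name{label="value",...} 26 (timestamps not supported).
SAMPLE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def error_ratio(exposition: str, metric: str) -> float:
    """Fraction of handler calls whose `error` label is not "none"."""
    total = 0.0
    errors = 0.0
    for line in exposition.splitlines():
        match = SAMPLE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL.findall(match.group("labels")))
        value = float(match.group("value"))
        total += value
        if labels.get("error", "none") != "none":
            errors += value
    # Return 0.0 instead of dividing by zero when no samples were found.
    return errors / total if total else 0.0


scrape = """
bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",error="none",subject="bindplane-event-bus"} 26
bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",error="max_payload",subject="bindplane-event-bus"} 2
bindplane_nats_clientConsumer_total{client_name="bindplane-ha-0",error="none",subject="bindplane-event-bus"} 41
"""
print(round(error_ratio(scrape, "bindplane_nats_clientProducer_total"), 3))  # 0.071
print(error_ratio(scrape, "bindplane_nats_clientConsumer_total"))  # 0.0
```

Because the counters accumulate from process start, the ratio covers the whole lifetime of the process; comparing two scrapes (or using your monitoring system's rate functions) shows recent behavior instead.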

**nats.slowConsumer (KPI)**

Number of slow messages processed by the NATS client. See [Slow Consumers](https://docs.nats.io/using-nats/developer/connecting/events/slow) for more information. A slow consumer count greater than 0 indicates that event bus messages have been dropped.

* Type: Counter
* Attributes
  * client\_name: NATS client name.
  * subject: NATS subject name.

Example:

```metric
bindplane_nats_slowconsumer_total{client_name="bindplane-ha-0",subject="bindplane-event-bus"} 2
```
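
Because `nats.slowConsumer` is cumulative, a non-zero value may reflect drops that happened long ago. A minimal Python sketch, assuming you store the value from the previous scrape yourself, that flags only new slow consumer events. The function names are hypothetical; treating a decrease as a counter reset (for example, after a Bindplane restart) is similar in spirit to how Prometheus handles counter resets.

```python
def counter_increase(previous: float, current: float) -> float:
    """How much a counter grew between two scrapes.

    A value lower than the previous one means the counter was reset (for
    example, the process restarted), so the current value is the increase.
    """
    if current < previous:
        return current
    return current - previous


def new_slow_consumer_events(previous: float, current: float) -> bool:
    """True if any slow consumer events (dropped messages) occurred since the last scrape."""
    return counter_increase(previous, current) > 0


print(new_slow_consumer_events(2, 2))  # False: no new drops
print(new_slow_consumer_events(2, 5))  # True: three new slow consumer events
```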

**nats.clientProducerSize**

Size of the payload sent by the NATS client.

* Type: Histogram
* Attributes
  * client\_name: NATS client name.
  * cluster\_name: NATS cluster name.
  * error: The error returned by the producer handler. One of "none", "max\_payload", "unknown".
    * This error will match the error on `nats.clientProducer`.
  * subject: NATS subject name.
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

Example:

```metric
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="0"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="25"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="50"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="75"} 2
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="100"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="250"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="500"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="750"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="1000"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="2500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="7500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="+Inf"} 26
bindplane_nats_clientProducerSize_sum{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 7777
bindplane_nats_clientProducerSize_count{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
```
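
Histogram buckets are cumulative, so percentiles have to be estimated from them. A minimal Python sketch, assuming the buckets from the example above have already been parsed into sorted `(upper_bound, cumulative_count)` pairs. `histogram_quantile` here is a hypothetical helper that mimics the linear interpolation used by Prometheus' `histogram_quantile()` function; it is not that function's implementation.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf. The result is
    linearly interpolated inside the bucket that contains the target rank.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: return the largest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


producer_size = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 2), (100, 21),
    (250, 21), (500, 21), (750, 23), (1000, 23), (2500, 26),
    (5000, 26), (7500, 26), (10000, 26), (math.inf, 26),
]
print(round(histogram_quantile(0.5, producer_size), 1))   # 89.5
print(round(histogram_quantile(0.95, producer_size), 1))  # 1850.0
```

Interpolation assumes observations are spread evenly within a bucket, so the estimate is only as precise as the bucket boundaries allow.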

#### NATS Server

The NATS server is responsible for receiving and distributing messages to NATS clients.

**nats.eventbus.up**

Whether the NATS server is running. A running server is not necessarily connected to other NATS servers. This metric is useful for determining whether a Bindplane instance is operating as a NATS server.

A 1 indicates the NATS server is running. A 0 indicates the NATS server is not running.

* Type: Gauge

**nats.server.active\_peers\_count (KPI)**

Number of active server peers connected to the NATS server. When operating Bindplane in [High Availability](https://docs.bindplane.com/production-checklist/bindplane/high-availability), all Bindplane instances should report the same number of active peers. For example, each server in a three-node deployment should report `2` active peers.

If active peers is not consistent between Bindplane servers, a configuration or network issue is the\
likely culprit.

* Type: Gauge

Example:

```metric
bindplane_nats_server_active_peers_count{} 2
```
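
In a healthy deployment of N servers, every server should report N - 1 active peers. A minimal Python sketch, assuming you have already collected `nats.server.active_peers_count` from each server into a mapping keyed by server name; the helper `inconsistent_peers` and the names `bindplane-ha-1` and `bindplane-ha-2` are hypothetical.

```python
def inconsistent_peers(peer_counts: dict[str, int], cluster_size: int) -> list[str]:
    """Servers whose active peer count is not cluster_size - 1.

    Servers that are missing from peer_counts are not detected here; compare
    the mapping's keys with your inventory separately.
    """
    expected = cluster_size - 1
    return sorted(name for name, peers in peer_counts.items() if peers != expected)


reported = {"bindplane-ha-0": 2, "bindplane-ha-1": 2, "bindplane-ha-2": 1}
print(inconsistent_peers(reported, cluster_size=3))  # ['bindplane-ha-2']
```

Passing the expected cluster size explicitly, rather than deriving it from the number of reporting servers, keeps a server that stopped reporting from silently lowering the expected peer count.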

**nats.eventBusHealthRequestsSuccess**

Number of successful health requests made to `/status/v1/health` or `/status/v1/health/eventbus`.

* Type: Counter

**nats.eventBusHealthRequestsFailed (KPI)**

Number of failed health requests made to `/status/v1/health` or `/status/v1/health/eventbus`.

* Type: Counter

#### PubSub

**pubsub.messages**

Number of PubSub messages sent and received.

* Type: Counter
* Attributes
  * direction: One of `received`, `sent`.

Example:

```metric
bindplane_pubsub_messages_total{direction="received"} 31
bindplane_pubsub_messages_total{direction="sent"} 22
```

**pubsub.io**

Amount of PubSub data, in bytes, sent and received.

* Type: Counter
* Attributes
  * direction: One of `received`, `sent`.

Example:

```metric
bindplane_pubsub_io_bytes_total{direction="received"} 87190
bindplane_pubsub_io_bytes_total{direction="sent"} 78696
```
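
Dividing `pubsub.io` by `pubsub.messages` gives a rough average message size per direction. A minimal Python sketch using the example values above, assuming both counters were read at the same time and cover the same messages; `bytes_per_message` is a hypothetical helper name.

```python
def bytes_per_message(io_bytes: dict[str, float], messages: dict[str, float]) -> dict[str, float]:
    """Average PubSub payload size per direction ("received" or "sent")."""
    return {
        direction: io_bytes[direction] / count
        for direction, count in messages.items()
        # Skip directions with no messages or no matching byte counter.
        if count and direction in io_bytes
    }


averages = bytes_per_message(
    {"received": 87190, "sent": 78696},
    {"received": 31, "sent": 22},
)
print({direction: round(size, 1) for direction, size in averages.items()})
# {'received': 2812.6, 'sent': 3577.1}
```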

**pubsub.errors (KPI)**

Number of PubSub errors. PubSub errors indicate an issue with publishing or consuming messages\
from Google PubSub.

* Type: Counter

Example:

```metric
bindplane_pubsub_errors_total{} 2
```

### OpAMP

**agent.wait (KPI)**

Time spent waiting due to the `maxConcurrency` configuration option.

This option prevents too many agents from reconnecting at the same time. During a Bindplane server restart, it is expected that `agent.wait` will increase temporarily as agents reconnect. If you experience significant `agent.wait` times, Bindplane is likely having an issue with agent connections.

High `agent.wait` times can result in agents appearing "disconnected" and degraded overall performance.

* Type: Histogram
* Unit: Milliseconds
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

Example:

```metric
bindplane_agent_wait_milliseconds_bucket{le="0"} 24
bindplane_agent_wait_milliseconds_bucket{le="5"} 24
bindplane_agent_wait_milliseconds_bucket{le="10"} 24
bindplane_agent_wait_milliseconds_bucket{le="25"} 24
bindplane_agent_wait_milliseconds_bucket{le="50"} 24
bindplane_agent_wait_milliseconds_bucket{le="75"} 24
bindplane_agent_wait_milliseconds_bucket{le="100"} 24
bindplane_agent_wait_milliseconds_bucket{le="250"} 24
bindplane_agent_wait_milliseconds_bucket{le="500"} 24
bindplane_agent_wait_milliseconds_bucket{le="750"} 24
bindplane_agent_wait_milliseconds_bucket{le="1000"} 24
bindplane_agent_wait_milliseconds_bucket{le="2500"} 24
bindplane_agent_wait_milliseconds_bucket{le="5000"} 24
bindplane_agent_wait_milliseconds_bucket{le="7500"} 24
bindplane_agent_wait_milliseconds_bucket{le="10000"} 24
bindplane_agent_wait_milliseconds_bucket{le="+Inf"} 24
bindplane_agent_wait_milliseconds_sum{} 0
bindplane_agent_wait_milliseconds_count{} 24
```
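
A minimal Python sketch that reports the fraction of `agent.wait` observations longer than a chosen bucket boundary, assuming the buckets are parsed into sorted `(upper_bound, cumulative_count)` pairs. The 1000 ms threshold is an arbitrary example, not a Bindplane recommendation, and `fraction_above` is a hypothetical helper name.

```python
import math


def fraction_above(threshold: float, buckets: list[tuple[float, float]]) -> float:
    """Fraction of observations greater than `threshold`.

    `threshold` must equal one of the bucket upper bounds, because cumulative
    buckets only record how many observations were <= each bound.
    """
    total = buckets[-1][1]
    if total == 0:
        return 0.0
    for bound, count in buckets:
        if bound == threshold:
            return (total - count) / total
    raise ValueError(f"{threshold} is not a bucket boundary")


agent_wait = [
    (0, 24), (5, 24), (10, 24), (25, 24), (50, 24), (75, 24), (100, 24),
    (250, 24), (500, 24), (750, 24), (1000, 24), (2500, 24),
    (5000, 24), (7500, 24), (10000, 24), (math.inf, 24),
]
print(fraction_above(1000, agent_wait))  # 0.0: no agent waited longer than 1 s
```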

**agent.connecting (KPI)**

Number of times agents have attempted to connect. The `result` attribute is critical to understanding\
if agents are connecting properly.

* Type: Counter
* Attributes
  * result
    * `conflict`: The agent connection was rejected because an agent with the same agent\_id is already connected.
    * `connected`: The agent connected successfully.
    * `disconnected`: The agent disconnected.
    * `error`: An error occurred during agent connection.
    * `limited`: The agent connection was rejected because the agent's account has reached the maximum allowed number of agents.
    * `unauthorized`: The agent connection was rejected because an account matching the agent's secret-key was not found.

Example:

```metric
bindplane_agent_connecting_total{result="connected"} 7
bindplane_agent_connecting_total{result="disconnected"} 3
```
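
A minimal Python sketch that turns `agent.connecting` counts into the share of attempts rejected for each reason. It assumes the counts have already been read per `result` value, and it treats `disconnected` as an agent leaving rather than as a connection attempt, so it is excluded from the denominator; that interpretation is an assumption. The helper name `rejection_breakdown` and the `unauthorized` count in the second call are hypothetical.

```python
# Results that record a connection attempt that was not accepted,
# based on the attribute descriptions above.
REJECTED = ("conflict", "error", "limited", "unauthorized")


def rejection_breakdown(counts: dict[str, float]) -> dict[str, float]:
    """Share of connection attempts per rejection result, ignoring zero counts."""
    # Assumption: "disconnected" is not a connection attempt outcome.
    attempts = sum(value for result, value in counts.items() if result != "disconnected")
    if not attempts:
        return {}
    return {result: counts[result] / attempts for result in REJECTED if counts.get(result)}


print(rejection_breakdown({"connected": 7, "disconnected": 3}))  # {}
print(rejection_breakdown({"connected": 7, "unauthorized": 3}))  # {'unauthorized': 0.3}
```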

**agent.configure**

Number of times agents have been configured (push).

* Type: Counter
* Attributes
  * result
    * `configuring`: Successfully set agent status to `configuring` during agent configuration.
    * `disconnected`: Failed to push update to agent because it is disconnected.
    * `error`: An error occurred during agent configuration.
    * `readonly`: Agent does not support remote configuration.

Example:

```metric
bindplane_agent_configure_total{result="configuring"} 3
bindplane_agent_configure_total{result="error"} 3
```

**agent.verify**

Number of times agents have been verified (pull).

* Type: Counter
* Attributes
  * result:
    * `configuring`: Agent status was changed to `configuring` because it was not running the correct configuration.
    * `error`: An error occurred during agent configuration verification.
    * `missing`
    * `readonly`: Agent does not support remote configuration.
    * `validated`: Agent update skipped because agent already has the correct configuration or does not have a configuration assigned.
    * `validated-hash`: Agent configuration hash matches the hash previously pushed to the agent.
    * `waiting`: Agent configuration cannot be validated because the agent is applying the configuration.

Example:

```metric
bindplane_agent_verify_total{result="configuring"} 3
bindplane_agent_verify_total{result="validated"} 23
```

**agent.upgrade**

Number of times agents have been upgraded.

* Type: Counter
* Attributes
  * result
    * `error`: An error occurred while upgrading an agent.
    * `upgrading`: An upgrade request occurred.

Example:

```metric
bindplane_agent_upgrade_total{result="upgrading"} 1
```

**agent.report**

Number of agent snapshot requests.

* Type: Counter
* Attributes
  * result:
    * `disconnected`: Snapshot request failed because the agent is disconnected.
    * `error`: An error occurred while requesting a snapshot from an agent.
    * `sent`: Snapshot request was successfully sent to the agent.

Example:

```metric
bindplane_agent_report_total{result="sent"} 133
bindplane_agent_report_total{result="disconnected"} 1
bindplane_agent_report_total{result="error"} 2
```

**agent\_messages**

Number of agent messages received.

* Type: Counter
* Attributes
  * `components`: A JSON-encoded list of the components included in the agent message.

Example:

```metric
bindplane_agent_messages_total{components="[\"AgentDescription\",\"EffectiveConfig\",\"RemoteConfigStatus\",\"PackageStatuses\"]"} 17
bindplane_agent_messages_total{components="[\"CustomMessage\"]"} 564
bindplane_agent_messages_total{components="[\"EffectiveConfig\",\"RemoteConfigStatus\"]"} 1
bindplane_agent_messages_total{components="[\"EffectiveConfig\"]"} 2
bindplane_agent_messages_total{components="[\"RemoteConfigStatus\"]"} 2
```
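
Each `components` value is a JSON-encoded list, escaped according to the Prometheus text format, so one component (for example `EffectiveConfig`) can appear in several series. A minimal Python sketch, assuming the raw exposition text as input, that totals message counts per individual component; `unescape_label` and `component_totals` are hypothetical helper names.

```python
import json
import re
from collections import Counter

# Raw value of the `components` label; allows backslash-escaped characters.
COMPONENTS = re.compile(r'components="((?:[^"\\]|\\.)*)"')


def unescape_label(raw: str) -> str:
    """Undo Prometheus text-format escaping of backslashes, quotes and newlines."""
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), raw)


def component_totals(exposition: str) -> Counter:
    """Sum bindplane_agent_messages_total for each individual component."""
    totals: Counter = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line.startswith("bindplane_agent_messages_total{"):
            continue
        match = COMPONENTS.search(line)
        if not match:
            continue
        value = float(line.rsplit(None, 1)[1])
        for component in json.loads(unescape_label(match.group(1))):
            totals[component] += value
    return totals


scrape = r"""
bindplane_agent_messages_total{components="[\"AgentDescription\",\"EffectiveConfig\",\"RemoteConfigStatus\",\"PackageStatuses\"]"} 17
bindplane_agent_messages_total{components="[\"CustomMessage\"]"} 564
bindplane_agent_messages_total{components="[\"EffectiveConfig\",\"RemoteConfigStatus\"]"} 1
bindplane_agent_messages_total{components="[\"EffectiveConfig\"]"} 2
bindplane_agent_messages_total{components="[\"RemoteConfigStatus\"]"} 2
"""
print(component_totals(scrape)["EffectiveConfig"])  # 20.0
```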

**agent.heartbeat**

Number of agent heartbeats received.

* Type: Counter

Example:

```metric
bindplane_agent_heartbeat_total{} 52
```

**connected\_agents (KPI)**

Number of connected agents. If `connected_agents` is less than the total number of agents, some agents may be experiencing connectivity issues. Track this metric together with the `agent.connecting` metric.

* Type: Gauge

**custom\_messages.processed**

Number of custom messages that have been processed.

* Type: Counter
* Attributes
  * `message_capability`: Message capability.
  * `message_type`: Message type.

Example:

```metric
bindplane_custom_messages_processed_total{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 660
```

**custom\_messages.process\_time**

Time spent processing custom messages.

* Type: Histogram
* Attributes
  * `message_capability`: Message capability.
  * `message_type`: Message type.
* Unit: Milliseconds
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

Example:

```metric
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="0"} 0
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="25"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="50"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="75"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="100"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="250"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="750"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="1000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="2500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="7500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="+Inf"} 740
bindplane_custom_messages_process_time_milliseconds_sum{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 52.201175000000056
bindplane_custom_messages_process_time_milliseconds_count{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 740
```

**measurements.process\_time**

{% hint style="danger" %}
**WARNING**

This metric is deprecated. Use custom\_messages.process\_time instead.
{% endhint %}

Amount of time spent processing measurements.

* Type: Histogram
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

**throughput\_metrics.processed**

Total number of throughput metric data points that have been processed and batched.

* Type: Counter

Example:

```metric
bindplane_throughput_metrics_processed_total{} 92
```

### Web Server

**requests (KPI)**

Number of HTTP requests. The `status` attribute will include the HTTP status code. It is important\
to monitor for 4xx and 5xx status codes. An excessive number of 4xx status codes could indicate\
collector authentication issues. Any number of 5xx status codes is unexpected, and could indicate a\
configuration issue or bug within Bindplane.

* Type: Counter
* Attributes
  * `method`: Request method.
  * `status`: HTTP response status code.
  * `url`: Request URL path.

Example:

```metric
bindplane_requests_ratio_total{method="GET",status="200",url="/v1/opamp"} 7
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 212
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
```
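
Grouping `requests` by status class makes 4xx and 5xx responses easy to spot. A minimal Python sketch, assuming the samples have already been parsed into `(labels, value)` pairs; `requests_by_status_class` is a hypothetical helper name, and the `401` sample is made up for illustration.

```python
from collections import defaultdict


def requests_by_status_class(samples: list[tuple[dict[str, str], float]]) -> dict[str, float]:
    """Sum request counts into 2xx/3xx/4xx/5xx groups using the `status` label."""
    totals: dict[str, float] = defaultdict(float)
    for labels, value in samples:
        status = labels.get("status", "")
        # "404" -> "4xx"; anything that is not a three-digit code goes to "other".
        key = f"{status[0]}xx" if len(status) == 3 and status.isdigit() else "other"
        totals[key] += value
    return dict(totals)


samples = [
    ({"method": "GET", "status": "200", "url": "/v1/opamp"}, 7),
    ({"method": "POST", "status": "200", "url": "/v1/agents/:id/health/v1/metrics"}, 212),
    ({"method": "GET", "status": "401", "url": "/v1/opamp"}, 4),  # hypothetical
]
print(requests_by_status_class(samples))
# {'2xx': 219.0, '4xx': 4.0}
```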

**request\_duration**

Time taken to process the request in seconds.

* Type: Histogram
* Attributes
  * `method`: Request method.
  * `status`: HTTP response status code.
  * `url`: Request URL path.
* Unit: Seconds
* Buckets:
  * 0.1
  * 0.5
  * 1
  * 2
  * 5
  * 10

Example:

```metric
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="2"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_duration_seconds_sum{method="GET",status="200",url="/v1/opamp"} 0.047771097000000005
bindplane_request_duration_seconds_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 224
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 0.027173258000000006
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 0.000218934
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 0.017959873
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
```
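
The `_sum` and `_count` series give an average request duration per URL, which is a quick first signal before looking at the buckets. A minimal Python sketch, assuming the raw exposition text as input and combining all methods and statuses that share a URL; `mean_duration_by_url` is a hypothetical helper name, and label values with escaped quotes are not handled.

```python
import re

# Only the _sum and _count series are needed for an average.
SERIES = re.compile(r"^bindplane_request_duration_seconds_(sum|count)\{(.*)\}\s+(\S+)$")
URL = re.compile(r'url="([^"]*)"')


def mean_duration_by_url(exposition: str) -> dict[str, float]:
    """Average request duration in seconds per `url` label (all methods and statuses combined)."""
    sums: dict[str, float] = {}
    counts: dict[str, float] = {}
    for line in exposition.splitlines():
        match = SERIES.match(line.strip())
        if not match:
            continue
        kind, labels, value = match.groups()
        url_match = URL.search(labels)
        if not url_match:
            continue
        target = sums if kind == "sum" else counts
        url = url_match.group(1)
        target[url] = target.get(url, 0.0) + float(value)
    return {url: total / counts[url] for url, total in sums.items() if counts.get(url)}


scrape = """
bindplane_request_duration_seconds_sum{method="GET",status="200",url="/v1/opamp"} 0.047771097000000005
bindplane_request_duration_seconds_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 0.027173258000000006
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 224
"""
means = mean_duration_by_url(scrape)
print({url: round(seconds * 1000, 2) for url, seconds in means.items()})
# {'/v1/opamp': 6.82, '/v1/agents/:id/health/v1/metrics': 0.12}
```

An average hides outliers, so a high value is a cue to inspect the bucket distribution for that URL.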

**request\_size**

Size of the request received.

* Type: Histogram
* Attributes
  * `method`: Request method.
  * `status`: HTTP response status code.
  * `url`: Request URL path.
* Unit: Bytes
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

Example:

```metric
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="0"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="25"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="50"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="75"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="100"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="250"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="750"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="1000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="2500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="7500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_size_sum{method="GET",status="200",url="/v1/opamp"} 2512
bindplane_request_size_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="750"} 4
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="7500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 228
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 175092
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="750"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 410
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="750"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 17732
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
```
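
When the `+Inf` bucket is larger than the largest finite bucket, some requests exceeded 10000 bytes and their size cannot be resolved from the buckets. In the example above, the `/v1/agents/:id/snapshots/metrics` series has one such request. A minimal Python sketch, assuming the buckets are parsed into sorted `(upper_bound, cumulative_count)` pairs (abbreviated here to the top four); `overflow_count` is a hypothetical helper name.

```python
import math


def overflow_count(buckets: list[tuple[float, float]]) -> float:
    """Number of observations larger than the largest finite bucket bound."""
    finite = [count for bound, count in buckets if not math.isinf(bound)]
    total = buckets[-1][1]
    return total - (finite[-1] if finite else 0.0)


snapshot_metrics = [(5000, 0), (7500, 1), (10000, 1), (math.inf, 2)]
print(overflow_count(snapshot_metrics))  # 1
```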

### Store

**eventbus.latency**

Time between sending an event to the event bus and the handler receiving it.

* Type: Histogram
* Attributes
  * event: One of `received` or `handled`.
  * handler: The component handling the event, one of `manager` or `graphql`.
  * type: The event type is always `updates`. Other values may exist in the future.
* Unit: Milliseconds
* Buckets:
  * 0
  * 5
  * 10
  * 25
  * 50
  * 75
  * 100
  * 250
  * 500
  * 750
  * 1000
  * 2500
  * 5000
  * 7500
  * 10000

Example:

```metric
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="75"} 4
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="100"} 9
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="250"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="750"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="1000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="2500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="7500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="+Inf"} 15
bindplane_eventbus_latency_milliseconds_sum{event="handled",handler="graphql",type="updates"} 1508
bindplane_eventbus_latency_milliseconds_count{event="handled",handler="graphql",type="updates"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="75"} 5
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="100"} 12
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="250"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="750"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="1000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="2500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="7500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="+Inf"} 22
bindplane_eventbus_latency_milliseconds_sum{event="received",handler="graphql",type="updates"} 2312
bindplane_eventbus_latency_milliseconds_count{event="received",handler="graphql",type="updates"} 22
```
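Because `eventbus.latency` is exported as cumulative buckets, a percentile has to be estimated rather than read directly. The sketch below is a simplified Python version of the linear interpolation that PromQL's `histogram_quantile()` performs. The function name and the hard-coded bucket list (copied from the `event="handled"` series above) are illustrative assumptions, not part of Bindplane. The mean is simpler: `_sum / _count`, which is 1508 / 15, or roughly 100.5 ms, for that series.

```python
import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` holds (upper_bound, cumulative_count) pairs sorted by upper bound
    and ending with (math.inf, total). Observations are assumed to be spread
    evenly inside each bucket, so the result is linearly interpolated.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: the best answer is the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate between the lower and upper edges of this bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Cumulative buckets for the event="handled", handler="graphql" series above.
handled = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 4), (100, 9), (250, 15),
    (500, 15), (750, 15), (1000, 15), (2500, 15), (5000, 15), (7500, 15),
    (10000, 15), (math.inf, 15),
]

print(estimate_quantile(0.5, handled))  # 92.5
print(estimate_quantile(0.9, handled))  # 212.5
```

With these buckets the estimated median is 92.5 ms and the 90th percentile is 212.5 ms. The estimate is only as precise as the bucket boundaries allow, since values are assumed to be spread evenly within a bucket.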

**store.updateRollout**

Number of times the store `updateRollout` method was called.

* Type: Counter

Example:

```metric
bindplane_store_updateRollout_total{} 5
```

#### Postgres

**postgres.up (KPI)**

Whether Bindplane has a connection to the Postgres database. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Postgres database.

* Type: Gauge

**wait\_count**

Number of times a query had to wait for a connection.

* Type: Counter

Example:

```metric
bindplane_wait_count_total{} 3
```

**wait\_time (KPI)**

Total time spent waiting for connections. If `wait_time` keeps increasing between scrapes (its rate is consistently above 0), Bindplane's Postgres max connections option may be set too low, or Postgres may be experiencing performance issues.

* Type: Counter
* Unit: milliseconds

Example:

```metric
bindplane_wait_time_milliseconds_total{} 0
```
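Since `wait_time` is a cumulative counter, what matters is whether it keeps growing, not its absolute value. The following Python sketch assumes you already collect successive scrapes of `bindplane_wait_time_milliseconds_total`. It computes per-interval increases and treats any drop as a counter reset (for example, a Bindplane restart). The helper names and sample values are hypothetical, and unlike PromQL's `increase()` the sketch does not extrapolate to the edges of the window.

```python
def interval_deltas(samples: list[float]) -> list[float]:
    """Per-scrape increase of a cumulative counter, oldest sample first.

    A decrease is treated as a counter reset, in which case the post-reset
    value is taken as the increase for that interval.
    """
    return [cur - prev if cur >= prev else cur for prev, cur in zip(samples, samples[1:])]


def counter_increase(samples: list[float]) -> float:
    """Total increase across the window (no extrapolation)."""
    return sum(interval_deltas(samples))


def waiting_every_interval(samples: list[float]) -> bool:
    """True when the counter grew in every scrape interval of the window."""
    deltas = interval_deltas(samples)
    return bool(deltas) and all(d > 0 for d in deltas)


# Hypothetical scrapes of bindplane_wait_time_milliseconds_total, oldest first.
idle_pool = [0.0, 0.0, 0.0, 0.0]
busy_pool = [120.0, 340.0, 610.0, 25.0, 300.0]  # the drop to 25.0 is a restart

print(counter_increase(idle_pool), waiting_every_interval(idle_pool))  # 0.0 False
print(counter_increase(busy_pool), waiting_every_interval(busy_pool))  # 790.0 True
```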

**active\_connections (KPI)**

Number of open connections that are currently active. If `active_connections` is consistently at 100% of the configured max connections, Bindplane may be experiencing performance issues. Generally, Bindplane's max connections should not exceed 100 (the default). Increasing max connections may mask an underlying issue and is not recommended.

* Type: Gauge

Example:

```metric
bindplane_active_connections{} 0
```
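The saturation guidance above amounts to comparing `active_connections` against the configured limit over a window of scrapes. A minimal Python sketch, assuming the default limit of 100 and a list of gauge readings you have already collected; the function names and readings are hypothetical.

```python
DEFAULT_MAX_CONNECTIONS = 100  # Bindplane's default Postgres max connections


def pool_utilization(active: float, max_connections: int = DEFAULT_MAX_CONNECTIONS) -> float:
    """Fraction of the connection limit currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return active / max_connections


def consistently_saturated(
    active_samples: list[float], max_connections: int = DEFAULT_MAX_CONNECTIONS
) -> bool:
    """True when every reading in the window is at (or above) the configured limit."""
    return bool(active_samples) and all(
        pool_utilization(a, max_connections) >= 1.0 for a in active_samples
    )


# Hypothetical bindplane_active_connections readings, one per scrape.
print(consistently_saturated([100, 100, 100, 100]))  # True
print(consistently_saturated([100, 87, 100, 100]))   # False
```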

**idle\_connections**

Number of open idle connections.

* Type: Gauge

Example:

```metric
bindplane_idle_connections{} 0
```

#### Prometheus

**prometheus.up (KPI)**

Whether Bindplane has a connection to the Prometheus instance. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Prometheus instance.

* Type: Gauge

**remote\_write.time**

Time taken to upload a batch to Prometheus via remote write.

* Type: Histogram
* Unit: milliseconds
* Buckets:
  * 0
  * 100
  * 200
  * 300
  * 400
  * 500
  * 750
  * 1000
  * 1500
  * 2000
  * 3000
  * 5000

Example:

```metric
bindplane_remote_write_time_milliseconds_bucket{le="0"} 0
bindplane_remote_write_time_milliseconds_bucket{le="100"} 558
bindplane_remote_write_time_milliseconds_bucket{le="200"} 558
bindplane_remote_write_time_milliseconds_bucket{le="300"} 558
bindplane_remote_write_time_milliseconds_bucket{le="400"} 558
bindplane_remote_write_time_milliseconds_bucket{le="500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="750"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="2000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="3000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="5000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="+Inf"} 558
bindplane_remote_write_time_milliseconds_sum{} 269.38973300000004
bindplane_remote_write_time_milliseconds_count{} 558
```
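In the example above, all 558 uploads land in the `le="100"` bucket, so the buckets only show that every upload took 100 ms or less. Dividing `_sum` by `_count` gives a finer view. The Python sketch below parses sample lines in the format shown and computes that mean. The simplified parser is an assumption for illustration (it ignores comments, timestamps, and escaped quotes in label values) and is not a full implementation of the Prometheus text format.

```python
import re

# Matches sample lines such as:
#   bindplane_remote_write_time_milliseconds_sum{} 269.38973300000004
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_sample(line: str) -> tuple[str, dict[str, str], float]:
    """Split one sample line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match["labels"]))
    return match["name"], labels, float(match["value"])


def histogram_mean(lines: list[str], base: str) -> float:
    """Mean observation of a histogram without labels: <base>_sum / <base>_count."""
    values = {}
    for line in lines:
        name, _, value = parse_sample(line)
        values[name] = value  # later lines with the same name overwrite earlier ones
    count = values[f"{base}_count"]
    return values[f"{base}_sum"] / count if count else float("nan")


exposition = [
    'bindplane_remote_write_time_milliseconds_bucket{le="100"} 558',
    "bindplane_remote_write_time_milliseconds_sum{} 269.38973300000004",
    "bindplane_remote_write_time_milliseconds_count{} 558",
]
mean_ms = histogram_mean(exposition, "bindplane_remote_write_time_milliseconds")
print(f"{mean_ms:.3f} ms")  # 0.483 ms
```

For the example data this gives a mean of roughly 0.48 ms per upload.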

**remote\_write.samples.count**

Number of samples pushed to the Prometheus instance.

* Type: Counter

Example:

```metric
bindplane_remote_write_samples_count_total{} 2112
```

### Cache

**cache.get**

Number of cache requests.

* Type: Counter
* Attributes
  * `cache`: The name of the cache instance
  * `result`: `hit` or `miss`

Example:

```metric
bindplane_cache_get_total{cache="license",result="miss"} 3
bindplane_cache_get_total{cache="resourceFinder",result="hit"} 39
bindplane_cache_get_total{cache="resourceFinder",result="miss"} 3
bindplane_cache_get_total{cache="secretKey",result="hit"} 15
bindplane_cache_get_total{cache="secretKey",result="miss"} 1
```
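The `result` attribute makes it straightforward to derive a hit ratio per cache: hits / (hits + misses). A short Python sketch using the example values above; the dictionary layout and function name are illustrative assumptions, not a Bindplane API. In practice you would compute the ratio from counter increases over a time window rather than from totals since startup.

```python
from collections import defaultdict

# (cache, result) -> counter value, copied from the cache.get example above.
cache_get_total = {
    ("license", "miss"): 3,
    ("resourceFinder", "hit"): 39,
    ("resourceFinder", "miss"): 3,
    ("secretKey", "hit"): 15,
    ("secretKey", "miss"): 1,
}


def hit_ratios(counts: dict[tuple[str, str], float]) -> dict[str, float]:
    """Hit ratio per cache. A missing hit or miss series counts as 0."""
    per_cache: dict[str, dict[str, float]] = defaultdict(lambda: {"hit": 0, "miss": 0})
    for (cache, result), value in counts.items():
        per_cache[cache][result] += value
    return {
        cache: r["hit"] / (r["hit"] + r["miss"]) if r["hit"] + r["miss"] else 0.0
        for cache, r in per_cache.items()
    }


for cache, ratio in sorted(hit_ratios(cache_get_total).items()):
    print(f"{cache}: {ratio:.1%}")
```

For the example data, `resourceFinder` hits about 92.9% of the time and `secretKey` 93.75%, while `license` has only recorded misses so far.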

### Transform Agent

**transformagent.up (KPI)**

Whether Bindplane has a connection to the Transform Agent. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Transform Agent.

* Type: Gauge

**transformagent.failedRequest**

Number of failed requests to the Transform Agent.

* Type: Counter
