Metrics

Bindplane can be configured to expose metrics using OpenTelemetry Protocol or Prometheus. See Monitoring for configuration details.

Key Performance Indicators

Metrics denoted with "KPI" are key performance indicators. Bindplane administrators should pay close attention to KPIs to ensure Bindplane is operating normally.

Event Bus

The Bindplane Event Bus is responsible for publishing and consuming messages between Bindplane components. When operating Bindplane in High Availability, the Event Bus is responsible for sharing messages between Bindplane servers. Event Bus metrics can be used to gain visibility into the health of the Event Bus. A misbehaving Event Bus will cause issues with configuration rollouts, Live Preview, and Recent Telemetry Snapshots.

NATS Client

The NATS client is responsible for connecting to the NATS server and sending and receiving messages.

nats.clientProducer (KPI)

Number of times the NATS client producer handler was called. It is important to watch for the error attribute. Consistent producer errors indicate that the event bus is not functioning normally.

  • Type: Counter

  • Attributes

    • error: The error returned by the producer handler. One of "none", "max_payload", "unknown".

    • client_name: NATS client name.

    • cluster_name: NATS cluster name.

    • subject: NATS subject name.

Example:

bindplane_nats_clientProducer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
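
One way to act on the error attribute is to compute the share of producer calls that ended in an error. The following is a minimal sketch, assuming the Prometheus text output has already been scraped into a list of lines. The helper names parse_sample and producer_error_ratio are hypothetical and not part of Bindplane.

```python
import re

# One Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split a sample line into (metric name, label dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    return name, dict(LABEL_RE.findall(raw_labels)), float(value)


def producer_error_ratio(lines):
    """Fraction of producer handler calls whose error label is not "none"."""
    total = 0.0
    failed = 0.0
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments.
        if not stripped or stripped.startswith("#"):
            continue
        name, labels, value = parse_sample(stripped)
        if name != "bindplane_nats_clientProducer_total":
            continue
        total += value
        if labels.get("error") != "none":
            failed += value
    return failed / total if total else 0.0
```

Because the counter is cumulative, this ratio covers the whole lifetime of the process. Comparing two scrapes gives a better view of whether errors are still occurring.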

nats.clientConsumer (KPI)

Number of times the NATS client consumer handler was called. It is important to watch for the error attribute. Consistent consumer errors indicate that the event bus is not functioning normally.

  • Type: Counter

  • Attributes

    • error: The error returned by the consumer handler. One of "none", "max_payload", "unknown".

    • client_name: NATS client name.

    • cluster_name: NATS cluster name.

    • subject: NATS subject name.

Example:

bindplane_nats_clientConsumer_total{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 41

nats.slowConsumer (KPI)

Number of slow messages processed by the NATS client. See Slow Consumers for more information. When the slow consumer count is greater than 0, event bus messages are being dropped.

  • Type: Counter

  • Attributes

    • client_name: NATS client name.

    • subject: NATS subject name.

Example:

bindplane_nats_slowconsumer_total{client_name="bindplane-ha-0",subject="bindplane-event-bus"} 2
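
Because this is a cumulative counter, new drops show up as an increase between two scrapes rather than as the absolute value. The sketch below illustrates that comparison. It assumes each scrape has already been reduced to a dictionary keyed by (client_name, subject), and the function names are hypothetical.

```python
def counter_increase(previous, current):
    """Increase of a cumulative counter between two scrapes.

    A value that goes backwards is treated as a process restart, so the
    whole current value counts as new.
    """
    if current < previous:
        return current
    return current - previous


def new_slow_consumer_events(previous, current):
    """Per-series increase, keyed by (client_name, subject).

    Series missing from the previous scrape are assumed to start at 0.
    """
    return {
        key: counter_increase(previous.get(key, 0.0), value)
        for key, value in current.items()
    }
```

Any non-zero increase means additional event bus messages were dropped since the previous scrape.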

nats.clientProducerSize

Size of the payload sent by the NATS client.

  • Type: Histogram

  • Attributes

    • client_name: NATS client name.

    • cluster_name: NATS cluster name.

    • error: The error returned by the producer handler. One of "none", "max_payload", "unknown".

      • This error will match the error on nats.clientProducer.

    • subject: NATS subject name.

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

Example:

bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="0"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="25"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="50"} 0
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="75"} 2
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="100"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="250"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="500"} 21
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="750"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="1000"} 23
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="2500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="5000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="7500"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="10000"} 26
bindplane_nats_clientProducerSize_bucket{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus",le="+Inf"} 26
bindplane_nats_clientProducerSize_sum{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 7777
bindplane_nats_clientProducerSize_count{client_name="bindplane-ha-0",cluster_name="bindplane",error="none",subject="bindplane-event-bus"} 26
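
Cumulative buckets like the ones above can be turned into an approximate quantile. The following sketch uses linear interpolation inside the matching bucket, similar in spirit to PromQL's histogram_quantile. The function name estimate_quantile is hypothetical, and the bucket list is copied from the example output.

```python
import math

# (upper bound, cumulative count) pairs from the example above.
PRODUCER_SIZE_BUCKETS = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 2), (100, 21),
    (250, 21), (500, 21), (750, 23), (1000, 23), (2500, 26),
    (5000, 26), (7500, 26), (10000, 26), (math.inf, 26),
]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative buckets.

    buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # No finite upper bound: fall back to the last finite one.
                return prev_bound
            if count == prev_count:
                return float(bound)
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = float(bound), float(count)
    return prev_bound


median = estimate_quantile(0.5, PRODUCER_SIZE_BUCKETS)
```

For the example data the estimated median falls in the 75 to 100 bucket. The result is only as precise as the bucket boundaries allow.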

NATS Server

The NATS server is responsible for receiving and distributing messages to NATS clients.

nats.eventbus.up

Whether or not the NATS server is running. The server being up does not indicate whether or not it is connected to other NATS servers. This metric is useful for determining whether a Bindplane instance is operating as a NATS server.

A 1 indicates the NATS server is running. A 0 indicates the NATS server is not running.

  • Type: Gauge

nats.server.active_peers_count (KPI)

Number of active server peers connected to the NATS server. When operating Bindplane in High Availability, all Bindplane instances should report the same number of active peers. For example, a three-node deployment should report 2 active peers on each instance.

If the active peer count is not consistent between Bindplane servers, a configuration or network issue is the likely cause.

  • Type: Gauge

Example:

bindplane_nats_server_active_peers_count{} 2
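
The peer-count rule described above (each of N instances should see N - 1 peers) can be checked mechanically. This is a minimal sketch, assuming the gauge has already been collected from every instance into a dictionary. The instance names and the function name are hypothetical.

```python
def inconsistent_peers(peers_by_instance):
    """Return the instances whose active peer count is not N - 1.

    peers_by_instance maps an instance name to the value it reports for
    bindplane_nats_server_active_peers_count.
    """
    expected = len(peers_by_instance) - 1
    return {
        name: count
        for name, count in peers_by_instance.items()
        if count != expected
    }


# Hypothetical three-node deployment where one node has lost a peer.
suspects = inconsistent_peers(
    {"bindplane-ha-0": 2, "bindplane-ha-1": 2, "bindplane-ha-2": 1}
)
```

An empty result means every instance agrees with the expected count. Any entry that is returned is a good starting point for checking configuration or network connectivity.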

nats.eventBusHealthRequestsSuccess

Number of successful health requests made to /status/v1/health or /status/v1/health/eventbus.

  • Type: Counter

nats.eventBusHealthRequestsFailed (KPI)

Number of failed health requests made to /status/v1/health or /status/v1/health/eventbus.

  • Type: Counter

PubSub

pubsub.messages

Number of PubSub messages sent and received.

  • Type: Counter

  • Attributes

    • direction: One of received, sent.

Example:

bindplane_pubsub_messages_total{direction="received"} 31
bindplane_pubsub_messages_total{direction="sent"} 22

pubsub.io

Amount of PubSub data sent and received.

  • Type: Counter

  • Attributes

    • direction: One of received, sent.

Example:

bindplane_pubsub_io_bytes_total{direction="received"} 87190
bindplane_pubsub_io_bytes_total{direction="sent"} 78696

pubsub.errors (KPI)

Number of PubSub errors. PubSub errors indicate an issue with publishing or consuming messages from Google PubSub.

  • Type: Counter

Example:

bindplane_pubsub_errors_total{} 2

OpAMP

agent.wait (KPI)

Time spent waiting due to the maxConcurrency configuration option.

This option prevents too many agents from reconnecting at the same time. During a Bindplane server restart, it is expected that agent.wait will increase temporarily as agents reconnect. If you experience significant agent.wait time, Bindplane is likely having an issue with agent connections.

High agent.wait times can result in agents appearing "disconnected" and degraded overall performance.

  • Type: Histogram

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

Example:

bindplane_agent_wait_milliseconds_bucket{le="0"} 24
bindplane_agent_wait_milliseconds_bucket{le="5"} 24
bindplane_agent_wait_milliseconds_bucket{le="10"} 24
bindplane_agent_wait_milliseconds_bucket{le="25"} 24
bindplane_agent_wait_milliseconds_bucket{le="50"} 24
bindplane_agent_wait_milliseconds_bucket{le="75"} 24
bindplane_agent_wait_milliseconds_bucket{le="100"} 24
bindplane_agent_wait_milliseconds_bucket{le="250"} 24
bindplane_agent_wait_milliseconds_bucket{le="500"} 24
bindplane_agent_wait_milliseconds_bucket{le="750"} 24
bindplane_agent_wait_milliseconds_bucket{le="1000"} 24
bindplane_agent_wait_milliseconds_bucket{le="2500"} 24
bindplane_agent_wait_milliseconds_bucket{le="5000"} 24
bindplane_agent_wait_milliseconds_bucket{le="7500"} 24
bindplane_agent_wait_milliseconds_bucket{le="10000"} 24
bindplane_agent_wait_milliseconds_bucket{le="+Inf"} 24
bindplane_agent_wait_milliseconds_sum{} 0
bindplane_agent_wait_milliseconds_count{} 24
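
To judge whether agent.wait time is significant, it can help to look at the share of waits that exceeded a chosen bucket boundary. The sketch below assumes the cumulative bucket counts have already been read into a dictionary keyed by upper bound in milliseconds. The 1000 ms threshold and the function name are illustrative assumptions, not recommended values.

```python
def share_above(threshold, cumulative_buckets, count):
    """Fraction of observations greater than `threshold`.

    cumulative_buckets maps a bucket upper bound (the `le` label) to its
    cumulative count. The threshold must be one of those bounds, because
    a histogram cannot be split anywhere else without guessing.
    """
    if count == 0:
        return 0.0
    if threshold not in cumulative_buckets:
        raise KeyError(f"{threshold} is not a bucket boundary")
    return 1.0 - cumulative_buckets[threshold] / count


# Subset of the example above: every wait landed in the first bucket.
EXAMPLE_BUCKETS = {0: 24, 100: 24, 1000: 24, 10000: 24}
slow_share = share_above(1000, EXAMPLE_BUCKETS, 24)
```

With the example data no waits exceeded the threshold, which matches a server that is not throttling agent connections.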

agent.connecting (KPI)

Number of times agents have attempted to connect. The result attribute is critical to understanding if agents are connecting properly.

  • Type: Counter

  • Attributes

    • result

      • conflict: The agent connection was rejected because an agent with the same agent_id is already connected.

      • connected: The agent connected successfully.

      • disconnected: The agent disconnected.

      • error: An error occurred during agent connection.

      • limited: The agent connection was rejected because the agent's account has reached the maximum allowed number of agents.

      • unauthorized: The agent connection was rejected because an account matching the agent's secret-key was not found.

Example:

bindplane_agent_connecting_total{result="connected"} 7
bindplane_agent_connecting_total{result="disconnected"} 3
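
The result values listed above fall into a few groups that are worth watching separately. The following sketch groups them, assuming the counter has already been reduced to a dictionary of result value to count. The grouping and the function name are illustrative choices, not a Bindplane API.

```python
# Results that mean the server refused the connection.
REJECTED_RESULTS = ("conflict", "limited", "unauthorized")


def summarize_connections(counts):
    """Group agent.connecting counts by outcome.

    counts maps a `result` label value to its counter value.
    """
    rejected = sum(counts.get(result, 0) for result in REJECTED_RESULTS)
    errors = counts.get("error", 0)
    return {
        "connected": counts.get("connected", 0),
        "disconnected": counts.get("disconnected", 0),
        "rejected": rejected,
        "errors": errors,
        "healthy": rejected == 0 and errors == 0,
    }


# Values from the example above.
summary = summarize_connections({"connected": 7, "disconnected": 3})
```

A growing rejected or errors total points at duplicate agent IDs, account limits, secret key mismatches, or connection failures, depending on which result value is increasing.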

agent.configure

Number of times agents have been configured (push).

  • Type: Counter

  • Attributes

    • result

      • configuring: Successfully set agent status to configuring during agent configuration.

      • disconnected: Failed to push update to agent because it is disconnected.

      • error: An error occurred during agent configuration.

      • readonly: Agent does not support remote configuration.

Example:

bindplane_agent_configure_total{result="configuring"} 3
bindplane_agent_configure_total{result="error"} 3

agent.verify

Number of times agents have been verified (pull).

  • Type: Counter

  • Attributes

    • result:

      • configuring: Agent status was changed to configuring because it was not running the correct configuration.

      • error: An error occurred during agent configuration verification.

      • missing

      • readonly: Agent does not support remote configuration.

      • validated: Agent update skipped because agent already has the correct configuration or does not have a configuration assigned.

      • validated-hash: Agent configuration hash matches the hash previously pushed to the agent.

      • waiting: Agent configuration cannot be validated because the agent is applying the configuration.

Example:

bindplane_agent_verify_total{result="configuring"} 3
bindplane_agent_verify_total{result="validated"} 23

agent.upgrade

Number of times agents have been upgraded.

  • Type: Counter

  • Attributes

    • result

      • error: An error occurred while upgrading an agent.

      • upgrading: An upgrade request occurred.

Example:

bindplane_agent_upgrade_total{result="upgrading"} 1

agent.report

Number of agent snapshot requests.

  • Type: Counter

  • Attributes

    • result:

      • disconnected: Snapshot request failed because the agent is disconnected.

      • error: An error occurred while requesting a snapshot from an agent.

      • sent: Snapshot request was successfully sent to the agent.

Example:

bindplane_agent_report_total{result="sent"} 133
bindplane_agent_report_total{result="disconnected"} 1
bindplane_agent_report_total{result="error"} 2

agent_messages

Number of agent messages received.

  • Type: Counter

  • Attributes

    • components: A list of agent components.

Example:

bindplane_agent_messages_total{components="[\"AgentDescription\",\"EffectiveConfig\",\"RemoteConfigStatus\",\"PackageStatuses\"]"} 17
bindplane_agent_messages_total{components="[\"CustomMessage\"]"} 564
bindplane_agent_messages_total{components="[\"EffectiveConfig\",\"RemoteConfigStatus\"]"} 1
bindplane_agent_messages_total{components="[\"EffectiveConfig\"]"} 2
bindplane_agent_messages_total{components="[\"RemoteConfigStatus\"]"} 2

agent.heartbeat

Number of agent heartbeats received.

  • Type: Counter

  • Attributes

Example:

bindplane_agent_heartbeat_total{} 52

connected_agents (KPI)

Number of connected agents. If connected_agents is less than the total number of agents, it is possible some agents are experiencing connectivity issues. This metric should be tracked alongside the agent.connecting metric.

  • Type: Gauge

custom_messages.processed

Number of custom messages that have been processed.

  • Type: Counter

  • Attributes

    • message_capability: Message capability.

    • message_type: Message type.

Example:

bindplane_custom_messages_processed_total{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 660

custom_messages.process_time

Time spent processing custom messages.

  • Type: Histogram

  • Attributes

    • message_capability: Message capability.

    • message_type: Message type.

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

Example:

bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="0"} 0
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="25"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="50"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="75"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="100"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="250"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="750"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="1000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="2500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="5000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="7500"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="10000"} 740
bindplane_custom_messages_process_time_milliseconds_bucket{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements",le="+Inf"} 740
bindplane_custom_messages_process_time_milliseconds_sum{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 52.201175000000056
bindplane_custom_messages_process_time_milliseconds_count{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 740

throughput_metrics.processed

Total amount of throughput metric datapoints that have been processed and batched.

  • Type: Counter

  • Attributes

    • message_capability: Message capability.

    • message_type: Message type.

Example:

bindplane_custom_messages_processed_total{message_capability="com.bindplane.measurements.v1",message_type="reportMeasurements"} 764

measurements.process_time

Amount of time spent processing measurements.

  • Type: Histogram

  • Attributes

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

throughput_metrics.processed

Total amount of throughput metric datapoints that have been processed and batched.

  • Type: Counter

Example:

bindplane_throughput_metrics_processed_total{} 92

Web Server

requests (KPI)

Number of HTTP requests. The status attribute will include the HTTP status code. It is important to monitor for 4xx and 5xx status codes. An excessive number of 4xx status codes could indicate collector authentication issues. Any number of 5xx status codes is unexpected, and could indicate a configuration issue or bug within Bindplane.

  • Type: Counter

  • Attributes

    • method: Request method.

    • status: Request status.

    • url: Request URL path.

Example:

bindplane_requests_ratio_total{method="GET",status="200",url="/v1/opamp"} 7
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 212
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_requests_ratio_total{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
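
Monitoring for 4xx and 5xx responses is easier when the counter is grouped by status class. The sketch below does this for scraped Prometheus text lines. It is an illustration under the assumption that the lines are already available in memory, and the function name is hypothetical.

```python
import re
from collections import Counter

# One Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def requests_by_status_class(lines):
    """Sum the requests counter by status class ("2xx", "4xx", "5xx", ...)."""
    totals = Counter()
    for line in lines:
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or unrelated text
        name, raw_labels, value = match.groups()
        if name != "bindplane_requests_ratio_total":
            continue
        status = dict(LABEL_RE.findall(raw_labels)).get("status", "")
        if status:
            totals[status[0] + "xx"] += float(value)
    return dict(totals)
```

Following the guidance above, any value under "5xx" deserves investigation, while "4xx" is better judged relative to the "2xx" total.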

request_duration

Time taken to process the request in seconds.

  • Type: Histogram

  • Attributes

    • method: Request method.

    • status: Request status.

    • url: Request URL path.

  • Unit: Seconds

  • Buckets:

    • 0.1

    • 0.5

    • 1

    • 2

    • 5

    • 10

Example:

bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="0.5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="1"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="2"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 7
bindplane_request_duration_seconds_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_duration_seconds_sum{method="GET",status="200",url="/v1/opamp"} 0.047771097000000005
bindplane_request_duration_seconds_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0.5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 224
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 0.027173258000000006
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 224
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0.5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 0.000218934
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0.5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 2
bindplane_request_duration_seconds_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_duration_seconds_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 0.017959873
bindplane_request_duration_seconds_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2
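
The _sum and _count series of a histogram give the mean request duration per route. The following is a minimal sketch, assuming the exposition lines have already been scraped into a list. The function name is hypothetical, and series that differ only by status are combined per method and URL.

```python
import re

# One Prometheus text-format sample: name{labels} value
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

SUM_NAME = "bindplane_request_duration_seconds_sum"
COUNT_NAME = "bindplane_request_duration_seconds_count"


def mean_duration_by_route(lines):
    """Mean request duration in seconds, keyed by (method, url)."""
    sums = {}
    counts = {}
    for line in lines:
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or unrelated text
        name, raw_labels, value = match.groups()
        if name not in (SUM_NAME, COUNT_NAME):
            continue  # bucket series and other metrics are not needed
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (labels.get("method"), labels.get("url"))
        target = sums if name == SUM_NAME else counts
        target[key] = target.get(key, 0.0) + float(value)
    # Routes with a zero or missing count are left out to avoid dividing by 0.
    return {key: sums[key] / counts[key] for key in sums if counts.get(key)}
```

The mean hides outliers, so it is best read together with the bucket counts for the same route.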

request_size

Size of the request received.

  • Type: Histogram

  • Attributes

    • method: Request method.

    • status: Request status.

    • url: Request URL path.

  • Unit: Bytes

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

Example:

bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="0"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="25"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="50"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="75"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="100"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="250"} 0
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="750"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="1000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="2500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="5000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="7500"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="10000"} 7
bindplane_request_size_bucket{method="GET",status="200",url="/v1/opamp",le="+Inf"} 7
bindplane_request_size_sum{method="GET",status="200",url="/v1/opamp"} 2512
bindplane_request_size_count{method="GET",status="200",url="/v1/opamp"} 7
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="750"} 4
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="1000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="2500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="5000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="7500"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="10000"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics",le="+Inf"} 228
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 175092
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/health/v1/metrics"} 228
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="750"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="1000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="2500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="5000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/logs",le="+Inf"} 1
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 410
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/logs"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="0"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="25"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="50"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="75"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="100"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="250"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="750"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="1000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="2500"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="5000"} 0
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="7500"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="10000"} 1
bindplane_request_size_bucket{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics",le="+Inf"} 2
bindplane_request_size_sum{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 17732
bindplane_request_size_count{method="POST",status="200",url="/v1/agents/:id/snapshots/metrics"} 2

Store

eventbus.latency

Time between sending an event to the event bus and the handler receiving it.

  • Type: Histogram

  • Attributes

    • event: One of received or handled.

    • handler: The component handling the event, one of manager or graphql.

    • type: The event type is always updates. Other values may exist in the future.

  • Buckets:

    • 0

    • 5

    • 10

    • 25

    • 50

    • 75

    • 100

    • 250

    • 500

    • 750

    • 1000

    • 2500

    • 5000

    • 7500

    • 10000

Example:

bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="75"} 4
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="100"} 9
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="250"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="750"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="1000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="2500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="5000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="7500"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="10000"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="handled",handler="graphql",type="updates",le="+Inf"} 15
bindplane_eventbus_latency_milliseconds_sum{event="handled",handler="graphql",type="updates"} 1508
bindplane_eventbus_latency_milliseconds_count{event="handled",handler="graphql",type="updates"} 15
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="0"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="25"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="50"} 0
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="75"} 5
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="100"} 12
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="250"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="750"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="1000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="2500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="5000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="7500"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="10000"} 22
bindplane_eventbus_latency_milliseconds_bucket{event="received",handler="graphql",type="updates",le="+Inf"} 22
bindplane_eventbus_latency_milliseconds_sum{event="received",handler="graphql",type="updates"} 2312
bindplane_eventbus_latency_milliseconds_count{event="received",handler="graphql",type="updates"} 22
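
Because the histogram exposes cumulative bucket counts together with _sum and _count, both the mean latency and an approximate percentile can be derived from a single scrape. The Python sketch below is illustrative only. The helper name histogram_quantile and the hard-coded bucket list are assumptions made for this example (the values are copied from the event="handled" series above), and the linear interpolation follows the same general idea as PromQL's histogram_quantile without claiming to reproduce it exactly.

```python
import math

# Cumulative (upper bound in ms, count) pairs copied from the
# event="handled" example series above.
HANDLED_BUCKETS = [
    (0, 0), (5, 0), (10, 0), (25, 0), (50, 0), (75, 4), (100, 9),
    (250, 15), (500, 15), (750, 15), (1000, 15), (2500, 15),
    (5000, 15), (7500, 15), (10000, 15), (math.inf, 15),
]
HANDLED_SUM_MS = 1508
HANDLED_COUNT = 15


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    Hypothetical helper: interpolates linearly inside the bucket that
    contains the target rank, assuming observations are spread evenly
    within that bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # No finite upper bound, or an empty bucket:
                # fall back to the lower edge of the bucket.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


mean_ms = HANDLED_SUM_MS / HANDLED_COUNT
p50_ms = histogram_quantile(0.5, HANDLED_BUCKETS)
print(f"mean={mean_ms:.1f} ms, p50~{p50_ms:.1f} ms")
```

For the sample data this prints a mean of 100.5 ms and an estimated median of 92.5 ms. The estimate is only as fine-grained as the bucket boundaries, so treat it as an approximation.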

store.updateRollout

Number of times the store updateRollout method was called.

  • Type: Counter

Example:

bindplane_store_updateRollout_total{} 5

Postgres

postgres.up (KPI)

Whether Bindplane has a connection to the Postgres database. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Postgres database.

  • Type: Gauge

wait_count

Number of times a query had to wait for a connection.

  • Type: Counter

Example:

bindplane_wait_count_total{} 3

wait_time (KPI)

Total time spent waiting for connections. Because this is a cumulative counter, watch how it changes over time: if wait_time keeps increasing, Bindplane's Postgres max connections configuration option may be set too low, or Postgres may be experiencing performance issues.

  • Type: Counter

  • Unit: milliseconds

Example:

bindplane_wait_time_milliseconds_total{} 0
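
Since wait_count and wait_time are both cumulative counters, a useful derived signal is the average wait per waiting query between two scrapes. The following is a minimal sketch under stated assumptions: the function name average_wait_ms and the sample values are hypothetical, and the scrapes are represented as plain (wait_count, wait_time_ms) tuples.

```python
def average_wait_ms(prev, curr):
    """Mean wait in milliseconds per waiting query between two scrapes.

    prev and curr are (wait_count, wait_time_ms) tuples taken from
    bindplane_wait_count_total and bindplane_wait_time_milliseconds_total.
    """
    waits = curr[0] - prev[0]
    waited_ms = curr[1] - prev[1]
    if waits <= 0:
        # No new waits in the interval, or the counters were reset
        # (for example after a restart).
        return 0.0
    return waited_ms / waits


# Hypothetical sample values for two consecutive scrapes.
earlier = (3, 0)
later = (8, 40)
print(average_wait_ms(earlier, later))  # 8.0
```

Working on deltas rather than raw totals keeps the result meaningful on long-running servers, where the cumulative totals alone say little about current behavior.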

active_connections (KPI)

Number of open connections currently in use. If active_connections consistently equals the configured max connections, Bindplane may be experiencing performance issues. Generally, Bindplane's max connections should not exceed 100 (the default). Increasing max connections might mask an underlying issue and is not recommended.

  • Type: Gauge

Example:

bindplane_active_connections{} 0
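
One way to act on this guidance is to compare a window of active_connections samples against the configured maximum. The sketch below is hypothetical: the function names, the window of samples, and the way the samples are collected are all assumptions. Only the default of 100 comes from the text above.

```python
DEFAULT_MAX_CONNECTIONS = 100  # default stated in the text above


def utilization(active, max_connections=DEFAULT_MAX_CONNECTIONS):
    """Fraction of the connection pool currently in use."""
    return active / max_connections


def pool_saturated(samples, max_connections=DEFAULT_MAX_CONNECTIONS):
    """True when every sample in the window is at the configured maximum.

    An empty window is treated as not saturated.
    """
    return bool(samples) and all(s >= max_connections for s in samples)


# Hypothetical gauge readings from consecutive scrapes.
window = [100, 100, 100, 100]
print(utilization(window[-1]), pool_saturated(window))
```

Requiring every sample in the window to be at the maximum avoids flagging a short spike, which matches the "consistently" wording above.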

idle_connections

Number of open idle connections.

  • Type: Gauge

Example:

bindplane_idle_connections{} 0

Prometheus

prometheus.up (KPI)

Whether Bindplane has a connection to the Prometheus instance. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Prometheus instance.

  • Type: Gauge

remote_write.time

Time taken to upload a batch to Prometheus via remote write.

  • Type: Histogram

  • Unit: milliseconds

  • Buckets:

    • 0

    • 100

    • 200

    • 300

    • 400

    • 500

    • 750

    • 1000

    • 1500

    • 2000

    • 3000

    • 5000

Example:

bindplane_remote_write_time_milliseconds_bucket{le="0"} 0
bindplane_remote_write_time_milliseconds_bucket{le="100"} 558
bindplane_remote_write_time_milliseconds_bucket{le="200"} 558
bindplane_remote_write_time_milliseconds_bucket{le="300"} 558
bindplane_remote_write_time_milliseconds_bucket{le="400"} 558
bindplane_remote_write_time_milliseconds_bucket{le="500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="750"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="1500"} 558
bindplane_remote_write_time_milliseconds_bucket{le="2000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="3000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="5000"} 558
bindplane_remote_write_time_milliseconds_bucket{le="+Inf"} 558
bindplane_remote_write_time_milliseconds_sum{} 269.38973300000004
bindplane_remote_write_time_milliseconds_count{} 558
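
Bucket values in the example are cumulative: each le bucket includes every observation at or below its bound. To see how many uploads fell into each individual range, subtract the previous bucket's count. The sketch below shows this with the sample values above. The helper name per_bucket_counts is an assumption made for illustration.

```python
import math

# Cumulative (upper bound in ms, count) pairs from the example above.
REMOTE_WRITE_BUCKETS = [
    (0, 0), (100, 558), (200, 558), (300, 558), (400, 558),
    (500, 558), (750, 558), (1000, 558), (1500, 558), (2000, 558),
    (3000, 558), (5000, 558), (math.inf, 558),
]


def per_bucket_counts(cumulative):
    """Convert cumulative bucket counts into per-bucket counts."""
    counts = []
    previous = 0
    for bound, cumulative_count in sorted(cumulative):
        counts.append((bound, cumulative_count - previous))
        previous = cumulative_count
    return counts


non_empty = [(b, c) for b, c in per_bucket_counts(REMOTE_WRITE_BUCKETS) if c]
print(non_empty)  # [(100, 558)]
```

In the sample data, all 558 uploads completed in 100 milliseconds or less, which is consistent with the small _sum value.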

remote_write.samples.count

Number of samples pushed to the Prometheus instance.

  • Type: Counter

Example:

bindplane_remote_write_samples_count_total{} 2112

Cache

cache.get

Number of cache requests.

  • Type: Counter

  • Attributes

    • cache: The cache instance, for example "license", "resourceFinder", or "secretKey".

    • result: One of "hit" or "miss".

Example:

bindplane_cache_get_total{cache="license",result="miss"} 3
bindplane_cache_get_total{cache="resourceFinder",result="hit"} 39
bindplane_cache_get_total{cache="resourceFinder",result="miss"} 3
bindplane_cache_get_total{cache="secretKey",result="hit"} 15
bindplane_cache_get_total{cache="secretKey",result="miss"} 1
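
A per-cache hit ratio can be derived from these counters by dividing hits by the total number of requests. The following Python sketch parses the example lines above. It is a simplified illustration, not a general Prometheus parser: the function name hit_ratios is hypothetical, and the regular expression assumes the label order shown in the example.

```python
import re
from collections import defaultdict

# Sample scrape output copied from the example above.
SCRAPE = """\
bindplane_cache_get_total{cache="license",result="miss"} 3
bindplane_cache_get_total{cache="resourceFinder",result="hit"} 39
bindplane_cache_get_total{cache="resourceFinder",result="miss"} 3
bindplane_cache_get_total{cache="secretKey",result="hit"} 15
bindplane_cache_get_total{cache="secretKey",result="miss"} 1
"""

# Assumes labels appear in the order cache, result, as in the example.
LINE_RE = re.compile(
    r'^bindplane_cache_get_total\{cache="([^"]+)",result="(hit|miss)"\}'
    r"\s+(\d+(?:\.\d+)?)$"
)


def hit_ratios(text):
    """Return {cache name: hits / (hits + misses)} for each cache seen."""
    counts = defaultdict(lambda: {"hit": 0.0, "miss": 0.0})
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            cache, result, value = match.groups()
            counts[cache][result] += float(value)
    return {
        cache: c["hit"] / (c["hit"] + c["miss"])
        for cache, c in counts.items()
        if c["hit"] + c["miss"] > 0
    }


for cache, ratio in sorted(hit_ratios(SCRAPE).items()):
    print(f"{cache}: {ratio:.2%}")
```

A cache with only misses, such as license in the sample, reports a ratio of 0. Shortly after startup, low ratios are expected because the counters have seen few requests.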

Transform Agent

transformagent.up (KPI)

Whether Bindplane has a connection to the Transform Agent. A value of 1 indicates a successful connection. A value of 0 indicates Bindplane is unable to connect to the Transform Agent.

  • Type: Gauge

transformagent.failedRequest

Number of failed requests to the Transform Agent.

  • Type: Counter
