For the complete documentation index, see llms.txt. This page is also available as Markdown.

Postgres On-Premises Operational Playbook

This playbook is an operational reference for teams running the Bindplane-managed PostgreSQL store on their own infrastructure. It covers routine maintenance tasks, common problem patterns, and recommended configuration values for a Bindplane workload.

Before following this guide, make sure Postgres is installed and Bindplane is configured to connect to it. See Postgres Store for initial setup.

Supported Versions

Bindplane is tested and supported against the following PostgreSQL major versions:

Version
Support Status

18

Supported

17

Supported

16

Supported

15

Supported

14

Supported

Default Settings Relevant to Bindplane

The following postgresql.conf parameters have the most direct impact on Bindplane's behavior. Review and adjust these before putting a Bindplane instance into production.

max_connections

Controls how many concurrent client connections Postgres accepts. Bindplane holds a connection pool against Postgres; the pool size is set by maxConnections in the Bindplane configuration (default 100).

max_connections = 110

Recommendation: Since this Postgres instance is dedicated to Bindplane, set max_connections to cover all Bindplane nodes plus a small number of connections for admin work (psql sessions, monitoring queries, etc.). For a single-node Bindplane installation with maxConnections = 100, a value around 110 leaves room for admin tasks. For a High Availability deployment, add the maxConnections values across all nodes — for example, 3 nodes × 100 connections = 300, so set max_connections to 310 or similar.

Each Postgres connection consumes approximately 5–10 MB of RAM. Set max_connections only as high as needed.

Applying the Settings

After editing postgresql.conf, changes to max_connections require a full restart:

Dead Tuples and VACUUM

What Are Dead Tuples?

PostgreSQL uses multi-version concurrency control (MVCC). When a row is updated or deleted, the old version of the row is not immediately removed from disk — it becomes a "dead tuple." Over time, dead tuples accumulate and consume disk space. If they are not cleaned up, they also degrade query performance because the query planner must scan over them.

Bindplane performs frequent updates and deletes against the Postgres store — particularly for agent state, rollout tracking, and event records. These workloads generate dead tuples at a higher rate than typical OLTP applications, making routine VACUUM maintenance important.

Checking Dead Tuple Accumulation

Query pg_stat_user_tables to see how many dead tuples exist per table in the bindplane database:

Tables with a high n_dead_tup count relative to n_live_tup — for example, a dead-to-live ratio above 10–20% — are candidates for manual VACUUM ANALYZE.

Running VACUUM ANALYZE Manually

Connect to the bindplane database and run:

To target a specific table — for example the agents table — run:

VACUUM ANALYZE is safe to run on a live database. It does not lock tables in a way that blocks normal operations, but it does consume I/O. Schedule it during a low-traffic window if the Postgres host is resource-constrained.

Configuring Autovacuum

PostgreSQL's autovacuum daemon runs VACUUM automatically in the background. For a Bindplane workload, the default autovacuum thresholds may be too conservative, allowing dead tuples to build up between runs.

The relevant settings are in postgresql.conf. The defaults and recommended values for a Bindplane workload are shown below:

Parameter
Default
Recommended for Bindplane

autovacuum

on

on

autovacuum_vacuum_scale_factor

0.2 (20% of the table)

0.05 (5%)

autovacuum_analyze_scale_factor

0.1 (10% of the table)

0.02 (2%)

autovacuum_vacuum_cost_delay

2ms

2ms

autovacuum_max_workers

3

3–5 depending on CPU count

Edit postgresql.conf (location varies by distribution — commonly /etc/postgresql/<version>/main/postgresql.conf) and adjust:

Reload Postgres to apply without a full restart:

Verify the settings took effect:

Locking Issues

Detecting Blocking Queries

Long-running queries or transactions that hold locks can block other Postgres operations, which surfaces in Bindplane as slow API responses, stalled rollouts, or agent check-in failures.

The following query joins pg_locks with pg_stat_activity to identify any process that is blocking another:

If the result set is non-empty, a lock contention situation exists. The blocking_query column shows what the blocking session is running, and blocking_duration shows how long it has been running.

Terminating a Blocking Process

Once you have identified the blocking PID, you can ask it to stop gracefully:

pg_cancel_backend sends an interrupt to the query, allowing the transaction to roll back cleanly. If the query does not respond to the cancel signal within a reasonable time, terminate the session:

Health Check Queries

Run these queries against the bindplane database to verify Postgres is healthy.

1. Confirm Bindplane Can Connect

From the Bindplane server host, verify connectivity with psql:

A result of 1 confirms a successful connection.

2. Check Database Size

3. Check Active Connection Count

If active_connections is close to max_connections, increase max_connections in postgresql.conf or reduce Bindplane's maxConnections setting.

4. Check for Long-Running Queries

Any query running for more than 30 seconds on a Bindplane database is abnormal and warrants investigation.

5. Check Table Bloat

A dead_pct above 20% indicates that VACUUM has not run recently enough. Review the autovacuum configuration described in the Dead Tuples and VACUUM section above.

6. Check Replication Lag (HA deployments only)

If running Bindplane in High Availability mode with a Postgres replica, check replication lag from the primary:

A replication_lag_bytes value consistently above a few megabytes indicates the replica is falling behind and should be investigated.

Last updated

Was this helpful?