# Postgres On-Premises Operational Playbook

This playbook is an operational reference for teams running the Bindplane-managed PostgreSQL store on their own infrastructure. It covers routine maintenance tasks, common problem patterns, and recommended configuration values for a Bindplane workload.

Before following this guide, make sure Postgres is installed and Bindplane is configured to connect to it. See [Postgres Store](/how-to-guides/infrastructure-and-operations/postgres/postgres-store.md) for initial setup.

### Supported Versions

Bindplane is tested and supported against the following PostgreSQL major versions:

| Version | Support Status |
| ------- | -------------- |
| 18      | Supported      |
| 17      | Supported      |
| 16      | Supported      |
| 15      | Supported      |
| 14      | Supported      |

### Default Settings Relevant to Bindplane

The following `postgresql.conf` parameters have the most direct impact on Bindplane's behavior. Review and adjust these before putting a Bindplane instance into production.

#### `max_connections`

Controls how many concurrent client connections Postgres accepts. Bindplane holds a connection pool against Postgres; the pool size is set by `maxConnections` in the Bindplane configuration (default `100`).

```ini
max_connections = 110
```

**Recommendation:** Since this Postgres instance is dedicated to Bindplane, set `max_connections` to cover all Bindplane nodes plus a small number of connections for admin work (`psql` sessions, monitoring queries, etc.). For a single-node Bindplane installation with `maxConnections = 100`, a value around `110` leaves room for admin tasks. For a High Availability deployment, add the `maxConnections` values across all nodes — for example, 3 nodes × 100 connections = 300, so set `max_connections` to `310` or similar.

{% hint style="info" %}
Each Postgres connection consumes approximately 5–10 MB of RAM. Set `max_connections` only as high as needed.
{% endhint %}

#### Applying the Settings

After editing `postgresql.conf`, changes to `max_connections` require a full restart:

```bash
sudo systemctl restart postgresql
```

### Dead Tuples and VACUUM

#### What Are Dead Tuples?

PostgreSQL uses multi-version concurrency control (MVCC). When a row is updated or deleted, the old version of the row is not immediately removed from disk — it becomes a "dead tuple." Over time, dead tuples accumulate and consume disk space. If they are not cleaned up, they also degrade query performance because the query planner must scan over them.

Bindplane performs frequent updates and deletes against the Postgres store — particularly for agent state, rollout tracking, and event records. These workloads generate dead tuples at a higher rate than typical OLTP applications, making routine VACUUM maintenance important.

#### Checking Dead Tuple Accumulation

Query `pg_stat_user_tables` to see how many dead tuples exist per table in the `bindplane` database:

```sql
SELECT
    relname        AS table_name,
    n_live_tup     AS live_tuples,
    n_dead_tup     AS dead_tuples,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```

Tables with a high `n_dead_tup` count relative to `n_live_tup` — for example, a dead-to-live ratio above 10–20% — are candidates for manual `VACUUM ANALYZE`.

#### Running VACUUM ANALYZE Manually

Connect to the `bindplane` database and run:

```sql
VACUUM ANALYZE;
```

To target a specific table — for example the `agents` table — run:

```sql
VACUUM ANALYZE agents;
```

{% hint style="info" %}
`VACUUM ANALYZE` is safe to run on a live database. It does not lock tables in a way that blocks normal operations, but it does consume I/O. Schedule it during a low-traffic window if the Postgres host is resource-constrained.
{% endhint %}

#### Configuring Autovacuum

PostgreSQL's `autovacuum` daemon runs VACUUM automatically in the background. For a Bindplane workload, the default autovacuum thresholds may be too conservative, allowing dead tuples to build up between runs.

The relevant settings are in `postgresql.conf`. The defaults and recommended values for a Bindplane workload are shown below:

| Parameter                         | Default                  | Recommended for Bindplane    |
| --------------------------------- | ------------------------ | ---------------------------- |
| `autovacuum`                      | `on`                     | `on`                         |
| `autovacuum_vacuum_scale_factor`  | `0.2` (20% of the table) | `0.05` (5%)                  |
| `autovacuum_analyze_scale_factor` | `0.1` (10% of the table) | `0.02` (2%)                  |
| `autovacuum_vacuum_cost_delay`    | `2ms`                    | `2ms`                        |
| `autovacuum_max_workers`          | `3`                      | `3–5` depending on CPU count |

Edit `postgresql.conf` (location varies by distribution — commonly `/etc/postgresql/<version>/main/postgresql.conf`) and adjust:

```ini
autovacuum = on
autovacuum_vacuum_scale_factor = 0.05
autovacuum_analyze_scale_factor = 0.02
```

Reload Postgres to apply without a full restart:

```bash
sudo systemctl reload postgresql
```

Verify the settings took effect:

```sql
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_analyze_scale_factor;
```

### Locking Issues

#### Detecting Blocking Queries

Long-running queries or transactions that hold locks can block other Postgres operations, which surfaces in Bindplane as slow API responses, stalled rollouts, or agent check-in failures.

The following query joins `pg_locks` with `pg_stat_activity` to identify any process that is blocking another:

```sql
SELECT
    blocking.pid            AS blocking_pid,
    blocking.query          AS blocking_query,
    blocking.state          AS blocking_state,
    blocking.wait_event     AS blocking_wait_event,
    now() - blocking.query_start AS blocking_duration,
    blocked.pid             AS blocked_pid,
    blocked.query           AS blocked_query
FROM pg_locks AS lock_blocked
JOIN pg_stat_activity AS blocked
    ON blocked.pid = lock_blocked.pid
JOIN pg_locks AS lock_blocking
    ON lock_blocking.transactionid = lock_blocked.transactionid
    AND lock_blocking.pid != lock_blocked.pid
JOIN pg_stat_activity AS blocking
    ON blocking.pid = lock_blocking.pid
WHERE NOT lock_blocked.granted
ORDER BY blocking_duration DESC;
```

If the result set is non-empty, a lock contention situation exists. The `blocking_query` column shows what the blocking session is running, and `blocking_duration` shows how long it has been running.

#### Terminating a Blocking Process

Once you have identified the blocking PID, you can ask it to stop gracefully:

```sql
SELECT pg_cancel_backend(<blocking_pid>);
```

`pg_cancel_backend` sends an interrupt to the query, allowing the transaction to roll back cleanly. If the query does not respond to the cancel signal within a reasonable time, terminate the session:

```sql
SELECT pg_terminate_backend(<blocking_pid>);
```

{% hint style="danger" %}
`pg_terminate_backend` immediately kills the connection. Any open transaction on that session will be rolled back. Use this only when `pg_cancel_backend` is insufficient.
{% endhint %}

### Health Check Queries

Run these queries against the `bindplane` database to verify Postgres is healthy.

#### 1. Confirm Bindplane Can Connect

From the Bindplane server host, verify connectivity with `psql`:

```bash
psql \
  --host=bindplane-postgres \
  --port=5432 \
  --username=bindplane \
  --dbname=bindplane \
  --command="SELECT 1;"
```

A result of `1` confirms a successful connection.

#### 2. Check Database Size

```sql
SELECT
    pg_database.datname                                         AS database,
    pg_size_pretty(pg_database_size(pg_database.datname))       AS size
FROM pg_database
WHERE datname = 'bindplane';
```

#### 3. Check Active Connection Count

```sql
SELECT
    count(*)                AS active_connections,
    max_conn.setting::int   AS max_connections
FROM pg_stat_activity
CROSS JOIN (
    SELECT setting FROM pg_settings WHERE name = 'max_connections'
) AS max_conn
WHERE state = 'active'
GROUP BY max_conn.setting;
```

If `active_connections` is close to `max_connections`, increase `max_connections` in `postgresql.conf` or reduce Bindplane's `maxConnections` setting.

#### 4. Check for Long-Running Queries

```sql
SELECT
    pid,
    now() - query_start     AS duration,
    state,
    left(query, 100)        AS query_snippet
FROM pg_stat_activity
WHERE state != 'idle'
  AND query_start IS NOT NULL
  AND now() - query_start > interval '30 seconds'
ORDER BY duration DESC;
```

Any query running for more than 30 seconds on a Bindplane database is abnormal and warrants investigation.

#### 5. Check Table Bloat

```sql
SELECT
    relname                                                 AS table_name,
    n_dead_tup                                             AS dead_tuples,
    n_live_tup                                             AS live_tuples,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_live_tup > 0
ORDER BY dead_pct DESC;
```

A `dead_pct` above 20% indicates that VACUUM has not run recently enough. Review the autovacuum configuration described in the [Dead Tuples and VACUUM](#dead-tuples-and-vacuum) section above.

#### 6. Check Replication Lag (HA deployments only)

If running Bindplane in High Availability mode with a Postgres replica, check replication lag from the primary:

```sql
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    (sent_lsn - replay_lsn) AS replication_lag_bytes
FROM pg_stat_replication;
```

A `replication_lag_bytes` value consistently above a few megabytes indicates the replica is falling behind and should be investigated.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.bindplane.com/how-to-guides/infrastructure-and-operations/postgres/postgres-on-prem-playbook.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
