# Drain

{% hint style="info" %}
This processor is in **Alpha** stability.
{% endhint %}

## Description

Most applications produce the same handful of log messages over and over, with only a few details changing each time, such as a username, an IP address, or a request ID. The `Drain` processor recognizes these repeating shapes and labels each log with the **pattern** it belongs to, so you can group, count, and filter logs by what they *mean* instead of their exact text. For a deep dive into how the underlying algorithm works, see the [Drain algorithm reference](https://logparser.readthedocs.io/en/latest/tools/Drain.html).

For example, these two logs:

```
user alice logged in from 10.0.0.4
user bob logged in from 10.0.0.9
```

both get labeled with the same pattern, where the parts that change are replaced by `<*>`:

```
user <*> logged in from <*>
```
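The masking shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the processor's implementation: the function name `merge_into_template` is hypothetical, and the sketch assumes both logs have the same number of space-separated words.

```python
WILDCARD = "<*>"


def merge_into_template(first: str, second: str) -> str:
    """Keep words that match position for position; mask the rest."""
    first_words, second_words = first.split(), second.split()
    if len(first_words) != len(second_words):
        # Assumption of this sketch: only same-length logs are merged.
        raise ValueError("this sketch only merges logs with the same word count")
    return " ".join(
        a if a == b else WILDCARD for a, b in zip(first_words, second_words)
    )


template = merge_into_template(
    "user alice logged in from 10.0.0.4",
    "user bob logged in from 10.0.0.9",
)
print(template)  # user <*> logged in from <*>
```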

The `Drain` processor adds this pattern as an attribute on each log (`log.record.template` by default). It **only adds the label**. It never drops or changes your logs. To act on the patterns, you can pair this processor with a downstream processor such as [Filter By Condition](/integrations/processors/filter-by-condition.md).

## Supported Types

| Metrics | Logs | Traces |
| ------- | ---- | ------ |
|         | ✓    |        |

Available in the Bindplane Distro for OpenTelemetry Collector `v1.99.0+`.

## Configuration Table

<table><thead><tr><th width="220">Parameter</th><th width="110">Type</th><th width="150">Default</th><th>Description</th></tr></thead><tbody><tr><td>template_attribute<mark style="color:red;">*</mark></td><td><code>string</code></td><td><code>log.record.template</code></td><td>The name of the attribute added to each log to hold its pattern.</td></tr><tr><td>body_field</td><td><code>string</code></td><td></td><td>By default the <code>Drain</code> processor looks at the entire log message. If your logs are structured and the text you care about lives under a single top-level field, name that field here.</td></tr></tbody></table>

<mark style="color:red;">\*</mark>*<mark style="color:red;">required field</mark>*

### **Advanced Configuration**

The defaults work well for most logs. Adjust these only if you need to fine-tune how patterns are formed or how much memory the `Drain` processor uses.

<table><thead><tr><th width="249.3984375">Parameter</th><th width="133">Type</th><th width="133.7890625">Default</th><th>Description</th></tr></thead><tbody><tr><td>merge_threshold</td><td><code>fraction</code></td><td><code>0.4</code></td><td>The minimum similarity a log must have to join an existing pattern, from <code>0.0</code> to <code>1.0</code>. Similarity is the fraction of words that match position for position. The alice/bob logs above share four of six words, a similarity of about <code>0.67</code> (see the <a href="https://github.com/logpai/Drain3/blob/9c50c5ff6a5efa2d78039d2a404e1c7d7eb2084c/drain3/drain.py#L391-L413">exact calculation</a>). Higher values group more strictly, producing more, finer patterns. Lower values group more loosely, producing fewer, broader patterns.</td></tr><tr><td>tree_depth</td><td><code>int</code></td><td><code>4</code></td><td>How many words into each log the processor looks at when grouping. Higher values create more specific patterns. Minimum: <code>3</code>.</td></tr><tr><td>max_node_children</td><td><code>int</code></td><td><code>100</code></td><td>Limits how many variations the processor tracks at each word position, capping memory when a field can hold many different values.</td></tr><tr><td>max_clusters</td><td><code>int</code></td><td><code>0</code></td><td>The maximum number of patterns to keep at once. When the limit is reached, the least recently seen pattern is dropped. <code>0</code> means no limit.</td></tr><tr><td>warmup_min_clusters</td><td><code>int</code></td><td><code>0</code></td><td>Wait until this many patterns have been seen before the processor starts labeling logs. Helps avoid early, unstable patterns. <code>0</code> turns this off.</td></tr><tr><td>extra_delimiters</td><td><code>strings</code></td><td></td><td>By default the log text is split on spaces. Add any other characters to also split on here, such as <code>,</code> or <code>:</code>.</td></tr><tr><td>seed_templates</td><td><code>strings</code></td><td></td><td>Patterns to load at startup so the processor recognizes them right away.</td></tr><tr><td>seed_logs</td><td><code>strings</code></td><td></td><td>Example log lines to train the processor on at startup.</td></tr><tr><td>enable_persistent_storage</td><td><code>bool</code></td><td><code>false</code></td><td>Store the learned patterns in a storage extension so they survive collector restarts. See <a href="#persistent-storage">Persistent Storage</a>.</td></tr><tr><td>enable_save_interval</td><td><code>bool</code></td><td><code>false</code></td><td>Save the learned patterns to persistent storage at the configured interval. If disabled, patterns are saved only when the processor shuts down.</td></tr><tr><td>save_interval</td><td><code>duration</code></td><td><code>60s</code></td><td>The interval at which to save the learned patterns to persistent storage. Only used when <code>enable_save_interval</code> is <code>true</code>.</td></tr><tr><td>persistent_storage_extension</td><td><code>file_storage</code></td><td><code>file_storage</code></td><td>The storage extension used to persist learned patterns. Must be a Storage Extension (<a href="/pages/2skF8zWxhgslQrbPIc83">File Storage</a> or <a href="/pages/MoYLzoRW8nlZhpu6CEfB">Redis Storage</a>).</td></tr></tbody></table>
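To make the `merge_threshold` and `extra_delimiters` descriptions concrete, here is a minimal Python sketch of position-for-position similarity. It is a simplification written for this page, not the processor's code: the helper names `tokenize` and `similarity` are hypothetical, and details such as how the tree is searched are left out.

```python
def tokenize(text, extra_delimiters=()):
    """Split on whitespace, treating each extra delimiter as a space first."""
    for delimiter in extra_delimiters:
        text = text.replace(delimiter, " ")
    return text.split()


def similarity(pattern_tokens, log_tokens):
    """Fraction of words that match position for position."""
    if len(pattern_tokens) != len(log_tokens):
        return 0.0  # assumption: logs of different lengths never match
    if not pattern_tokens:
        return 1.0
    matches = sum(1 for p, w in zip(pattern_tokens, log_tokens) if p == w)
    return matches / len(pattern_tokens)


score = similarity(
    tokenize("user alice logged in from 10.0.0.4"),
    tokenize("user bob logged in from 10.0.0.9"),
)
print(f"{score:.2f}")  # 0.67 (four of six words match)
print(score >= 0.4)    # True: joins the pattern at the default threshold
print(score >= 0.8)    # False: a stricter threshold starts a new pattern

# Extra delimiters break apart words that contain no spaces.
print(tokenize("status:ok,code:200", extra_delimiters=[":", ","]))
# ['status', 'ok', 'code', '200']
```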

## Persistent Storage

By default, the `Drain` processor keeps learned patterns in memory only. They are lost whenever the collector restarts, including configuration rollouts and agent upgrades. Enable persistent storage to save the learned patterns to a [File Storage](/configuration/bindplane-otel-collector/extensions/file-storage.md) or [Redis Storage](/configuration/bindplane-otel-collector/extensions/redis-storage.md) extension so they survive restarts.

With persistent storage enabled, patterns are saved only when the processor shuts down cleanly, unless you also enable periodic saves. If the collector crashes or is killed, any patterns learned since the last save are lost. Set `enable_save_interval` to `true` and choose a `save_interval` to save patterns periodically while the processor is running.

## Best Practices

* **Patterns take a moment to settle.** When the `Drain` processor first starts, it is still learning, so patterns may shift slightly before they stabilize. To get good patterns sooner, pre-load common ones with `seed_templates` / `seed_logs`, and use `warmup_min_clusters` to hold off labeling until enough patterns exist.
* **By default, patterns are kept in memory only**, so the `Drain` processor starts fresh whenever the collector restarts. This includes configuration rollouts and agent upgrades, since both restart the collector. Enable [persistent storage](#persistent-storage) to keep learned patterns across restarts.
* **If you run several collectors, each learns its own patterns.** Patterns match across collectors only when the collectors see similar logs.
* **Let the `Drain` processor run for a while and review the patterns it produces**, then add the patterns you want to take action on to the `seed_templates`. Seeded patterns are recognized right away, so they stay consistent across restarts, rollouts, and multiple collectors, rather than waiting for each collector to relearn them.
* **Act on patterns in a later processor node in your pipeline.** In the `Drain` processor's own snapshot, the pattern attribute is only an estimate based on the data in that snapshot. Further downstream, the values come from your collector and are therefore more accurate, so filtering there avoids acting on patterns that are still settling.

## Example Configurations

#### **Web Interface**

<figure><img src="/files/rHHolo9Z41K31XBVubFP" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/Qiid1sPDYdJO89Re88gK" alt=""><figcaption></figcaption></figure>

### **Basic Configuration**

Label each log with its pattern using the default `log.record.template` attribute.

#### **Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: drain
  name: drain
spec:
  type: drain
  parameters:
    - name: template_attribute
      value: log.record.template
```

### **Adjusting How Specific Patterns Are**

Use `tree_depth` and `merge_threshold` to control how finely logs are grouped, and `max_clusters` to cap how many patterns are kept. Higher `tree_depth` and `merge_threshold` create more, narrower patterns. Lower values create fewer, broader ones.

#### **Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: drain
  name: drain
spec:
  type: drain
  parameters:
    - name: template_attribute
      value: log.record.template
    - name: tree_depth
      value: 6
    - name: merge_threshold
      value: 0.5
    - name: max_clusters
      value: 500
```
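The `max_clusters` limit drops the least recently seen pattern once the cap is reached. The following Python sketch illustrates that eviction rule only. The `PatternCache` class is hypothetical and is not how the processor stores patterns internally.

```python
from collections import OrderedDict


class PatternCache:
    """Keeps at most `max_clusters` patterns; 0 means no limit."""

    def __init__(self, max_clusters=0):
        self.max_clusters = max_clusters
        self._counts = OrderedDict()  # oldest entry first, newest last

    def see(self, pattern):
        """Record a sighting and evict the least recently seen pattern if full."""
        self._counts[pattern] = self._counts.get(pattern, 0) + 1
        self._counts.move_to_end(pattern)  # mark as most recently seen
        if self.max_clusters > 0 and len(self._counts) > self.max_clusters:
            self._counts.popitem(last=False)  # drop the oldest entry

    def patterns(self):
        return list(self._counts)


cache = PatternCache(max_clusters=2)
for pattern in ["login <*>", "logout <*>", "login <*>", "heartbeat <*>"]:
    cache.see(pattern)

# "logout <*>" was seen least recently, so it is the one dropped.
print(cache.patterns())  # ['login <*>', 'heartbeat <*>']
```

A dropped pattern is not gone for good in this sketch: if matching logs show up again, the pattern is simply learned as a new entry.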

### **Pre-loading Known Patterns**

Add patterns you already expect and hold off labeling until enough patterns are seen, so you get stable results from the start.

#### **Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: drain
  name: drain
spec:
  type: drain
  parameters:
    - name: template_attribute
      value: log.record.template
    - name: seed_templates
      value:
        - 'user <*> logged in from <*>'
        - 'heartbeat ping <*>'
    - name: warmup_min_clusters
      value: 20
```
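As a rough illustration of why seeded patterns are recognized right away, the Python sketch below labels logs against the two seed templates from the example above. It assumes a simplified rule where `<*>` matches any single word and every other word must match exactly. The real processor applies its similarity threshold instead, and the names `matches_template` and `label` are hypothetical.

```python
WILDCARD = "<*>"

SEED_TEMPLATES = [
    "user <*> logged in from <*>",
    "heartbeat ping <*>",
]


def matches_template(template, log):
    """True when every non-wildcard word matches position for position."""
    template_words, log_words = template.split(), log.split()
    return len(template_words) == len(log_words) and all(
        t == WILDCARD or t == w for t, w in zip(template_words, log_words)
    )


def label(log, templates=SEED_TEMPLATES):
    """Return the first seed template the log matches, or None."""
    for template in templates:
        if matches_template(template, log):
            return template
    return None


print(label("user carol logged in from 10.0.0.7"))  # user <*> logged in from <*>
print(label("heartbeat ping 42"))                   # heartbeat ping <*>
print(label("disk full on /dev/sda1"))              # None
```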

### **Persistent Storage**

Persist the learned patterns to a [File Storage](/configuration/bindplane-otel-collector/extensions/file-storage.md) or [Redis Storage](/configuration/bindplane-otel-collector/extensions/redis-storage.md) extension so they survive collector restarts. Set `enable_save_interval` to also save patterns periodically while the processor is running.

#### **Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: drain
  name: drain
spec:
  type: drain
  parameters:
    - name: template_attribute
      value: log.record.template
    - name: enable_persistent_storage
      value: true
    - name: enable_save_interval
      value: true
    - name: save_interval
      value: 30s
    - name: persistent_storage_extension
      value:
        type: file_storage
        parameters:
          - name: directory
            value: /var/lib/otel/storage
```

