# Deduplicate Logs

### Description

The Deduplicate Logs processor deduplicates logs over a time interval and emits a single log that carries the count of duplicate logs.

Logs are considered duplicates if all of the following match:

* Severity
* Log Body
* Resource Attributes
* Log Attributes
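
As a rough illustration of the matching rule above, the sketch below shows two logs that agree on all four criteria and the single log that would be emitted for them. The record layout, the field values, and the timestamp placeholders are assumptions made for illustration and are not actual processor output; only the `log_count`, `first_observed_timestamp`, and `last_observed_timestamp` attribute names come from this page.

```yaml
# Hypothetical illustration only; the layout and values are assumed.
# Two logs received within the same interval, identical in severity,
# body, resource attributes, and log attributes:
received:
  - severity: ERROR
    body: 'connection refused'
    resource:
      host.name: 'web-1'
    attributes:
      env: 'prod'
  - severity: ERROR
    body: 'connection refused'
    resource:
      host.name: 'web-1'
    attributes:
      env: 'prod'

# The single log emitted after the interval passes:
emitted:
  - severity: ERROR
    body: 'connection refused'
    resource:
      host.name: 'web-1'
    attributes:
      env: 'prod'
      log_count: 2  # default name of the count attribute
      first_observed_timestamp: '<time the first duplicate was seen>'
      last_observed_timestamp: '<time the last duplicate was seen>'
```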

### Supported Types

| Metrics | Logs | Traces |
| ------- | ---- | ------ |
|         | ✓    |        |

### Configuration Table

<table><thead><tr><th width="195.63671875">Parameter</th><th width="105.2890625">Type</th><th width="122.53125">Default</th><th>Description</th></tr></thead><tbody><tr><td>interval*</td><td><code>int</code></td><td><code>10</code></td><td>The interval, in seconds, over which logs are aggregated. An aggregated log is emitted after the interval passes.</td></tr><tr><td>log_count_attribute*</td><td><code>string</code></td><td><code>log_count</code></td><td>The name of the attribute, added to the emitted log, that holds the count of deduplicated logs.</td></tr><tr><td>timezone*</td><td><code>string</code></td><td><code>UTC</code></td><td>The timezone of the <code>first_observed_timestamp</code> and <code>last_observed_timestamp</code> log attributes on the emitted log.</td></tr><tr><td>include_fields</td><td><code>strings</code></td><td></td><td>A list of fields to include in duplicate matching. Fields can be taken from the log <code>body</code> or <code>attributes</code>. This option is mutually exclusive with <code>exclude_fields</code>. More details can be found <a href="#includefields-parameter">here</a>.</td></tr><tr><td>exclude_fields</td><td><code>strings</code></td><td></td><td>A list of fields to exclude from duplicate matching. Fields can be excluded from the log <code>body</code> or <code>attributes</code>. These fields will not be present in the emitted log. More details can be found <a href="#exclude_fields-parameter">here</a>.</td></tr></tbody></table>

<mark style="color:red;">\*</mark>*<mark style="color:red;">required field</mark>*

#### `include_fields` Parameter <a href="#includefields-parameter" id="includefields-parameter"></a>

The `include_fields` parameter allows the user to specify which fields are considered when looking for duplicate logs. Fields can be included from either the `body` or `attributes` of a log, though the entire `body` cannot be included. Nested fields can be specified by delimiting each part of the path with a `.`. If a field contains a `.` as part of its name, it can be escaped by using `\.`.

Below are a few examples and how to specify them:

* Include `timestamp` field from the body -> `body.timestamp`
* Include a `log.file.name` field from the log attributes -> `attributes.log\.file\.name`
* Include a nested `ip` field inside a `src` attribute -> `attributes.src.ip`
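
A minimal sketch of a standalone processor that uses `include_fields`, modeled on the example configurations elsewhere on this page. The `id` and `name` values and the chosen field list are hypothetical; the required parameters are set to their documented defaults, and `exclude_fields` is left unset because the two options are mutually exclusive.

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: include-fields    # hypothetical id, chosen for this sketch
  name: include-fields  # hypothetical name, chosen for this sketch
spec:
  type: log_dedup
  parameters:
    - name: interval
      value: 10
    - name: log_count_attribute
      value: 'log_count'
    - name: timezone
      value: 'UTC'
    # Fields to include in duplicate matching.
    - name: include_fields
      value:
        - 'attributes.src.ip'            # nested field
        - 'attributes.log\.file\.name'   # literal dots escaped with \.
```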

#### `exclude_fields` Parameter

The `exclude_fields` parameter allows the user to remove fields from consideration when looking for duplicate logs. Fields can be excluded from either the `body` or `attributes` of a log, though the entire `body` cannot be excluded. Nested fields can be specified by delimiting each part of the path with a `.`. If a field contains a `.` as part of its name, it can be escaped by using `\.`.

Below are a few examples and how to specify them:

* Exclude `timestamp` field from the body -> `body.timestamp`
* Exclude a `log.file.name` field from the log attributes -> `attributes.log\.file\.name`
* Exclude a nested `ip` field inside a `src` attribute -> `attributes.src.ip`
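
To make the path syntax concrete, the sketch below lays out a made-up log record and annotates each field with the path that would address it. The record layout and all values are assumptions for illustration, including the `message` field, which does not appear elsewhere on this page.

```yaml
# Hypothetical log record, shown only to illustrate field paths.
body:
  timestamp: '2024-01-01T00:00:00Z'   # body.timestamp
  message: 'connection refused'       # body.message (hypothetical field)
attributes:
  log.file.name: '/var/log/app.log'   # attributes.log\.file\.name (dots belong to the key)
  src:
    ip: '10.0.0.1'                    # attributes.src.ip (nested field)
```

The unescaped `.` separates levels of nesting, while `\.` marks a dot that is part of a single key name.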

### Example Configuration

#### Basic Configuration

This example sets a custom `log_count_attribute` and `timezone` while deduplicating logs on a 60-second `interval`.

**Web Interface**

<figure><img src="https://1405008107-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FgmiOMzBfoNFwmKJFHMcJ%2Fuploads%2Fgit-blob-f582701ec7f577196f45295cdde5c7f45a941319%2Fintegrations-sources-kubernetes-container-logs-image-1.png?alt=media" alt="Bindplane docs - Deduplicate Logs - image 1"><figcaption></figcaption></figure>

**Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: log-dedup
  name: log-dedup
spec:
  type: log_dedup
  parameters:
    - name: interval
      value: 60
    - name: log_count_attribute
      value: 'dedup_count'
    - name: timezone
      value: 'America/Los_Angeles'
```

#### Exclude Fields

This example adds `exclude_fields` to the configuration. More information on `exclude_fields` can be found [here](#exclude_fields-parameter).

**Web Interface**

<figure><img src="https://1405008107-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FgmiOMzBfoNFwmKJFHMcJ%2Fuploads%2Fgit-blob-90e06431cce23aa44ae0c3cdb71010536b74782d%2Fintegrations-sources-kubernetes-container-logs-image-2.png?alt=media" alt="Bindplane docs - Deduplicate Logs - image 2"><figcaption></figcaption></figure>

**Standalone Processor**

```yaml
apiVersion: bindplane.observiq.com/v1
kind: Processor
metadata:
  id: exclude-fields
  name: exclude-fields
spec:
  type: log_dedup
  parameters:
    - name: interval
      value: 10
    - name: log_count_attribute
      value: 'log_count'
    - name: timezone
      value: 'UTC'
    - name: exclude_fields
      value:
        - 'attributes.timestamp'
        - 'body.time'
        - 'attributes.log\.file\.name'
```

