# Storing Security Events in ClickHouse using Bindplane

Using the **Process Security Logs for ClickHouse** blueprint, you can preserve authentication events and security signals while masking PII, normalizing timestamps, and deduplicating repeated alerts, optimizing your SIEM data for ClickHouse analytics.

### Overview

The security logs blueprint preprocesses audit and security logs before ingestion into ClickHouse, addressing SIEM-specific requirements:

* **Security Signal Preservation**: The blueprint protects authentication, authorization, and intrusion detection logs regardless of severity level, ensuring no security events are lost in filtering.
* **PII Masking with Forensic Retention**: Sensitive data like passwords, API keys, and emails are hashed (not censored), allowing forensic analysis while protecting personal information. User IDs and IP addresses are preserved in hashed form for correlation.
* **ECS Normalization**: Security event fields are normalized to the Elastic Common Schema (ECS) standard, enabling consistent querying across diverse security sources (firewalls, identity providers, SIEM platforms, cloud audit logs).
* **Alert Deduplication**: During security incidents, repeated alerts can flood the system. The blueprint deduplicates within 60-second windows while preserving the key forensic fields (event type, source IP, user name, action) needed for investigation.

### Bindplane Configuration

To implement this blueprint in your Bindplane deployment:

1. Navigate to [Blueprints](https://app.bindplane.com/p/01J06XSD7F4KT3D0XDE2VQDAR5/blueprints) and choose the **Process Security Logs for ClickHouse** Blueprint. Save it to your Library.
2. Open the **Processors** section in any of your Bindplane configurations.
3. Modify the Blueprint to fit your specific dataset and requirements.
4. Verify the data is formatted correctly by comparing a [Snapshot](https://docs.bindplane.com/feature-guides/snapshots) to the [Live Preview](https://docs.bindplane.com/feature-guides/live-preview), confirming the pre-processed output looks correct.
5. Save your changes and roll out the configuration update to production.

#### Processing Pipeline

The blueprint applies these transformations in sequence:

**JSON Parsing**: Security log bodies are parsed from JSON into structured attributes, enabling field-level access for downstream processors.

**Priority-Based Filtering**: The blueprint filters low-priority informational events while preserving all authentication, authorization, intrusion detection, and malware alerts:

* Drops INFO-level events unless they're authentication-related or marked with `security.keep="true"`
* Always preserves events with `event.category` in: authentication, authorization, intrusion, malware
* Always preserves events with `event.type` in: login, logout, access, denied, failed, blocked

This ensures critical security signals are never lost due to severity filtering.
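The filtering rules above can be sketched as a small predicate. This is an illustrative Python sketch of the filter logic, not the actual Bindplane processor; field names follow the ECS conventions used in this guide.

```python
# Hypothetical sketch of the blueprint's priority-based filter.
# Attribute names (event.category, event.type, security.keep, severity_text)
# mirror the fields described above.

SECURITY_CATEGORIES = {"authentication", "authorization", "intrusion", "malware"}
SECURITY_TYPES = {"login", "logout", "access", "denied", "failed", "blocked"}

def keep_event(attrs: dict) -> bool:
    """Return True if the event should pass through the filter."""
    # Always preserve recognized security categories and types.
    if attrs.get("event.category") in SECURITY_CATEGORIES:
        return True
    if attrs.get("event.type") in SECURITY_TYPES:
        return True
    # An explicit keep flag overrides severity filtering.
    if attrs.get("security.keep") == "true":
        return True
    # Otherwise, drop low-priority INFO events.
    return attrs.get("severity_text", "INFO") != "INFO"
```

Note that severity alone never drops a security-categorized event: an INFO-level failed login still passes through.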

**PII Masking with Hashing**: Sensitive data is hashed using SHA-3 rather than censored, preserving forensic value:

* Passwords, secrets, and credentials → hashed
* Emails, phone numbers, SSNs, credit cards → hashed
* Authorization tokens and API keys → hashed

Forensically important fields are preserved:

* `user.id` and `user.name` – hashed for correlation
* `source.ip` and `destination.ip` – hashed for traffic pattern analysis
* `event.action` and `event.outcome` – preserved for investigating access attempts

Hashing allows you to link failed authentication attempts from the same user or IP while protecting actual PII.
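The key property is determinism: the same input always produces the same digest, so hashed identifiers remain correlatable. A minimal Python sketch of this idea (not the Bindplane implementation; the key list is hypothetical):

```python
import hashlib

def hash_pii(value: str) -> str:
    """Deterministically hash a sensitive value with SHA-3; identical inputs
    always yield identical digests, so hashed values can still be correlated."""
    return hashlib.sha3_256(value.encode("utf-8")).hexdigest()

def mask_attrs(attrs: dict, sensitive_keys: set) -> dict:
    """Hash only the keys designated sensitive; leave forensic fields intact."""
    return {k: hash_pii(v) if k in sensitive_keys else v
            for k, v in attrs.items()}
```

Two failed logins from the same `user.name` will carry the same hash, letting you group them without ever storing the raw username.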

**Non-Security Cardinality Reduction**: High-cardinality fields unrelated to security analysis are removed:

* Request IDs, correlation IDs, transaction IDs
* Session IDs and container identifiers
* Kubernetes pod UIDs and process IDs
* HTTP cookies

Security-relevant fields (user, IP, event type) are preserved despite potential cardinality.
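Conceptually, this step is a keep/drop decision over attribute keys. A hypothetical sketch (the actual processor and its exact field names may differ):

```python
# Illustrative drop set for high-cardinality, non-security fields.
# The real blueprint's field list may differ from these assumed names.
HIGH_CARDINALITY_KEYS = {
    "request_id", "correlation_id", "transaction_id",
    "session_id", "container.id", "k8s.pod.uid", "process.pid",
    "http.cookies",
}

def drop_high_cardinality(attrs: dict) -> dict:
    # Security-relevant fields (user.*, source.ip, event.*) are never
    # in the drop set, so they survive regardless of cardinality.
    return {k: v for k, v in attrs.items() if k not in HIGH_CARDINALITY_KEYS}
```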

**ECS Field Normalization**: Standard security event fields receive defaults:

* `event.kind` defaults to `event`
* `event.category` defaults to `authentication`
* `event.outcome` defaults to `unknown`
* `service.name` defaults to `unknown` (update to your service name)
* `severity_text` defaults to `INFO`

ECS normalization ensures consistent field naming across heterogeneous security sources.
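The defaults behave as fill-if-missing: an existing value is never overwritten. A minimal Python sketch of that merge semantics:

```python
# Defaults taken from the list above; existing attribute values always win.
ECS_DEFAULTS = {
    "event.kind": "event",
    "event.category": "authentication",
    "event.outcome": "unknown",
    "service.name": "unknown",   # update to your actual service name
    "severity_text": "INFO",
}

def apply_ecs_defaults(attrs: dict) -> dict:
    # Dict merge: attrs on the right overrides the defaults on the left.
    return {**ECS_DEFAULTS, **attrs}
```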

**Failed Auth Enrichment**: Failed authentication attempts are automatically enriched:

* Events matching `event.type` = login/authentication with `event.outcome` = failure/denied/failed receive:
  * `security.signal="auth_failure"`
  * `security.priority="high"`

This enrichment simplifies querying for security incidents.
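The enrichment rule is a simple match-and-tag. A hypothetical Python sketch of the condition described above:

```python
# Illustrative sketch of the failed-auth enrichment rule, not the actual
# Bindplane processor. Match values come from the description above.
AUTH_TYPES = {"login", "authentication"}
FAILURE_OUTCOMES = {"failure", "denied", "failed"}

def enrich_failed_auth(attrs: dict) -> dict:
    """Tag failed authentication events for easy downstream querying."""
    if (attrs.get("event.type") in AUTH_TYPES
            and attrs.get("event.outcome") in FAILURE_OUTCOMES):
        attrs = {**attrs,
                 "security.signal": "auth_failure",
                 "security.priority": "high"}
    return attrs
```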

**Alert Deduplication**: Repeated security alerts (events with `security.signal` or categories like intrusion/malware/threat) are deduplicated within 60-second windows. The deduplication preserves key forensic fields:

* `event.category`, `event.type`, `event.action`
* `source.ip`, `user.name`

Duplicate occurrences are recorded in an `alert_count` field, allowing investigation of attack patterns without log flooding.
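The windowed dedup can be sketched as follows. This is an illustrative Python model of the behavior, assuming events arrive in timestamp order; the real processor's windowing details may differ.

```python
def dedup(events, window=60.0):
    """Collapse repeats sharing the same forensic key within `window` seconds,
    recording the number of occurrences in alert_count.

    `events` is a list of (timestamp_seconds, attributes) pairs in time order.
    """
    KEY_FIELDS = ("event.category", "event.type", "event.action",
                  "source.ip", "user.name")
    seen = {}   # forensic key -> (first-seen timestamp, emitted event)
    out = []
    for ts, attrs in events:
        key = tuple(attrs.get(f) for f in KEY_FIELDS)
        if key in seen and ts - seen[key][0] < window:
            seen[key][1]["alert_count"] += 1   # count the duplicate, emit nothing
        else:
            event = {**attrs, "alert_count": 1}  # first occurrence in a new window
            seen[key] = (ts, event)
            out.append(event)
    return out
```

For example, three identical intrusion alerts at t=0s, t=10s, and t=70s yield two emitted events: the first with `alert_count` 2, the second with `alert_count` 1.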

**Lower-Latency Batching**: Security logs use smaller batches (2000-5000 rows) with a 3-second timeout, prioritizing rapid alerting over maximum throughput.

### Customizing the Blueprint

Adjust the blueprint for your environment:

* **Update Service Names**: Modify the `service.name` default to match your actual services for easier filtering
* **Adjust Dedup Window**: Change the 60-second deduplication window if your security tools have different alert patterns
* **Extend Security Signals**: Add custom processors to enrich additional event types with security signals (e.g., marking suspicious API calls)
* **Modify Redaction Strategy**: Change from hashing to censoring (asterisks) if your compliance requirements forbid even hashed PII retention
* **Add Custom Filters**: Insert processors to exclude domain-specific noise (e.g., automated health checks from monitoring tools)

### Querying Security Data in ClickHouse

Query deduplicated security alerts while accounting for repetition:

```sql
SELECT
    attributes['event.category'] as category,
    attributes['event.type'] as event_type,
    attributes['source.ip'] as source_ip,
    alert_count,
    COUNT() as occurrences
FROM otel_logs
WHERE SeverityText IN ('WARN', 'ERROR')
  AND attributes['security.signal'] != '' -- Map lookups return '' for missing keys, not NULL
GROUP BY category, event_type, source_ip, alert_count
ORDER BY occurrences DESC
LIMIT 100
```

Find failed authentication attempts:

```sql
SELECT
    attributes['user.name'] as user,
    attributes['source.ip'] as source_ip,
    COUNT() as failures,
    MAX(alert_count) as max_alert_count
FROM otel_logs
WHERE attributes['security.signal'] = 'auth_failure'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY user, source_ip
HAVING failures > 3
ORDER BY failures DESC
```

### Integration with ClickHouse

Configure your ClickHouse exporter for security data:

* **Batch size**: 2000-5000 rows (smaller than general logs for lower latency)
* **Timeout**: 3 seconds (shorter than general logs for rapid alerting)
* **Retry logic**: Enabled with exponential backoff for reliability
* **Connection pooling**: Enabled for steady throughput

The lower batch size and timeout ensure security alerts reach analysts quickly while still maintaining efficient batching.

### Compliance and Audit Trail

The blueprint helps maintain compliance through:

* **PII Protection**: Sensitive data is hashed, complying with GDPR, CCPA, and other data protection regulations
* **Audit Trail**: Preserved forensic fields (`user.name`, `source.ip`, `event.action`, `event.outcome`) enable incident investigation without storing raw PII
* **Field Normalization**: ECS schema compliance simplifies audit log analysis across multi-vendor environments

### Monitoring Pipeline Health

Track these metrics to ensure proper security log processing:

* **Filter Retention Rate**: Security-relevant logs should never be filtered; verify 100% of authentication and malware events pass through
* **Alert Dedup Rate**: Monitor `alert_count` distribution to detect attack patterns and alert fatigue
* **Latency**: Monitor end-to-end latency from ingestion to ClickHouse (target: < 3 seconds for critical alerts)
* **Hash Consistency**: Verify that the same user/IP consistently hashes to the same value for correlation

{% hint style="info" %}
**NOTE**

This Blueprint has been tested against standard data patterns. You may need to adjust the configuration to match your specific data format.
{% endhint %}
