# Formatting Cluster Level Events from Kubernetes for Querying in ClickStack

Using the **Process Kubernetes Cluster Events for ClickHouse** blueprint, you can filter out routine controller reconciliation events, reduce cardinality by removing object UIDs, map event types to severity levels, and deduplicate warnings. Together, these steps prepare cluster events for efficient ClickHouse storage and alerting.

### Overview

The Kubernetes events blueprint preprocesses cluster-level events from the Kubernetes Events API before they reach ClickHouse. It solves specific challenges in Kubernetes event analytics:

* **Controller Noise Filtering**: Controllers continuously reconcile desired state. Events like `ScalingReplicaSet`, `SuccessfulCreate`, `SuccessfulDelete`, `Scheduled`, and `Pulled` generate high volume with low signal. The blueprint filters these routine Normal events.
* **Cardinality Explosion Prevention**: Kubernetes object UIDs (pod UIDs, deployment UIDs, node UIDs) are globally unique and create massive cardinality in ClickHouse's map columns, degrading query performance. The blueprint removes these identifiers.
* **Severity Mapping**: Kubernetes events carry type (Normal/Warning) but lack OTLP severity levels. The blueprint maps event types and reasons to SeverityText and SeverityNumber, enabling efficient SQL queries like `WHERE SeverityText = 'WARN'`.
* **Warning Deduplication**: Controllers can emit repeated warnings every few seconds during resource pressure or failed rollouts. The blueprint collapses these into single entries with event counts.

### Bindplane Configuration

To deploy this blueprint in your Bindplane instance:

1. Navigate to [Blueprints](https://app.bindplane.com/p/01J06XSD7F4KT3D0XDE2VQDAR5/blueprints) and choose the **Process Kubernetes Cluster Events for ClickHouse** Blueprint. Save it to your Library.
2. Open the **Processors** section in any of your Bindplane configurations.
3. Modify the Blueprint to fit your specific dataset and requirements.
4. Verify that the data is formatted correctly by comparing the [Snapshot](https://docs.bindplane.com/feature-guides/snapshots) with the [Live Preview](https://docs.bindplane.com/feature-guides/live-preview) and confirming that the processed output looks as expected.
5. Save your changes and roll out the configuration update to production.

#### Resource Detection Setup

The first processor in the pipeline detects resource attributes based on your cloud environment. Update the detector to match your infrastructure:

* **GCP**: Set `detectors: ["gcp"]` (default)
* **AWS EKS**: Set `detectors: ["eks"]`
* **AWS EC2**: Set `detectors: ["ec2"]`
* **Azure AKS**: Set `detectors: ["aks"]`
* **Azure**: Set `detectors: ["azure"]`

Select the detector that matches your Kubernetes cluster's hosting environment. This ensures ClickHouse stores accurate resource metadata in the ResourceAttributes map column.

#### Processing Pipeline

The blueprint applies these transformations in sequence:

**Resource Enrichment**: Detects and adds resource attributes (cloud provider, account, region, availability zone) based on your environment. These attributes populate ClickHouse's ResourceAttributes column for multi-cloud deployments.

**Controller Noise Filtering**: Normal events from Kubernetes system components are excluded:

* `ScalingReplicaSet` – ReplicaSet scaling during deployments
* `SuccessfulCreate` / `SuccessfulDelete` – Object creation/deletion
* `Scheduled` – Pod scheduling to nodes
* `Started` / `Created` / `Pulled` – Container lifecycle events
* `SawCompletedJob` – Job completion
* `Killing` – Pod termination
* `SuccessfulRescale` – HPA scaling

These events are routine and add little operational value. Warning events, including those that the severity mapping step classifies as ERROR, are always preserved.
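The filter rule above can be sketched in a few lines of Python. This is only an illustration of the logic, not the blueprint's actual processor configuration; the `type` and `reason` dictionary keys and the `keep_event` helper are hypothetical names chosen for the example.

```python
# Reasons this page lists as routine controller noise.
ROUTINE_REASONS = {
    "ScalingReplicaSet", "SuccessfulCreate", "SuccessfulDelete", "Scheduled",
    "Started", "Created", "Pulled", "SawCompletedJob", "Killing",
    "SuccessfulRescale",
}


def keep_event(event: dict) -> bool:
    """Return True if the event should pass the noise filter.

    `event` is a hypothetical in-memory representation with 'type' and
    'reason' keys; the real pipeline operates on log records.
    """
    # Only Normal events are candidates for dropping; everything else is kept.
    if event.get("type") != "Normal":
        return True
    return event.get("reason") not in ROUTINE_REASONS


events = [
    {"type": "Normal", "reason": "Scheduled"},   # routine, dropped
    {"type": "Normal", "reason": "NodeReady"},   # Normal but not listed, kept
    {"type": "Warning", "reason": "Killing"},    # Warning, always kept
]
kept = [e for e in events if keep_event(e)]
```

Note that the reason check applies only to Normal events, so a Warning that happens to share a listed reason still passes through.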

**High-Cardinality UID Removal**: Kubernetes object UIDs are deleted:

* Pod UIDs, ReplicaSet UIDs, Deployment UIDs
* StatefulSet, DaemonSet, Job, CronJob UIDs
* Node UIDs, Namespace UIDs, Service UIDs

Removing UIDs prevents fragmentation in ClickHouse's LogAttributes and ResourceAttributes map columns.
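As a rough sketch of the idea, the snippet below drops every attribute whose key ends in `.uid`. The suffix-based rule, the `strip_uids` helper, and the sample attribute keys are assumptions made for illustration; the blueprint may name the fields it deletes explicitly.

```python
def strip_uids(attributes: dict) -> dict:
    """Return a copy of the attribute map without keys ending in '.uid'.

    Assumption: UID attributes follow a '<object>.uid' naming pattern.
    """
    return {k: v for k, v in attributes.items() if not k.endswith(".uid")}


attrs = {
    "k8s.pod.name": "web-7d9f",
    "k8s.pod.uid": "3f2a9c1e-5b7d-4e8a-9c0f-1a2b3c4d5e6f",  # made-up value
    "k8s.namespace.name": "default",
}
cleaned = strip_uids(attrs)
```

Names such as `k8s.pod.name` are kept, because they repeat across events and stay useful for filtering, while the unique UID values are discarded.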

**Metadata Normalization**: Standard fields are set if missing:

* `service.name` defaults to `kubernetes`
* `k8s.cluster.name` defaults to `default` (update to your actual cluster name if desired)
* `k8s.namespace.name` defaults to `default`

These defaults ensure a consistent ClickHouse schema across all cluster events.
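The "set if missing" behavior can be illustrated as follows. This sketch treats a field as missing only when its key is absent, which is an assumption; the `apply_defaults` helper is a hypothetical name.

```python
# Default values listed on this page.
DEFAULTS = {
    "service.name": "kubernetes",
    "k8s.cluster.name": "default",
    "k8s.namespace.name": "default",
}


def apply_defaults(attributes: dict) -> dict:
    """Return a copy of the attributes with missing standard fields filled in."""
    out = dict(attributes)
    for key, value in DEFAULTS.items():
        # setdefault leaves existing values untouched.
        out.setdefault(key, value)
    return out


normalized = apply_defaults({"k8s.namespace.name": "payments"})
```

Existing values win over defaults, so an event that already carries a namespace keeps it.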

**Event Type to Severity Mapping**: Kubernetes event types and reasons are mapped to OTLP severity:

* Normal events → `SeverityText="INFO"`, `SeverityNumber=9`
* Warning events → `SeverityText="WARN"`, `SeverityNumber=13`
* Events with reasons containing "error", "failed", "backoff", or "crash" → `SeverityText="ERROR"`, `SeverityNumber=17`

This mapping enables native ClickHouse severity filtering without parsing string fields.
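A minimal sketch of the mapping is shown below. It assumes that the keyword check on the reason is case-insensitive and takes precedence over the event type, and that unrecognized types fall back to INFO; the page does not state these details, and `map_severity` is a hypothetical helper.

```python
ERROR_KEYWORDS = ("error", "failed", "backoff", "crash")


def map_severity(event_type: str, reason: str) -> tuple[str, int]:
    """Map a Kubernetes event type and reason to (SeverityText, SeverityNumber)."""
    # Assumption: a matching keyword in the reason overrides the event type.
    if any(word in reason.lower() for word in ERROR_KEYWORDS):
        return ("ERROR", 17)
    if event_type == "Warning":
        return ("WARN", 13)
    # Assumption: Normal and any unrecognized type map to INFO.
    return ("INFO", 9)


examples = [
    map_severity("Normal", "NodeReady"),
    map_severity("Warning", "Unhealthy"),
    map_severity("Warning", "FailedScheduling"),
]
```

The numbers 9, 13, and 17 are the first values of the INFO, WARN, and ERROR ranges in the OpenTelemetry log data model.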

**Warning Deduplication**: Warning events are deduplicated within 60-second windows. Repeated warnings (e.g., "PodCrashLooping" during cascading failures) are collapsed into single entries with an `event_count` field, reducing row count in ClickHouse.
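To make the windowing concrete, here is a small sketch that collapses identical warnings into fixed 60-second buckets. The fixed-window strategy, the choice of fields that define "identical", and the `dedupe_warnings` helper are assumptions for illustration; the actual processor may group and window events differently.

```python
def dedupe_warnings(events: list[dict], interval: int = 60) -> list[dict]:
    """Collapse identical events that fall in the same fixed window.

    Each input event is a dict with 'timestamp' (seconds), 'reason',
    'object', and 'message'. Output entries gain an 'event_count' field.
    """
    buckets: dict[tuple, dict] = {}
    for event in events:
        window = int(event["timestamp"] // interval)
        key = (window, event["reason"], event["object"], event["message"])
        if key in buckets:
            buckets[key]["event_count"] += 1
        else:
            # Keep the first occurrence and start counting from 1.
            buckets[key] = {**event, "event_count": 1}
    return list(buckets.values())


base = {"reason": "Unhealthy", "object": "pod/web-1", "message": "Readiness probe failed"}
events = [{**base, "timestamp": t} for t in (0, 10, 50, 70)]
deduped = dedupe_warnings(events)
```

In this example the first three events share one window and collapse into a single entry, while the event at 70 seconds starts a new window.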

**Batching**: Events are batched for efficient ingestion.

### Customizing the Blueprint

Adjust the blueprint for your deployment:

* **Preserve Routine Events**: Remove or modify the "Filter K8s Controller Noise" processor if you need complete event visibility
* **Adjust Cluster Name**: Update the default `k8s.cluster.name` value to match your actual cluster name for easier filtering in ClickHouse
* **Change Dedup Window**: Modify the warning deduplication `interval` (currently 60 seconds) to tune sensitivity
* **Add Custom Filters**: Insert additional processors to filter other high-volume events specific to your environment

### Querying Deduplicated Events in ClickHouse

When working with deduplicated events, include the `event_count` field:

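```sql
-- Assumes the default otel_logs schema (LogAttributes and ResourceAttributes map columns)
-- and that event_count and k8s.cluster.name are stored in those maps; adjust to your schema.
SELECT
    LogAttributes['k8s.event.reason'] AS reason,
    LogAttributes['involved_object.name'] AS object_name,
    LogAttributes['event_count'] AS event_count,
    count() AS unique_events
FROM otel_logs
WHERE SeverityText = 'WARN'
  AND ResourceAttributes['k8s.cluster.name'] = 'production'
GROUP BY reason, object_name, event_count
ORDER BY unique_events DESC
```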

This query shows warning events grouped by reason and object, with `event_count` indicating repetition frequency.

### Integration with ClickHouse

Configure your ClickHouse exporter to align with the blueprint:

* **Batch size**: 5000-10000 rows
* **Timeout**: 5+ seconds for adequate batching
* **Connection pooling**: Enabled for throughput
* **Retry logic**: Enabled for network resilience

The blueprint's filtering and deduplication typically reduce raw event volume by 60-80%, significantly lowering storage costs.

### Monitoring Pipeline Effectiveness

Track these metrics to ensure healthy operation:

* **Filter Effectiveness**: Routine Normal events dropped by the noise filter should account for roughly 40-70% of raw event volume
* **Dedup Rate**: Warning events with `event_count > 1` indicate successful deduplication
* **Cardinality Reduction**: No UID keys should remain in the LogAttributes or ResourceAttributes maps after processing
* **Severity Distribution**: Verify that mapped events have consistent SeverityText/SeverityNumber values
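As an illustration of how two of these checks could be computed from sampled data, the sketch below derives a volume reduction ratio and a dedup rate. The `pipeline_stats` helper, its inputs, and the definition of dedup rate (share of stored warnings with `event_count > 1`) are assumptions, not metrics exposed by Bindplane.

```python
def pipeline_stats(raw_count: int, stored_events: list[dict]) -> dict:
    """Compute simple effectiveness ratios for a sample of events.

    raw_count: number of events seen before the pipeline (hypothetical input).
    stored_events: rows that reached storage, each with 'SeverityText' and
    an optional 'event_count'.
    """
    stored = len(stored_events)
    warnings = [e for e in stored_events if e.get("SeverityText") == "WARN"]
    deduped = [e for e in warnings if int(e.get("event_count", 1)) > 1]
    return {
        # Fraction of raw events that never reached storage.
        "volume_reduction": 1 - stored / raw_count if raw_count else 0.0,
        # Fraction of stored warnings that represent collapsed repeats.
        "dedup_rate": len(deduped) / len(warnings) if warnings else 0.0,
    }


stats = pipeline_stats(
    raw_count=8,
    stored_events=[
        {"SeverityText": "WARN", "event_count": 4},
        {"SeverityText": "WARN", "event_count": 1},
    ],
)
```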

{% hint style="info" %}
**NOTE**

This Blueprint has been tested against standard data patterns. You may need to adjust the configuration to match your specific data format.
{% endhint %}

