Batching Configuration Performance Impact

In high-throughput ingestion pipelines, the placement of the batch processor in the OpenTelemetry Collector pipeline significantly affects throughput, CPU utilization, and memory consumption.

Poor batching placement can lead to:

  • Excessive CPU load from per-record processing

  • Higher memory pressure from holding unbatched data longer than necessary

  • Reduced throughput due to inefficient export batching

The optimal batching strategy is workload-specific; without careful tuning, throughput can drop by thousands of telemetry items per second.

The batch processor groups telemetry signals into batches to reduce per-export overhead.
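
For reference, a minimal processor definition looks like the following; the three settings shown (send_batch_size, send_batch_max_size, timeout) are the processor's main tuning knobs, and the values here are illustrative rather than recommended:

processors:
  batch:
    send_batch_size: 8192        # item count that triggers a flush
    send_batch_max_size: 16384   # hard upper bound on a single batch
    timeout: 5s                  # flush a partial batch after this interval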

Two Main Placement Strategies

  1. Early Batching – Batch immediately after the receiver:

Receiver → Batch Processor → Other Processors → Exporter

  • Pros:

    • Downstream processors work with fewer, larger payloads

    • Reduced CPU context switching downstream

  • Cons:

    • Batch-wide transformations waste CPU on data that is later dropped

    • Potential for higher memory usage if downstream processors expand batches

  2. Late Batching – Batch just before export:

Receiver → Other Processors → Batch Processor → Exporter

  • Pros:

    • Processors handle smaller, more granular payloads

    • Reduces wasted processing on items that may be filtered out later

  • Cons:

    • Earlier stages gain no batching benefit, so per-record processing overhead stays higher

Observed Performance Differences

From benchmarking across varied workloads:

  • Early batching tends to improve throughput by ~10% in simple pipelines and reduce CPU usage for high-volume, low-complexity telemetry.

  • Late batching performs better in complex pipelines with multiple processing stages, avoiding expensive batch-wide operations on data that will be dropped or heavily modified.

  • Batch size and timeout settings (send_batch_size, send_batch_max_size, timeout) directly influence both throughput and resource usage, as sketched below.
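
To illustrate how these settings trade off, two named batch processor instances could be defined and attached to different pipelines; named instances (type/name) are standard Collector syntax, but the profile names and values below are hypothetical starting points, not benchmarked recommendations:

processors:
  batch/throughput:              # favors large batches for bulk export
    send_batch_size: 8192
    timeout: 10s
  batch/low_latency:             # favors fast flushes over batch efficiency
    send_batch_size: 512
    timeout: 200ms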

Solution

Choose batching placement based on processing complexity and data volume:

| Workload Type | Recommended Placement | Reason |
| --- | --- | --- |
| High-volume, low-complexity (minimal processing) | Early Batching | Minimizes per-record overhead early in the pipeline |
| Complex, multi-processor pipelines | Late Batching | Avoids unnecessary batch-wide transformations |

Early Batching Example:

service:
  pipelines:
    logs:
      receivers: [my_receiver]
      processors: [memory_limiter, batch, attributes]  # memory_limiter stays first per Collector guidance; batch runs immediately after it, ahead of other processing
      exporters: [otlphttp]

Late Batching Example:

service:
  pipelines:
    logs:
      receivers: [my_receiver]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlphttp]
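
Whichever placement is chosen, the configuration can be sanity-checked before deployment. Assuming a standard otelcol binary, the built-in validate subcommand catches unknown components and malformed pipeline references:

otelcol validate --config=config.yaml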

What's in store for future versions?

The batch processor is being deprecated in favor of exporter-native batching.

While current deployments can benefit from placement tuning, future optimization will focus on exporter-level batch settings, so re-benchmarking will be required after migration.
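
As a rough sketch only: experimental exporter-level batching has been exposed through a batcher block on exporters such as otlphttp. Field names have changed between Collector releases, so treat the keys below as assumptions to verify against your version's documentation:

exporters:
  otlphttp:
    endpoint: https://collector.example.com:4318  # hypothetical endpoint
    batcher:                      # experimental; naming varies by release
      enabled: true
      flush_timeout: 5s           # analogous to the batch processor's timeout
      min_size_items: 2048        # assumed field name; may be min_size in newer releases
      max_size_items: 8192        # assumed field name; may be max_size in newer releases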
