Batching Configuration Performance Impact

In high-throughput ingestion pipelines, the placement of the batch processor in the OpenTelemetry Collector pipeline significantly affects throughput, CPU utilization, and memory consumption.

Poor batching placement can lead to:

  • Excessive CPU load from per-record processing

  • Higher memory pressure from holding unbatched data longer than necessary

  • Reduced throughput due to inefficient export batching

The optimal batching strategy is workload-specific; without careful tuning, throughput can drop by thousands of telemetry items per second.

The batch processor groups telemetry signals into batches to reduce per-export overhead.
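
For reference, a minimal processor definition looks like the following; the three settings shown (send_batch_size, send_batch_max_size, timeout) are the processor's main tuning knobs, and the values here are illustrative rather than recommended:

processors:
  batch:
    send_batch_size: 8192        # item count that triggers a flush
    send_batch_max_size: 16384   # hard upper bound on a single batch
    timeout: 5s                  # flush a partial batch after this interval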

Two Main Placement Strategies

  1. Early Batching – Batch immediately after the receiver:

Receiver → Batch Processor → Other Processors → Exporter

  • Pros:

    • Downstream processors work with fewer, larger payloads

    • Reduced CPU context switching downstream

  • Cons:

    • Batch-wide transformations waste CPU on data that is later dropped

    • Potential for higher memory usage if downstream processors expand batches

  2. Late Batching – Batch just before export:

Receiver → Other Processors → Batch Processor → Exporter

  • Pros:

    • Processors handle smaller, more granular payloads

    • Reduces wasted processing on items that may be filtered out later

  • Cons:

    • Earlier stages gain no batching benefit, so per-record processing overhead stays higher

Observed Performance Differences

From benchmarking across varied workloads:

  • Early batching tends to improve throughput by ~10% in simple pipelines and reduce CPU usage for high-volume, low-complexity telemetry.

  • Late batching performs better in complex pipelines with multiple processing stages, avoiding expensive batch-wide operations on data that will be dropped or heavily modified.

  • Batch size and timeout settings (send_batch_size, send_batch_max_size, timeout) directly influence both throughput and resource usage, as sketched below.
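
To illustrate how these settings trade off, two named batch processor instances could be defined and attached to different pipelines; named instances (type/name) are standard Collector syntax, but the profile names and values below are hypothetical starting points, not benchmarked recommendations:

processors:
  batch/throughput:              # favors large batches for bulk export
    send_batch_size: 8192
    timeout: 10s
  batch/low_latency:             # favors fast flushes over batch efficiency
    send_batch_size: 512
    timeout: 200ms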

Solution

Choose batching placement based on processing complexity and data volume:

| Workload Type | Recommended Placement | Reason |
| --- | --- | --- |
| High-volume, low-complexity (minimal processing) | Early Batching | Minimizes per-record overhead early in the pipeline |
| Complex, multi-processor pipelines | Late Batching | Avoids unnecessary batch-wide transformations |

Early Batching Example:

service:
  pipelines:
    logs:
      receivers: [my_receiver]
      processors: [memory_limiter, batch, attributes]  # memory_limiter stays first per Collector guidance; batch runs immediately after it, ahead of other processing
      exporters: [otlphttp]

Late Batching Example:

service:
  pipelines:
    logs:
      receivers: [my_receiver]
      processors: [memory_limiter, attributes, batch]
      exporters: [otlphttp]
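
Whichever placement is chosen, the configuration can be sanity-checked before deployment. Assuming a standard otelcol binary, the built-in validate subcommand catches unknown components and malformed pipeline references:

otelcol validate --config=config.yaml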

What's in store for future versions?

The batch processor is being deprecated in favor of exporter-native batching.

While current deployments can benefit from placement tuning, future optimization will focus on exporter-level batch settings, so re-benchmarking will be required after migration.
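
As a rough sketch only: experimental exporter-level batching has been exposed through a batcher block on exporters such as otlphttp. Field names have changed between Collector releases, so treat the keys below as assumptions to verify against your version's documentation:

exporters:
  otlphttp:
    endpoint: https://collector.example.com:4318  # hypothetical endpoint
    batcher:                      # experimental; naming varies by release
      enabled: true
      flush_timeout: 5s           # analogous to the batch processor's timeout
      min_size_items: 2048        # assumed field name; may be min_size in newer releases
      max_size_items: 8192        # assumed field name; may be max_size in newer releases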
