Inject trace context on the producer, extract on the consumer; use PRODUCER and CONSUMER span kinds; set semantic conventions (messaging.system, messaging.destination.name, messaging.operation, Kafka partition/offset/consumer group).
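To make the inject/extract pattern concrete, here is a minimal, dependency-free sketch of what a propagator does with Kafka-style headers. It is not the OpenTelemetry API: the helper names (`new_span_context`, `inject`, `extract`, `start_consumer_span`) and the dict-based span are assumptions made for illustration, and only the W3C `traceparent` header layout is taken from the standard.

```python
import re
import secrets

# W3C traceparent layout: version-traceid-spanid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span_context(trace_id=None):
    """Create a span context, reusing trace_id when continuing a trace."""
    return {
        "trace_id": trace_id or secrets.token_hex(16),  # 32 hex chars
        "span_id": secrets.token_hex(8),                # 16 hex chars
        "sampled": True,
    }


def inject(ctx, headers):
    """Producer side: append a traceparent header to a Kafka-style header list."""
    flags = "01" if ctx["sampled"] else "00"
    value = f"00-{ctx['trace_id']}-{ctx['span_id']}-{flags}"
    headers.append(("traceparent", value.encode("ascii")))


def extract(headers):
    """Consumer side: return the remote span context, or None if absent/invalid."""
    for key, value in headers:
        if key != "traceparent":
            continue
        match = TRACEPARENT_RE.match(value.decode("ascii", errors="replace"))
        if match:
            return {
                "trace_id": match.group(1),
                "span_id": match.group(2),
                "sampled": (int(match.group(3), 16) & 1) == 1,
            }
    return None


def start_consumer_span(headers):
    """Start a CONSUMER span parented to the extracted producer context."""
    parent = extract(headers)
    ctx = new_span_context(parent["trace_id"] if parent else None)
    return {
        "kind": "CONSUMER",
        "context": ctx,
        "parent_span_id": parent["span_id"] if parent else None,
        "attributes": {"messaging.system": "kafka", "messaging.operation": "process"},
    }
```

A producer would call `inject(ctx, headers)` just before sending, and the consumer would call `start_consumer_span(record_headers)` on receipt. If the header is missing or malformed, the consumer span simply starts a new trace, which is exactly the "broken trace" symptom.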
They show the raw OpenTelemetry code. It's comprehensive. It's also verbose. Every team ends up re-implementing the same patterns: inject, extract, span kinds, semantic attributes, error handling.
We've all been there: copying "best practice" code from blog posts and adapting it for our broker.
Their key insight:
For batch processing, use a batch span with links or child spans to contributing traces.
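One way to picture the batch-span idea is the sketch below. It assumes each message is represented only by its header list and that the producer wrote a W3C `traceparent` header; `batch_links` and `start_batch_span` are hypothetical helpers, not OpenTelemetry calls. The `messaging.batch.message_count` attribute is not mentioned in the source and is added here as an illustration.

```python
def batch_links(messages):
    """Collect one link per distinct producer span found in the batch.

    `messages` is a list of Kafka-style header lists: [(key, bytes), ...].
    Links are returned in first-seen order; messages without a usable
    traceparent header contribute nothing.
    """
    seen = set()
    links = []
    for headers in messages:
        for key, value in headers:
            if key != "traceparent":
                continue
            parts = value.decode("ascii", errors="replace").split("-")
            if len(parts) != 4:
                continue  # malformed header, skip it
            ident = (parts[1], parts[2])  # (trace_id, span_id)
            if ident not in seen:
                seen.add(ident)
                links.append({"trace_id": parts[1], "span_id": parts[2]})
    return links


def start_batch_span(messages):
    """Describe a single span for the whole batch, linked to each source trace."""
    return {
        "name": "process batch",
        "kind": "CONSUMER",
        "links": batch_links(messages),
        "attributes": {"messaging.batch.message_count": len(messages)},
    }
```

Links fit better than a parent-child relationship here because a batch has many originating traces and a span can have only one parent.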
Message isolation using a shared queue: propagate tenant ID in Kafka message headers; consumers use tenant ID for selective message consumption.
They make the case that infrastructure duplication is expensive. Instead of separate Kafka clusters per environment, use tenant ID filtering on a shared queue. Instrument producers and consumers for context propagation.
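The consumer-side filter can be sketched roughly as follows. The header name `tenant-id`, the `baseline` default, and both helper functions are assumptions for illustration; the source does not specify them. The sketch also assumes each tenant's consumer runs in its own consumer group, so every consumer sees every message and discards the ones that are not its own.

```python
TENANT_HEADER = "tenant-id"   # hypothetical header name
BASELINE = "baseline"         # hypothetical tenant for untagged traffic


def tenant_of(headers):
    """Return the tenant ID carried in the message headers, or BASELINE."""
    for key, value in headers:
        if key == TENANT_HEADER:
            return value.decode("utf-8")
    return BASELINE


def should_process(headers, my_tenant):
    """True if this consumer instance owns the message."""
    return tenant_of(headers) == my_tenant
```

In a poll loop, a consumer would call `should_process(record_headers, my_tenant)` and skip (but still commit past) any record that returns `False`.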
We've all been there: maintaining four "identical" Kafka setups that slowly drift apart.
Their key insight:
This approach requires modifying consumers and using OpenTelemetry for context propagation.
Request-level isolation is the most cost-effective approach.
They make the case against duplicating infrastructure for testing. Instead of spinning up separate Kafka clusters per tenant, use OpenTelemetry Baggage to propagate tenant ID through async flows. Consumers filter by tenant ID. Istio handles routing.
We've all been there: every team has their own "staging Kafka" and costs balloon.
Their key insight:
Use OpenTelemetry Baggage to propagate tenant ID through both synchronous and asynchronous flows. When publishing to Kafka, producers inject trace context (including baggage) into message headers; consumers extract it and make routing decisions based on the tenant ID.
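As a rough illustration of what travels in the message headers, the sketch below encodes and decodes a W3C-style `baggage` header by hand. It is not the OpenTelemetry Baggage API. The key `tenant.id`, the `baseline` default, and all function names are assumptions, and keys are assumed to be plain tokens that need no escaping.

```python
from urllib.parse import quote, unquote


def encode_baggage(entries):
    """Serialize a dict as a baggage header value: k1=v1,k2=v2 (values percent-encoded)."""
    return ",".join(f"{key}={quote(value, safe='')}" for key, value in entries.items())


def decode_baggage(header):
    """Parse a baggage header value back into a dict, ignoring ;properties."""
    entries = {}
    for member in header.split(","):
        member = member.split(";", 1)[0].strip()
        if "=" not in member:
            continue  # skip empty or malformed members
        key, value = member.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def inject_baggage(entries, headers):
    """Producer side: append the baggage header to a Kafka-style header list."""
    headers.append(("baggage", encode_baggage(entries).encode("ascii")))


def tenant_from_headers(headers, default="baseline"):
    """Consumer side: read the hypothetical tenant.id entry to drive routing."""
    for key, value in headers:
        if key == "baggage":
            return decode_baggage(value.decode("ascii")).get("tenant.id", default)
    return default
```

For example, `encode_baggage({"tenant.id": "team a/b"})` yields `tenant.id=team%20a%2Fb`, and `tenant_from_headers` recovers `team a/b` on the consumer side.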
Traces break at queues unless you extract the trace context from message headers and make it the parent context of the consumer's span.
They walk through the real pain: stateful processing loses trace context in caches, Kafka Connect can only do batch-level tracing, and every team ends up writing custom interceptors and state store wrappers.
We've all been there.
Their key insight:
In Kafka Streams and Kafka Connect this often means manual work: interceptors, state stores, batch spans, or extending tracing logic to extract from headers.
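The state store wrapper idea can be pictured with the toy sketch below. It is a plain in-memory stand-in, not the Kafka Streams state store API, and `TracedStore` is a hypothetical name. The point is only that the trace context is persisted next to the value, so it is still available when the value is read back later, outside the span that wrote it.

```python
class TracedStore:
    """Toy key-value store that keeps the originating traceparent with each value."""

    def __init__(self):
        self._data = {}

    def put(self, key, value, traceparent):
        """Store the value together with the trace context active at write time."""
        self._data[key] = (value, traceparent)

    def get(self, key):
        """Return (value, traceparent), or (None, None) if the key is unknown."""
        return self._data.get(key, (None, None))
```

A processor that later emits a result from stored state could pass the returned `traceparent` to its span creation as a parent or link, instead of starting an unrelated trace.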