Skip to content

Integration with Managed Kafka Services

Integrate Obsrv with managed Kafka services such as AWS MSK, Azure Event Hubs, and Confluent Cloud.

Obsrv supports integration with managed Kafka services provided by various cloud platforms (e.g., AWS MSK, Azure Event Hubs for Kafka, Confluent Cloud, etc.).

This guide outlines the setup process for integrating Obsrv with a managed Kafka service. While the example uses AWS MSK, the same steps apply to other providers by updating Kafka broker endpoints, networking, and security configurations accordingly.


Before starting, ensure you have the following:

  • A Kubernetes cluster (e.g., EKS, AKS, GKE) ready and running.
  • Access to a managed Kafka service in your cloud provider.
  • Basic understanding of your VPC/networking setup (VPC ID, Subnets, Security Groups/Firewall Rules).

To ensure seamless connectivity between Obsrv services and Kafka, follow these guidelines:

1. Kafka and Kubernetes in the Same Network

  • Deploy your Kafka cluster within the same VPC (or equivalent network) and security group/firewall rules as your Kubernetes cluster.
  • Ensure port 9092 (or appropriate port for plaintext/TLS communication) is open between Kafka brokers and Kubernetes nodes/pods.

2. Disable TLS (if applicable)

  • If your managed Kafka service supports it, choose plaintext communication to simplify integration (e.g., no TLS certs or SASL auth).
  • This allows Obsrv to connect to Kafka brokers over port 9092 without additional setup.

Once the Kafka service is deployed:

  • Obtain the broker addresses from your provider’s console or CLI.
  • Example:
broker-1.example.kafka.cloudprovider.com
broker-2.example.kafka.cloudprovider.com

These endpoints will be used in Obsrv’s configuration.


Edit the global-values.yaml or your automation script to point Obsrv to the external Kafka cluster:

kafka40: &kafka40
namespace: *kafka40-namespace
host: "broker-1.example.kafka.cloudprovider.com"
port: 9092
bootstrap-server: &kafka-bootstrap-server "broker-1.example.kafka.cloudprovider.com:9092,broker-2.example.kafka.cloudprovider.com:9092"

This ensures all Obsrv services connect to the managed Kafka cluster.


Most managed Kafka services do not support automatic topic creation. You must provision topics manually using a Kafka client.

Use the following script (adjusted with your broker endpoint) to create required topics:

Terminal window
for topic in connectors.failed \
denorm \
failed \
hudi.connector.in \
ingest \
masterdata.ingest \
masterdata.stats \
obsrv-connectors-metrics \
raw \
spark.stats \
stats \
system.events \
system.telemetry.events \
telemetry \
transform \
unique
do
./kafka-topics.sh \
--create \
--bootstrap-server broker-1.example.kafka.cloudprovider.com:9092 \
--replication-factor 1 \
--partitions 4 \
--topic $topic
done

After updating the configuration and provisioning topics:

  • Deploy or redeploy all Obsrv services.
  • They will now use the managed Kafka cluster.

  • By default, new datasources point to the managed Kafka version, so no manual update is needed after creation.
  • For existing datasources, you can manually update Postgres, use the Datasource Update API to modify the ingestion spec with the managed Kafka URL, or simply edit and republish the datasets — the dataset will then pick up the latest configured Kafka URL.

  • If the dataset contains any “Obsrv GA” versioned rollup data sources, update the rollup druid ingestion spec with the correct MSK URLs. Then, resubmit the updated ingestion spec to Druid.

  • After updating all supervisors to the new MSK URL, reset all the dataset/rollups supervisors to align Druid with the new offsets in MSK.
  • The MSK data is new and unprocessed, so resetting will not result in duplicates or errors.

By default, Obsrv installation may deploy a self-managed Kafka service. This must be removed to avoid conflicts.

Option 1 — Uninstall using Helm:

Terminal window
helm uninstall kafka40 -n obsrv

Option 2 — Edit Installation Script:

  • Locate the section that installs Kafka.
  • Comment out or remove the Kafka installation lines.

Once deployed, validate the integration across all layers:

AreaSanity Validation
1. Data IngestionCreate a sample dataset (JSON/CSV) and ingest into Obsrv via the ingestion API. Ensure the dataset is available for querying.
2. Processing FlowValidate that ingested data flows through Obsrv pipelines (raw → transform → denorm → routing). Confirm messages are visible in the corresponding Kafka topics in MSK.
3. Query ReadinessQuery the processed dataset through Obsrv query service (or connected engines like Druid/Hudi). Ensure ingested data is available for analytics.
4. Data Backup / SecorVerify Secor (or equivalent backup service) is consuming from MSK topics and storing data into the backup store (e.g., S3). Validate files are created and data matches the ingested events.
5. ConnectorsTest a connector (e.g., Kafka, Debezium, Neo4J). Confirm the dataset is written to the target system successfully.
6. Kafka MetricsEnsure Kafka-related metrics (producer/consumer lag, offsets, throughput) are exported and indexed into Prometheus. Check that Grafana dashboards show MSK metrics.
7. System EventsValidate system topics (system.events, system.telemetry.events) are being populated and consumed by Obsrv services without errors.
8. Failure HandlingInject malformed/bad events and confirm they appear in failed or connectors.failed topics. Ensure retry or error-handling logic works.
9. Scalability TestPush a slightly higher load of events and confirm Obsrv services autoscale and continue processing without lag buildup.
10. Cleanup / Conflict CheckConfirm default Kafka (self-managed) is uninstalled/disabled and only MSK brokers are being used.
11. Existing DatasetsThe existing (old) datasets should process the data and should start reading the data from the managed Kafka service.

By following these steps, you can successfully integrate Obsrv with any managed Kafka service. This approach allows you to leverage the scalability and operational benefits of a cloud-managed Kafka while maintaining full compatibility with Obsrv services.