Integration with Managed Kafka Services
Introduction
Obsrv supports integration with managed Kafka services provided by various cloud platforms (e.g., AWS MSK, Azure Event Hubs for Kafka, Confluent Cloud).
This guide outlines the setup process for integrating Obsrv with a managed Kafka service. While the example uses AWS MSK, the same steps apply to other providers by updating Kafka broker endpoints, networking, and security configurations accordingly.
Note: This documentation is based on Obsrv v2.1.0 and Kafka v4.x.
Prerequisites
Before starting, ensure you have the following:
A Kubernetes cluster (e.g., EKS, AKS, GKE) ready and running.
Access to a managed Kafka service in your cloud provider.
Basic understanding of your VPC/networking setup (VPC ID, Subnets, Security Groups/Firewall Rules).
Networking Setup
To ensure seamless connectivity between Obsrv services and Kafka, follow these guidelines:
1. Kafka and Kubernetes in the Same Network
Deploy your Kafka cluster within the same VPC (or equivalent network) and security group/firewall rules as your Kubernetes cluster.
Ensure port 9092 (or the appropriate port for plaintext/TLS communication) is open between Kafka brokers and Kubernetes nodes/pods.
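A quick way to verify reachability from inside the cluster is to run the Kafka CLI from a throwaway pod, as in the sketch below. The broker address is a placeholder, and the bitnami/kafka image and script path are assumptions; any image that bundles the Kafka CLI will do.

```bash
# Verify that a broker is reachable on port 9092 from inside the Kubernetes cluster.
# Broker address is a placeholder; image and script path are assumptions.
kubectl run kafka-check --rm -it --restart=Never --image=bitnami/kafka:latest -- \
  /opt/bitnami/kafka/bin/kafka-broker-api-versions.sh \
  --bootstrap-server b-1.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092
```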
2. Disable TLS (if applicable)
If your managed Kafka service supports it, choose plaintext communication to simplify integration (e.g., no TLS certs or SASL auth).
This allows Obsrv to connect to Kafka brokers over port 9092 without additional setup.
⚠️ Modify this based on your provider’s security policies. For stricter environments, TLS/SASL configuration will be needed.
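If your provider does require authentication, the connection is configured through standard Kafka client properties. A minimal sketch for a TLS + SASL/SCRAM setup is shown below; the values are placeholders, and where these properties are supplied to Obsrv depends on your deployment.

```properties
# Illustrative Kafka client properties for a TLS + SASL/SCRAM cluster (values are placeholders).
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
  username="<kafka-username>" \
  password="<kafka-password>";
```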
Kafka Broker Endpoints
Once the Kafka service is deployed:
Obtain the broker addresses from your provider's console or CLI.
Example:
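For AWS MSK, the bootstrap broker list is a comma-separated set of hostnames and ports; the values below are placeholders.

```
b-1.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092,b-2.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092
```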
These endpoints will be used in Obsrv's configuration.
Update Obsrv Configuration
Edit the global-values.yaml or your automation script to point Obsrv to the external Kafka cluster.
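The exact key names vary between Obsrv chart versions, so the snippet below is only a sketch, assuming the brokers are configured under a global Kafka key; check your global-values.yaml for the actual path.

```yaml
# Sketch of global-values.yaml -- key names are assumptions, broker addresses are placeholders.
global:
  kafka:
    brokers: "b-1.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092,b-2.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092"
```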
This ensures all Obsrv services connect to the managed Kafka cluster.
Create Kafka Topics
Most managed Kafka services disable automatic topic creation by default. You must provision topics manually using a Kafka client.
Use the following script (adjusted with your broker endpoint) to create required topics:
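The sketch below uses the standard kafka-topics.sh CLI. The topic names, partition counts, and replication factor are placeholders; substitute the topic list required by your Obsrv version, and pass your client properties with --command-config if TLS/SASL is enabled.

```bash
#!/bin/bash
# Create the Kafka topics required by Obsrv on the managed cluster.
# Broker address, topic names, partitions, and replication factor are placeholders.
BROKERS="b-1.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092"

for topic in system.events system.telemetry.events ingest raw failed connectors.failed; do
  kafka-topics.sh --create \
    --bootstrap-server "$BROKERS" \
    --topic "$topic" \
    --partitions 3 \
    --replication-factor 2 \
    --if-not-exists
done
```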
✅ Use your preferred Kafka client tool compatible with your managed service.
Deploy Obsrv Services
After updating the configuration and provisioning topics:
Deploy or redeploy all Obsrv services.
They will now use the managed Kafka cluster.
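If Obsrv was installed with Helm, redeploying is typically a matter of re-running the upgrade with the updated values; the release name, chart path, and namespace below are placeholders for your setup.

```bash
# Re-apply the Obsrv charts so all services pick up the managed Kafka configuration.
# Release name, chart path, and namespace are placeholders.
helm upgrade --install obsrv ./helm-charts/obsrv \
  --namespace obsrv \
  -f global-values.yaml
```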
Update the Existing Datasets
New datasources created after this change automatically point to the managed Kafka cluster, so no manual update is needed.
For existing datasources, you can update the records in Postgres manually, use the Datasource Update API to modify the ingestion spec with the managed Kafka URL, or simply edit and republish the datasets; the dataset will then pick up the latest configured Kafka URL.
Remove Default Kafka Installation
By default, Obsrv installation may deploy a self-managed Kafka service. This must be removed to avoid conflicts:
Two Options:
Uninstall using Helm:
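A typical sequence is shown below; the release name and namespace are assumptions, so confirm them with helm list first.

```bash
# Find the self-managed Kafka release installed with Obsrv, then remove it.
# Release and namespace names are placeholders -- confirm with `helm list -A`.
helm list -A | grep -i kafka
helm uninstall kafka --namespace obsrv
```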
Edit Installation Script:
Locate the section that installs Kafka.
Comment out or remove the Kafka installation lines.
Sanity Testing
Once deployed:
For each area below, perform the corresponding sanity validation:
1. Data Ingestion: Create a sample dataset (JSON/CSV) and ingest it into Obsrv via the ingestion API. Ensure the dataset is available for querying.
2. Processing Flow: Validate that ingested data flows through the Obsrv pipelines (raw → transform → denorm → routing). Confirm messages are visible in the corresponding Kafka topics in MSK.
3. Query Readiness: Query the processed dataset through the Obsrv query service (or connected engines such as Druid/Hudi). Ensure ingested data is available for analytics.
4. Data Backup / Secor: Verify that Secor (or an equivalent backup service) is consuming from MSK topics and storing data in the backup store (e.g., S3). Validate that files are created and the data matches the ingested events.
5. Connectors: Test a connector (e.g., Kafka, Debezium, Neo4J). Confirm the dataset is written to the target system successfully.
6. Kafka Metrics: Ensure Kafka-related metrics (producer/consumer lag, offsets, throughput) are exported and indexed into Prometheus. Check that Grafana dashboards show MSK metrics (see the lag-check sketch after this list).
7. System Events: Validate that the system topics (system.events, system.telemetry.events) are being populated and consumed by Obsrv services without errors.
8. Failure Handling: Inject malformed/bad events and confirm they appear in the failed or connectors.failed topics. Ensure retry and error-handling logic works.
9. Scalability Test: Push a slightly higher load of events and confirm Obsrv services autoscale and continue processing without lag buildup.
10. Cleanup / Conflict Check: Confirm the default (self-managed) Kafka is uninstalled/disabled and only MSK brokers are being used.
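To spot-check consumer lag directly against the MSK brokers, the standard kafka-consumer-groups.sh CLI can be used; the broker address is a placeholder, and --command-config is only needed if TLS/SASL is enabled.

```bash
# Describe all consumer groups on the managed brokers to check for lag buildup.
# Broker address is a placeholder; pass client properties via --command-config if required.
kafka-consumer-groups.sh \
  --bootstrap-server b-1.example-cluster.abc123.c2.kafka.us-east-1.amazonaws.com:9092 \
  --describe --all-groups
```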
Existing Datasets
The existing (old) datasets should continue to process data and should start reading it from the managed Kafka service.
Conclusion
By following these steps, you can successfully integrate Obsrv with any managed Kafka service. This approach allows you to leverage the scalability and operational benefits of a cloud-managed Kafka while maintaining full compatibility with Obsrv services.
🧹 Important: Don’t forget to remove the default Kafka installation.
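If it was installed as a Helm release, a command along the following lines removes it; the release and namespace names are placeholders, so confirm them with helm list -A.

```bash
# Remove the self-managed Kafka release so only the managed service is used.
# Release and namespace names are placeholders.
helm uninstall kafka --namespace obsrv
```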
This ensures that Obsrv uses only your managed Kafka service and avoids unnecessary resource usage or conflicts.