Autoscaling Components
This guide explains how to configure and manage autoscaling for Obsrv components.
Overview
Obsrv components can be scaled using KEDA (Kubernetes Event-driven Autoscaling). The default configuration supports autoscaling for Flink Task Managers and Druid components based on metrics such as CPU usage, query latency, and Kafka lag.
Understanding Scaling Scenarios
Flink Autoscaling
Flink Task Managers need to scale when:
- Kafka consumer lag grows beyond acceptable thresholds
- Processing backpressure increases
- Event processing latency rises
Key considerations:
- Scale up quickly to handle sudden data bursts
- Match scaling with Kafka partition count for optimal parallelism
- Scale down conservatively to avoid job restarts
- Consider checkpoint completion times during scaling
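The lag-driven sizing described above can be sketched as a small shell function. This is only an illustration of the arithmetic, not the rule Obsrv ships: the function name and the per-replica lag target are hypothetical, and the ceiling division simply mirrors how a target-per-replica metric is usually turned into a replica count.

```shell
# Hypothetical helper: estimate Task Manager replicas from Kafka consumer lag.
# Args: total consumer lag, lag one replica is assumed to absorb, partition count.
desired_taskmanagers() {
  local lag=$1 lag_per_replica=$2 partitions=$3
  # Ceiling division: ceil(lag / lag_per_replica).
  local want=$(( (lag + lag_per_replica - 1) / lag_per_replica ))
  if (( want < 1 )); then want=1; fi                    # never below one replica
  if (( want > partitions )); then want=$partitions; fi # extra replicas would sit idle
  echo "$want"
}
```

For example, `desired_taskmanagers 45000 10000 8` prints 5, while a lag of 500000 is capped at the 8 partitions.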
Druid Autoscaling
Druid components scale for different reasons:
Historical Nodes
- Query response times exceed thresholds
- Segment load times increase
- Available capacity for new segments decreases
- High CPU utilization impacts query performance
Broker Nodes
- High query queuing or wait times
- Increased number of concurrent queries
- CPU utilization affects query routing efficiency
Key considerations
- Historical scaling impacts data availability
- Broker scaling affects query routing and caching
- Both require careful monitoring of query patterns
- Consider time of day and workload patterns
Enabling Autoscaling
- Enable autoscaling in autoscaling.yaml:

    global:
      enable_autoscaling: &autoscaling_enabled true

- Install the autoscaling rules:

    # From enterprise-automation/helmcharts directory
    cd helmcharts/kitchen
    bash enterprise.sh autoscaling

Configuration Components
Common Parameters
- enabled: Enable or disable autoscaling for the component
- kind: Kubernetes resource type (Deployment or StatefulSet)
- minReplicaCount: Minimum number of replicas
- maxReplicaCount: Maximum number of replicas
- pollingInterval: How often KEDA checks the trigger metrics (in seconds)
- cooldownPeriod: How long KEDA waits after the last trigger was active before scaling the workload down to zero replicas (in seconds); scaling between 1 and maxReplicaCount is paced by the HPA behavior settings
HorizontalPodAutoscalerConfig
The horizontalPodAutoscalerConfig section controls scaling behavior:

    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 180  # Time window to evaluate metrics before scaling up
          policies:
            - type: Pods/Percent           # Scale by number of pods or percentage
              value: 1                     # Amount to scale by
              periodSeconds: 180           # How often to apply the scaling
        scaleDown:
          stabilizationWindowSeconds: 300  # Time window to evaluate metrics before scaling down
          policies:
            - type: Pods/Percent
              value: 1
              periodSeconds: 300

Key Timing Parameters
stabilizationWindowSeconds:
- Duration the metrics should stay in the scaling range before scaling occurs
- Longer windows prevent oscillation but reduce responsiveness
- Typically longer for scale-down than scale-up
periodSeconds:
- Time window over which a policy's value applies (for example, at most 1 pod per 180 seconds)
- As a rule of thumb, set it >= stabilizationWindowSeconds
- Longer periods provide more stability
cooldownPeriod:
- KEDA-level setting: the wait after the last trigger was active before the workload is scaled down to zero replicas
- Scaling between minReplicaCount and maxReplicaCount is paced by the HPA behavior settings, not by this value
- The examples in this guide set it at least as long as the scale-up periodSeconds
Example Configurations
1. Flink Task Manager (Unified Pipeline)

    unified-pipeline-taskmanager:
      enabled: true
      kind: Deployment
      minReplicaCount: 1
      maxReplicaCount: "{{.Values.global.kafka.numPartitions}}"
      pollingInterval: 15   # Check every 15s
      cooldownPeriod: 900   # 15 min cooldown
      advanced:
        horizontalPodAutoscalerConfig:
          behavior:
            scaleUp:
              stabilizationWindowSeconds: 30
              policies:
                - type: Percent
                  value: 100   # Double pods
                  periodSeconds: 30
            scaleDown:
              stabilizationWindowSeconds: 300
              policies:
                - type: Percent
                  value: 50    # Halve pods
                  periodSeconds: 120

Explanation:
- Uses percentage-based scaling for exponential growth
- Quick scale-up (30s) for responsive lag handling
- Conservative scale-down (5m) to prevent oscillation
- Matches Kafka partitions for maximum parallelism
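The "exponential growth" of a Percent policy can be made concrete with a short sketch. It is a simplified model, assuming the autoscaler rounds a Percent scale-up limit upward and that one step is taken per periodSeconds; the function name is hypothetical and the real controller also weighs the metric-driven target.

```shell
# Illustrative only: replica counts reachable with a Percent scale-up policy,
# one step per periodSeconds, capped at the maximum replica count.
# Args: starting replicas, max replicas, policy value in percent.
scale_up_steps() {
  local current=$1 max=$2 percent=$3 steps="$1"
  if (( current < 1 || percent < 1 )); then
    echo "start and percent must be positive" >&2
    return 1
  fi
  while (( current < max )); do
    # ceil(current * (100 + percent) / 100)
    current=$(( (current * (100 + percent) + 99) / 100 ))
    if (( current > max )); then current=$max; fi
    steps+=" $current"
  done
  echo "$steps"
}
```

With the 100% policy above and 8 Kafka partitions, `scale_up_steps 1 8 100` prints `1 2 4 8`: three 30-second periods to go from one Task Manager to full parallelism.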
2. Druid Historicals
    druid-raw-historicals:
      enabled: true
      kind: StatefulSet
      minReplicaCount: 1
      maxReplicaCount: 4
      pollingInterval: 60   # Check every 1m
      cooldownPeriod: 900   # 15 min cooldown
      advanced:
        horizontalPodAutoscalerConfig:
          behavior:
            scaleUp:
              stabilizationWindowSeconds: 180    # 3m confirmation
              policies:
                - type: Pods
                  value: 1                       # Add 1 pod
                  periodSeconds: 180
            scaleDown:
              stabilizationWindowSeconds: 7200   # 2h retention
              policies:
                - type: Pods
                  value: 1                       # Remove 1 pod
                  periodSeconds: 7200

Explanation:
- Conservative scaling due to stateful nature
- Long pod retention (2h) for stability
- Pod-based scaling for precise control
- Considers segment loading time
3. Druid Brokers
    druid-raw-brokers:
      enabled: true
      kind: Deployment
      minReplicaCount: 1
      maxReplicaCount: 2
      pollingInterval: 30   # Check every 30s
      cooldownPeriod: 600   # 10 min cooldown
      advanced:
        horizontalPodAutoscalerConfig:
          behavior:
            scaleUp:
              stabilizationWindowSeconds: 600    # 10m confirmation
              policies:
                - type: Pods
                  value: 1                       # Add 1 pod
                  periodSeconds: 600
            scaleDown:
              stabilizationWindowSeconds: 3600   # 1h retention
              policies:
                - type: Pods
                  value: 1                       # Remove 1 pod
                  periodSeconds: 3600

Explanation:
- Moderate scaling speed (faster than historicals, slower than Flink)
- 1-hour pod retention for query stability
- Pod-based scaling for controlled growth
- Balances query routing and cache warmup needs
KEDA Triggers
KEDA supports various trigger types. In our configuration, we use:
- Prometheus Triggers:

    triggers:
      - type: prometheus
        metadata:
          serverAddress: http://prometheus-operated...
          metricName: druid_historical_scaling_condition
          query: |
            # Scaling query
          threshold: "90"

- Cron Triggers (for night-time scaling):

    - type: cron
      metadata:
        timezone: UTC
        start: "30 19 * * *"   # 19:30 UTC
        end: "30 3 * * *"      # 03:30 UTC
        desiredReplicas: "1"

For more trigger types and configurations, refer to KEDA Documentation.
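The cron window in the example runs from 19:30 to 03:30 UTC, so it crosses midnight. A minimal sketch of the wrap-around check, useful when reasoning about whether the night-time replica count should currently apply; the function name is hypothetical and this does not reproduce KEDA's own cron evaluation.

```shell
# Illustrative check: is a UTC time inside a daily window that may cross midnight?
# Args: time, window start, window end, all as HHMM (e.g. 1930).
in_daily_window() {
  # 10# forces base 10 so values like 0330 are not parsed as octal.
  local now=$((10#$1)) start=$((10#$2)) end=$((10#$3))
  if (( start <= end )); then
    (( now >= start && now < end ))
  else
    # Window wraps past midnight, e.g. 1930 -> 0330.
    (( now >= start || now < end ))
  fi
}
```

For example, `in_daily_window "$(date -u +%H%M)" 1930 0330 && echo "night-time window active"`.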
Important Notes
Druid Component Cleanup
Historical Nodes
When scaling down Druid historicals, manual cleanup may be required:
- Delete PVCs for unused replicas:

    kubectl delete pvc historical-volume-druid-raw-historicals-{replica} -n <namespace>

- Delete corresponding PVs:

    kubectl delete pv <pv-name>

- Delete cloud provider disks:
- AWS: Delete EBS volumes
- Azure: Delete Azure Disks
- GCP: Delete Persistent Disks
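To work out which PVCs are left behind, a dry-run helper like the following can list candidate names. It is a sketch that assumes `{replica}` in the command above is the StatefulSet pod ordinal; the function name is hypothetical, and it only prints names, it deletes nothing.

```shell
# Illustrative dry run: print PVC names left behind after a StatefulSet scale-down.
# Args: current replica count, previous (higher) replica count.
stale_historical_pvcs() {
  local current=$1 previous=$2 ordinal
  # StatefulSet pods are numbered 0..replicas-1, so ordinals >= current are unused.
  for (( ordinal = current; ordinal < previous; ordinal++ )); do
    echo "historical-volume-druid-raw-historicals-${ordinal}"
  done
}
```

After scaling from 4 historicals down to 2, `stale_historical_pvcs 2 4` lists the PVCs for ordinals 2 and 3. Compare the output with `kubectl get pvc -n <namespace>` before deleting anything.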
Broker Nodes
When scaling down Druid brokers:
- Ensure query drain:
- Monitor active queries on the broker
- Wait for existing queries to complete
- Verify no new queries are being routed
- Cache considerations:
- Be aware that scaling down brokers will lose their query cache
- New brokers will need time to warm up their cache
- Consider gradual scale-down during off-peak hours
Infrastructure Requirements
Node and IP Requirements
Before enabling autoscaling, ensure your cluster has:
- Sufficient Nodes:
- Available nodes with required resources (CPU/Memory)
- Node autoscaling enabled if using cloud providers
- Appropriate node labels/taints if using node affinity
- IP Address Availability:
- Enough free IPs in the subnet
- Consider IP per pod requirements
- Reserve IPs for maximum scale scenario
- Resource Quotas:
- Namespace resource quotas allow for max pods
- Cluster-wide limits accommodate scaling
- Storage class has sufficient quota (for Historicals)
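The IP reservation point can be checked with simple arithmetic. The sketch below assumes one IP per pod and takes, for each autoscaled component, the number of extra pods it could still add (maxReplicaCount minus current replicas); the function name and the sample numbers are hypothetical.

```shell
# Illustrative capacity check: do the free pod IPs cover every component at max scale?
# Args: free IPs in the subnet, then the extra pods each component could add.
ip_headroom() {
  local free=$1 needed=0 count
  shift
  for count in "$@"; do
    needed=$(( needed + count ))
  done
  echo $(( free - needed ))   # negative means the subnet is too small
}
```

For instance, with 20 free IPs and components that could add 7, 3, and 1 pods, `ip_headroom 20 7 3 1` prints 9.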
Troubleshooting
Checking Scaling Status
- View KEDA ScaledObject status:

    kubectl get scaledobject -n <namespace>
    kubectl describe scaledobject <scaledobject-name> -n <namespace>

- Check HPA status:

    kubectl get hpa -n <namespace>
    kubectl describe hpa <hpa-name> -n <namespace>

- Monitor Kubernetes Events:

    # Get events for the scaled resource
    kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>

    # Watch events in real-time
    kubectl get events -n <namespace> --watch

Common Scaling Issues
- Scaling Not Triggered:
  - Verify KEDA metrics:

        kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/druid_historical_scaling_condition' | jq

  - Check Prometheus query results directly
  - Verify trigger thresholds are appropriate
- Scaling Fails:
  - Check for resource constraints:

        kubectl describe nodes | grep -A 5 "Allocated resources"

  - Verify IP availability:

        kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'

  - Look for PVC/Storage issues (for Historicals):

        kubectl get pvc -n <namespace>
        kubectl describe pvc <pvc-name> -n <namespace>

- Pods Stuck in Pending:
  - Check node resources:

        kubectl describe pod <pod-name> -n <namespace>

  - Verify node affinity rules
  - Check for PVC binding issues
- Unexpected Scaling Behavior:
  - Review KEDA logs:

        kubectl logs -n keda -l app=keda-operator

  - Check stabilization windows and cooldown periods
  - Verify metric values over time in Prometheus
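The individual checks above can be gathered into one pass. This is a convenience sketch rather than an Obsrv tool: the function name and the `KUBECTL` override are hypothetical, while the `keda` namespace and `app=keda-operator` label are taken from the log command above.

```shell
# Illustrative wrapper around the troubleshooting commands above.
# Args: namespace, ScaledObject name.
# Set KUBECTL=echo for a dry run that only prints the arguments.
scaling_diagnostics() {
  local ns=$1 name=$2 kubectl=${KUBECTL:-kubectl}
  "$kubectl" describe scaledobject "$name" -n "$ns"
  "$kubectl" get hpa -n "$ns"
  "$kubectl" get pvc -n "$ns"
  "$kubectl" get events -n "$ns" --sort-by=.lastTimestamp
  "$kubectl" logs -n keda -l app=keda-operator --tail=50
}
```

Running `KUBECTL=echo scaling_diagnostics <namespace> druid-raw-historicals` shows what would be executed without touching the cluster.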
Using Events for Monitoring
- Set up event monitoring:

    # Watch scaling events
    kubectl get events -n <namespace> --field-selector reason=SuccessfulRescale,type=Normal

- Important event types to monitor:
- SuccessfulRescale: Successful scaling operations
- FailedRescale: Failed scaling attempts
- FailedGetMetrics: Metric collection issues
- FailedComputeMetricsReplicas: Scaling computation issues
- Event patterns to watch for:
- Repeated scaling attempts
- Resource constraint messages
- Metric collection failures
- PVC/PV binding issues
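Spotting repeated scaling attempts or metric failures is easier with a count per event reason. A minimal sketch, with a hypothetical function name, that summarizes reasons read from standard input:

```shell
# Illustrative summary: count event reasons read from stdin, one reason per line.
summarize_reasons() {
  sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Example feed (needs cluster access):
#   kubectl get events -n <namespace> \
#     -o jsonpath='{range .items[*]}{.reason}{"\n"}{end}' | summarize_reasons
```

A steadily growing FailedGetMetrics count points at the Prometheus trigger, while many SuccessfulRescale events in a short span suggest oscillation.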
Best Practices
- Start Conservative:
- Begin with longer stabilization windows
- Use pod-based scaling for precise control
- Gradually reduce timing as you understand patterns
- Monitor Metrics:
- Watch for scaling oscillations
- Monitor resource usage patterns
- Track query performance impact
- Resource Planning:
- Ensure cluster has capacity for max replicas
- Consider node affinity rules
- Plan for storage requirements
- Testing:
- Test scaling behavior in non-production first
- Verify cleanup procedures
- Monitor data consistency during scaling