Autoscaling Components

This guide explains how to configure and manage autoscaling for Obsrv components.

Overview

Obsrv components can be scaled using KEDA (Kubernetes Event-driven Autoscaling). The default configuration supports autoscaling for Flink Task Managers and Druid components based on metrics such as CPU usage, query latency, and Kafka consumer lag.

Understanding Scaling Scenarios

Flink Autoscaling

Flink Task Managers need to scale when:

Kafka consumer lag grows beyond acceptable thresholds
Processing backpressure increases
Event processing latency rises

Key considerations:

Scale up quickly to handle sudden data bursts
Match scaling with Kafka partition count for optimal parallelism
Scale down conservatively to avoid job restarts
Consider checkpoint completion times during scaling
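To make the partition-count point concrete, here is a small Python sketch. It assumes that the job's source parallelism grows with the total number of Task Manager slots (as with Flink's reactive mode) and that each Kafka partition is consumed by exactly one source subtask; the function names are hypothetical and not part of Obsrv or Flink.

```python
import math


def max_useful_task_managers(partitions: int, slots_per_tm: int = 1) -> int:
    """Task Manager count at which every Kafka partition has its own source subtask.

    Assumes source parallelism equals the total slot count (hypothetical setup).
    """
    return math.ceil(partitions / slots_per_tm)


def idle_source_subtasks(partitions: int, task_managers: int, slots_per_tm: int = 1) -> int:
    """Source subtasks left without a partition once slots exceed partitions."""
    return max(0, task_managers * slots_per_tm - partitions)


if __name__ == "__main__":
    # Hypothetical topic with 8 partitions and 1 slot per Task Manager.
    print(max_useful_task_managers(8))   # 8
    print(idle_source_subtasks(8, 10))   # 2 subtasks would have nothing to read
```

With one slot per Task Manager, replicas beyond the partition count only add idle source subtasks, which is why the Flink example later in this guide caps maxReplicaCount at the partition count.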

Druid Autoscaling

Druid components scale for different reasons:

Historical Nodes

Query response times exceed thresholds
Segment load times increase
Available capacity for new segments decreases
High CPU utilization impacts query performance

Broker Nodes

High query queuing or wait times
Increased number of concurrent queries
CPU utilization affects query routing efficiency

Key considerations

Historical scaling impacts data availability
Broker scaling affects query routing and caching
Both require careful monitoring of query patterns
Consider time of day and workload patterns

Enabling Autoscaling

Enable autoscaling in autoscaling.yaml:

global:
  enable_autoscaling: &autoscaling_enabled true

Install the autoscaling rules:

# Run from the root of the enterprise-automation repository
cd helmcharts/kitchen
bash enterprise.sh autoscaling

Configuration Components

Common Parameters

enabled: Enable/disable autoscaling for the component
kind: Kubernetes resource type (Deployment/StatefulSet)
minReplicaCount: Minimum number of replicas
maxReplicaCount: Maximum number of replicas
pollingInterval: How often KEDA checks each trigger (in seconds)
cooldownPeriod: How long KEDA waits after the last active trigger before scaling to 0 (in seconds); it has no effect when minReplicaCount is 1 or more

HorizontalPodAutoscalerConfig

The horizontalPodAutoscalerConfig section controls scaling behavior:

horizontalPodAutoscalerConfig:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 180   # Time window to evaluate metrics before scaling up
      policies:
        - type: Pods                    # Pods (absolute count) or Percent (of current replicas)
          value: 1                      # Maximum change allowed per period
          periodSeconds: 180            # Length of the window the limit applies to
    scaleDown:
      stabilizationWindowSeconds: 300   # Time window to evaluate metrics before scaling down
      policies:
        - type: Pods                    # Pods or Percent
          value: 1
          periodSeconds: 300
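As a rough model of what a single policy permits, the sketch below computes the replica limit for one period from the replica count at the start of that period. It follows the Pods/Percent semantics described in the Kubernetes HPA documentation; the function names are hypothetical, and the rounding used for Percent scale-down is an assumption.

```python
import math
from typing import Literal

PolicyType = Literal["Pods", "Percent"]


def scale_up_limit(period_start: int, policy_type: PolicyType, value: int) -> int:
    """Highest replica count one scale-up policy allows within its periodSeconds.

    period_start is the replica count at the start of the current period.
    """
    if policy_type == "Pods":
        return period_start + value
    # Percent: grow by value% of the replicas at the start of the period, rounded up.
    return math.ceil(period_start * (1 + value / 100))


def scale_down_limit(period_start: int, policy_type: PolicyType, value: int) -> int:
    """Lowest replica count one scale-down policy allows within its periodSeconds.

    Assumption: Percent results are truncated here; the HPA controller's exact
    rounding may differ between Kubernetes versions.
    """
    if policy_type == "Pods":
        return period_start - value
    return int(period_start * (1 - value / 100))


if __name__ == "__main__":
    print(scale_up_limit(4, "Pods", 1))        # 5: one extra pod per period
    print(scale_up_limit(4, "Percent", 100))   # 8: at most double per period
    print(scale_down_limit(4, "Pods", 1))      # 3
    print(scale_down_limit(4, "Percent", 50))  # 2: at most halve per period
```

The HPA then moves toward the metric-driven desired count without crossing this limit, and the result is still clamped to minReplicaCount and maxReplicaCount.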

Key Timing Parameters

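stabilizationWindowSeconds:
- How far back the HPA looks at its recent replica recommendations: scale-up uses the lowest recommendation in the window and scale-down the highest, so the metrics must call for a change for the whole window before it happens
- Longer windows prevent oscillation but reduce responsiveness
- Typically longer for scale-down than scale-up
periodSeconds:
- The window a policy's limit applies to: within any periodSeconds, replicas change by at most the policy's value (pods or percent)
- Often set equal to stabilizationWindowSeconds, as in the Druid examples below; Kubernetes does not require this
- Longer periods provide more stability
cooldownPeriod:
- A KEDA ScaledObject setting, not part of the HPA behavior block
- How long KEDA waits after the last trigger reported active before scaling the workload back to 0
- Has no effect when minReplicaCount is 1 or more; scaling between 1 and N replicas is governed by the HPA settings above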
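The stabilization window can be modelled roughly as below. This is a simplified sketch of the documented HPA behaviour (lowest recent recommendation for scale-up, highest for scale-down); the history format and function name are hypothetical, and the real controller records recommendations on its own sync loop before applying the Pods/Percent policies.

```python
from typing import List, Tuple

# One entry per earlier HPA evaluation: (timestamp in seconds, desired replicas).
Recommendation = Tuple[float, int]


def stabilize(current: int, desired_now: int, history: List[Recommendation],
              now: float, up_window: float, down_window: float) -> int:
    """Apply scale-up and scale-down stabilization windows to a new recommendation.

    Scale-up uses the lowest recommendation inside the up window and scale-down
    uses the highest inside the down window. Policy limits would be applied to
    the result afterwards.
    """
    up_rec = desired_now
    down_rec = desired_now
    for ts, rec in history:
        if ts > now - up_window:
            up_rec = min(up_rec, rec)
        if ts > now - down_window:
            down_rec = max(down_rec, rec)
    result = current
    if result < up_rec:      # scale up, but only as far as the whole window agrees
        result = up_rec
    if result > down_rec:    # scale down, but only as far as the whole window agrees
        result = down_rec
    return result


if __name__ == "__main__":
    # Windows as in the Flink example: 30 s up, 300 s down.
    # The 300 s window still contains a recommendation of 4, so no scale-down yet.
    print(stabilize(4, 1, [(800, 4), (900, 3)], now=1000, up_window=30, down_window=300))   # 4
    # Later, the highest recommendation left in the window is 2.
    print(stabilize(4, 1, [(950, 2), (1100, 1)], now=1200, up_window=30, down_window=300))  # 2
```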

Example Configurations

1. Flink Task Manager (Unified Pipeline)

unified-pipeline-taskmanager:
  enabled: true
  kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: "{{.Values.global.kafka.numPartitions}}"
  pollingInterval: 15        # Check every 15s
  cooldownPeriod: 900        # 15 min; KEDA uses this only when scaling to 0
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 30
          policies:
            - type: Percent
              value: 100     # Double pods
              periodSeconds: 30
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 50      # Halve pods
              periodSeconds: 120

Explanation:

Uses percentage-based scaling for exponential growth
Quick scale-up (30s) for responsive lag handling
Conservative scale-down (5m) to prevent oscillation
Matches Kafka partitions for maximum parallelism
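To see what `value: 100` and `value: 50` mean in practice, the sketch below steps the replica count one policy period at a time. It assumes a hypothetical topic with 8 partitions and uses the same rounding assumptions as the Percent policy sketch earlier in this guide (round up when growing, truncate when shrinking); stabilization windows, polling, and pod start-up are deliberately ignored, and the function name is hypothetical.

```python
import math
from typing import List


def trajectory(start: int, desired: int, min_replicas: int, max_replicas: int,
               up_percent: int, down_percent: int) -> List[int]:
    """Replica count after each policy period until `desired` is reached.

    Assumes the metrics keep asking for `desired` the whole time, and ignores
    stabilization windows, polling delay, and pod start-up time.
    """
    if start < 1 or min_replicas < 1 or up_percent <= 0 or down_percent <= 0:
        raise ValueError("sketch assumes at least 1 replica and positive percentages")
    desired = max(min_replicas, min(max_replicas, desired))
    steps = [start]
    current = start
    while current != desired:
        if desired > current:
            # Percent scale-up: at most current * (1 + up_percent/100), rounded up.
            current = min(desired, math.ceil(current * (1 + up_percent / 100)))
        else:
            # Percent scale-down: remove at most down_percent of current (truncated, an assumption).
            current = max(desired, int(current * (1 - down_percent / 100)))
        steps.append(current)
    return steps


if __name__ == "__main__":
    # Hypothetical topic with 8 partitions, so maxReplicaCount = 8.
    print(trajectory(1, 8, 1, 8, up_percent=100, down_percent=50))  # [1, 2, 4, 8]
    print(trajectory(8, 1, 1, 8, up_percent=100, down_percent=50))  # [8, 4, 2, 1]
```

Each step in those lists is at least one periodSeconds apart (30 s up, 120 s down in this example), so going from 1 to 8 replicas takes three scale-up steps.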

2. Druid Historicals

druid-raw-historicals:
  enabled: true
  kind: StatefulSet
  minReplicaCount: 1
  maxReplicaCount: 4
  pollingInterval: 60                        # Check every 1m
  cooldownPeriod: 900                        # 15 min; only used when scaling to 0
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 180    # 3m confirmation
          policies:
            - type: Pods
              value: 1                       # Add 1 pod
              periodSeconds: 180
        scaleDown:
          stabilizationWindowSeconds: 7200   # 2h confirmation before removing a pod
          policies:
            - type: Pods
              value: 1                       # Remove 1 pod
              periodSeconds: 7200

Explanation:

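Conservative scaling because historicals are stateful
Long scale-down window (2h) so that pods, and the segments loaded on them, are not removed during short dips in load
Pod-based scaling for precise control
Leaves time for segments to load and rebalance between scaling steps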

3. Druid Brokers

druid-raw-brokers:
  enabled: true
  kind: Deployment
  minReplicaCount: 1
  maxReplicaCount: 2
  pollingInterval: 30                        # Check every 30s
  cooldownPeriod: 600                        # 10 min; only used when scaling to 0
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 600   # 10m confirmation
          policies:
            - type: Pods
              value: 1                      # Add 1 pod
              periodSeconds: 600
        scaleDown:
          stabilizationWindowSeconds: 3600  # 1h confirmation before removing a pod
          policies:
            - type: Pods
              value: 1                      # Remove 1 pod
              periodSeconds: 3600

Explanation:

Scale-up is slower than for historicals (10m vs 3m confirmation) while scale-down is faster (1h vs 2h); both are slower than Flink
1-hour scale-down window for query stability
Pod-based scaling for controlled growth
Balances query routing and cache warmup needs
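A rough way to compare the historical and broker settings is to estimate the minimum time a full scale-up or scale-down takes. The sketch assumes the scaling condition holds continuously, that the first change happens once the stabilization window has elapsed, and that each further change waits one policy period; polling, metric lag, and pod start-up (including segment loading) are ignored, so real timings will be longer. The function name is hypothetical.

```python
import math


def approx_min_seconds(start: int, target: int, step_pods: int,
                       window_s: int, period_s: int) -> int:
    """Lower-bound estimate of the time to move from start to target replicas.

    First change after the stabilization window, then one change per policy period.
    """
    if start == target:
        return 0
    steps = math.ceil(abs(target - start) / step_pods)
    return window_s + (steps - 1) * period_s


if __name__ == "__main__":
    # (start, target, pods per step, stabilization window s, period s) from the examples
    examples = {
        "historicals up (1->4)": (1, 4, 1, 180, 180),
        "historicals down (4->1)": (4, 1, 1, 7200, 7200),
        "brokers up (1->2)": (1, 2, 1, 600, 600),
        "brokers down (2->1)": (2, 1, 1, 3600, 3600),
    }
    for name, args in examples.items():
        print(f"{name}: ~{approx_min_seconds(*args) / 60:.0f} min")
```

Under these assumptions a single broker scale-up step (about 10 min) waits longer than the first historical step (about 3 min), while brokers release capacity much sooner (about 1 h, versus about 6 h for historicals going from 4 back to 1).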

KEDA Triggers

KEDA supports various trigger types. In our configuration, we use:

Prometheus Triggers:

triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-operated...
    metricName: druid_historical_scaling_condition
    query: |
      # Scaling query
    threshold: "90"

Cron Triggers (for night-time scaling):

- type: cron
  metadata:
    timezone: UTC
    start: "30 19 * * *"    # 19:30 UTC
    end: "30 3 * * *"       # 03:30 UTC
    desiredReplicas: "1"

For more trigger types and configurations, refer to KEDA Documentation.
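When a ScaledObject has several triggers, the HPA computes a replica count per metric and uses the largest. The sketch below illustrates that for the two triggers above, assuming the Prometheus trigger uses KEDA's default AverageValue metric type; the query result of 200 is a made-up number, the function names are hypothetical, and the HPA's default 10% tolerance is ignored.

```python
import math
from typing import Iterable


def prometheus_replicas(metric_value: float, threshold: float) -> int:
    """Replicas the HPA would request for one AverageValue external metric.

    Simplification: ignores the HPA's default 10% tolerance band.
    """
    return math.ceil(metric_value / threshold)


def combined_replicas(per_trigger: Iterable[int], min_replicas: int, max_replicas: int) -> int:
    """The HPA takes the largest recommendation across metrics, then clamps it."""
    return max(min_replicas, min(max_replicas, max(per_trigger)))


if __name__ == "__main__":
    # Hypothetical reading: the Prometheus query returns 200 against threshold "90".
    prom = prometheus_replicas(200, 90)           # ceil(2.22...) = 3
    cron = 1                                      # desiredReplicas inside the cron window
    print(combined_replicas([prom, cron], 1, 4))  # 3
```

Because the largest recommendation wins, the cron trigger's desiredReplicas acts as a floor during its window rather than a cap: it does not force a scale-down while the Prometheus trigger asks for more.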

Important Notes

Druid Component Cleanup

Historical Nodes

When scaling down Druid historicals, manual cleanup may be required:

Delete PVCs for unused replicas:

kubectl delete pvc historical-volume-druid-raw-historicals-<ordinal> -n <namespace>

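Delete corresponding PVs (only needed if the StorageClass reclaimPolicy is Retain; with Delete, Kubernetes removes the PV and its backing disk once the PVC is deleted):

kubectl delete pv <pv-name>

Delete cloud provider disks (again, only for volumes that were retained):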

AWS: Delete EBS volumes
Azure: Delete Azure Disks
GCP: Delete Persistent Disks
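Because StatefulSet PVCs are named `<volumeClaimTemplate>-<statefulset>-<ordinal>` and the highest ordinals are removed first on scale-down, the PVCs left behind can be identified from the replica count. This Python sketch assumes the naming pattern from the kubectl command above; the function is hypothetical. Keep in mind that a StatefulSet reattaches these PVCs if it scales back up, so deleting them means those historicals start with empty disks.

```python
import re
from typing import Iterable, List

# Naming pattern from the kubectl example above:
# <volumeClaimTemplate>-<statefulset>-<ordinal>
PVC_PATTERN = re.compile(r"^historical-volume-druid-raw-historicals-(\d+)$")


def orphaned_pvcs(pvc_names: Iterable[str], replicas: int) -> List[str]:
    """Historical PVCs whose ordinal is >= the current replica count, lowest ordinal first."""
    found = []
    for name in pvc_names:
        match = PVC_PATTERN.match(name)
        if match and int(match.group(1)) >= replicas:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    # Hypothetical PVC names (for example from `kubectl get pvc -n <namespace>`).
    names = [f"historical-volume-druid-raw-historicals-{i}" for i in range(4)]
    print(orphaned_pvcs(names, replicas=2))
    # ['historical-volume-druid-raw-historicals-2', 'historical-volume-druid-raw-historicals-3']
```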

Broker Nodes

When scaling down Druid brokers:

Ensure query drain:
- Monitor active queries on the broker
- Wait for existing queries to complete
- Verify no new queries are being routed
Cache considerations:
- Be aware that scaling down brokers will lose their query cache
- New brokers will need time to warm up their cache
- Consider gradual scale-down during off-peak hours

Infrastructure Requirements

Node and IP Requirements

Before enabling autoscaling, ensure your cluster has:

Sufficient Nodes:
- Available nodes with required resources (CPU/Memory)
- Node autoscaling enabled if using cloud providers
- Appropriate node labels/taints if using node affinity
IP Address Availability:
- Enough free IPs in the subnet
- Account for how many IPs each pod consumes under your CNI
- Reserve IPs for maximum scale scenario
Resource Quotas:
- Namespace resource quotas allow for the pod count at maximum scale
- Cluster-wide limits accommodate scaling
- Storage class has sufficient quota (for Historicals)
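A quick way to size the headroom is to add up what each component needs between minReplicaCount and maxReplicaCount. In the sketch below the replica bounds come from the examples in this guide, but the Kafka partition count (4) and all CPU and memory requests are hypothetical placeholders; substitute your own values from the Helm charts.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Component:
    name: str
    min_replicas: int
    max_replicas: int
    cpu_request: float         # CPU cores requested per pod
    memory_request_gib: float  # memory requested per pod, in GiB


def extra_at_max_scale(components: List[Component]) -> Dict[str, float]:
    """Extra pods, CPU, and memory needed to go from min to max replicas."""
    extra = [(c, c.max_replicas - c.min_replicas) for c in components]
    return {
        "pods": sum(n for _, n in extra),
        "cpu_cores": sum(n * c.cpu_request for c, n in extra),
        "memory_gib": sum(n * c.memory_request_gib for c, n in extra),
    }


if __name__ == "__main__":
    components = [
        # Hypothetical: assumes global.kafka.numPartitions = 4 and made-up requests.
        Component("unified-pipeline-taskmanager", 1, 4, cpu_request=1.0, memory_request_gib=2.0),
        Component("druid-raw-historicals", 1, 4, cpu_request=2.0, memory_request_gib=8.0),
        Component("druid-raw-brokers", 1, 2, cpu_request=1.0, memory_request_gib=4.0),
    ]
    print(extra_at_max_scale(components))
    # {'pods': 7, 'cpu_cores': 10.0, 'memory_gib': 34.0}
```

With one IP per pod, the `pods` figure is also the number of extra pod IPs to reserve.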

Troubleshooting

Checking Scaling Status

View KEDA ScaledObject status:

kubectl get scaledobject -n <namespace>
kubectl describe scaledobject <scaledobject-name> -n <namespace>

Check HPA status:

kubectl get hpa -n <namespace>
kubectl describe hpa <hpa-name> -n <namespace>

Monitor Kubernetes Events:

# Get events for a specific object (for example the HPA, the Deployment/StatefulSet, or a pod)
kubectl get events -n <namespace> --field-selector involvedObject.name=<object-name>

# Watch events in real-time
kubectl get events -n <namespace> --watch

Common Scaling Issues

Scaling Not Triggered:
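- Verify KEDA metrics (KEDA may rename the metric; the exact name is listed in the ScaledObject's status.externalMetricNames):
```
kubectl get --raw '/apis/external.metrics.k8s.io/v1beta1/namespaces/<namespace>/<metric-name>?labelSelector=scaledobject.keda.sh%2Fname%3D<scaledobject-name>' | jq
```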
- Check Prometheus query results directly
- Verify trigger thresholds are appropriate

Scaling Fails:

Check for resource constraints:

kubectl describe nodes | grep -A 5 "Allocated resources"

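Check the pod CIDR ranges assigned to nodes (this shows allocated ranges, not free addresses; with VPC-native CNIs such as the AWS VPC CNI, pod IPs come from the subnet, so check free subnet IPs instead):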

kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'

Look for PVC/Storage issues (for Historicals):

kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>

Pods Stuck in Pending:
- Check node resources:
```
kubectl describe pod <pod-name> -n <namespace>
```
- Verify node affinity rules
- Check for PVC binding issues
Unexpected Scaling Behavior:
- Review KEDA logs:
```
kubectl logs -n keda -l app=keda-operator
```
- Check stabilization windows and cooldown periods
- Verify metric values over time in Prometheus

Using Events for Monitoring

Set up event monitoring:

# Watch scaling events
kubectl get events -n <namespace> --field-selector reason=SuccessfulRescale,type=Normal

Important event types to monitor:
- SuccessfulRescale: Successful scaling operations
- FailedRescale: Failed scaling attempts
- FailedGetMetrics: Metric collection issues
- FailedComputeMetricsReplicas: Scaling computation issues
Event patterns to watch for:
- Repeated scaling attempts
- Resource constraint messages
- Metric collection failures
- PVC/PV binding issues
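To spot repeated failures without scrolling through raw events, the JSON output of `kubectl get events -n <namespace> -o json` can be summarised per object and reason. This is a hypothetical helper, not part of Obsrv or KEDA; the reason list comes from this section, and the sample object names are made up.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple

# Reasons taken from the list above; exact reason strings may vary by Kubernetes version.
SCALING_REASONS = {
    "SuccessfulRescale",
    "FailedRescale",
    "FailedGetMetrics",
    "FailedComputeMetricsReplicas",
}

Key = Tuple[str, str]  # (involved object name, event reason)


def summarize_scaling_events(events_json: str) -> Dict[Key, int]:
    """Total occurrences of scaling-related events per object and reason."""
    totals: Counter = Counter()
    for event in json.loads(events_json).get("items", []):
        reason = event.get("reason")
        if reason not in SCALING_REASONS:
            continue
        name = event.get("involvedObject", {}).get("name", "<unknown>")
        # `count` can be missing or null; treat that as a single occurrence.
        totals[(name, reason)] += event.get("count") or 1
    return dict(totals)


def repeated_failures(summary: Dict[Key, int], threshold: int = 3) -> List[Key]:
    """Object/reason pairs whose failure events occurred at least `threshold` times."""
    return sorted(k for k, n in summary.items() if k[1].startswith("Failed") and n >= threshold)


if __name__ == "__main__":
    # Hypothetical, trimmed-down event list.
    sample = json.dumps({"items": [
        {"reason": "SuccessfulRescale", "count": 2, "involvedObject": {"name": "keda-hpa-druid-raw-brokers"}},
        {"reason": "FailedGetMetrics", "count": 5, "involvedObject": {"name": "keda-hpa-druid-raw-brokers"}},
        {"reason": "Pulled", "count": 1, "involvedObject": {"name": "druid-raw-brokers-0"}},
    ]})
    print(repeated_failures(summarize_scaling_events(sample)))
    # [('keda-hpa-druid-raw-brokers', 'FailedGetMetrics')]
```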

Best Practices

Start Conservative:
- Begin with longer stabilization windows
- Use pod-based scaling for precise control
- Gradually reduce timing as you understand patterns
Monitor Metrics:
- Watch for scaling oscillations
- Monitor resource usage patterns
- Track query performance impact
Resource Planning:
- Ensure cluster has capacity for max replicas
- Consider node affinity rules
- Plan for storage requirements
Testing:
- Test scaling behavior in non-production first
- Verify cleanup procedures
- Monitor data consistency during scaling
