Data Backup and Restoration
Instructions for restoring Obsrv from backups
Overview
This guide provides comprehensive instructions to restore Obsrv data using Velero, Redis, and PostgreSQL backups. It also includes procedures for Druid data migration between S3 buckets.
Introduction
This document provides a step-by-step guide for restoring data from backups in an Obsrv deployment. It covers:
Restoration of critical services (PostgreSQL, Redis, etc.)
Velero-based system-wide recovery
Data migration between object storage buckets (e.g., for Druid segments)
Backup Storage Details
| Service | S3 Bucket Naming Convention |
| --- | --- |
| PostgreSQL | backups-{building_block}-{env}-{account-id} |
| Denorm Redis | backups-{building_block}-{env}-{account-id} |
| Dedup Redis | backups-{building_block}-{env}-{account-id} |
| Dataset Events | {building_block}-{env}-{account-id} |
| Terraform State | Stored in: AWS_TERRAFORM_BACKEND_BUCKET_NAME |
| Velero Backups | velero-{building_block}-{env}-{account-id} |
| Flink Checkpoints | checkpoint-{building_block}-{env}-{account-id} |
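The naming conventions above can be expanded into concrete bucket names with a small helper. This is a minimal sketch: the dictionary keys and the example values (`obsrv`, `dev`, the account ID) are illustrative assumptions, and only the patterns themselves come from the table.

```python
# Patterns copied from the naming conventions above; the dictionary keys are
# illustrative labels, not identifiers used by Obsrv itself.
BUCKET_PATTERNS = {
    "postgresql": "backups-{building_block}-{env}-{account_id}",
    "denorm_redis": "backups-{building_block}-{env}-{account_id}",
    "dedup_redis": "backups-{building_block}-{env}-{account_id}",
    "dataset_events": "{building_block}-{env}-{account_id}",
    "velero": "velero-{building_block}-{env}-{account_id}",
    "flink_checkpoints": "checkpoint-{building_block}-{env}-{account_id}",
}


def bucket_name(service, building_block, env, account_id):
    """Expand the naming pattern for one service into a bucket name."""
    pattern = BUCKET_PATTERNS[service]
    return pattern.format(
        building_block=building_block, env=env, account_id=account_id
    )
```

For example, `bucket_name("velero", "obsrv", "dev", "123456789012")` expands to `velero-obsrv-dev-123456789012`. The Terraform state bucket is left out because its name comes from AWS_TERRAFORM_BACKEND_BUCKET_NAME rather than a pattern.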
Velero: Full Restoration Workflow
Prerequisites
Ensure the following tools are installed:
AWS CLI
Velero CLI
To install Velero, follow the official Velero installation instructions for your platform.
Before proceeding, ensure that:
AWS credentials are configured
Kubernetes cluster is accessible
Velero is installed and running in the cluster
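The prerequisites above can be checked before any restore is attempted. The sketch below is one possible pre-flight check, not part of Obsrv: `kubectl` is an assumption (the list above names only the AWS CLI and the Velero CLI), and the read-only commands in `PREFLIGHT_COMMANDS` are suggestions for exercising each prerequisite.

```python
import shutil

# "aws" and "velero" come from the prerequisites above; "kubectl" is an
# assumption, added because the cluster must be reachable.
REQUIRED_TOOLS = ("aws", "velero", "kubectl")

# Read-only commands that exercise each prerequisite when run by hand.
PREFLIGHT_COMMANDS = [
    ["aws", "sts", "get-caller-identity"],  # AWS credentials are configured
    ["kubectl", "cluster-info"],            # Kubernetes cluster is accessible
    ["velero", "version"],                  # Velero client and server respond
]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

An empty list from `missing_tools()` means all three binaries are on PATH. The commands in `PREFLIGHT_COMMANDS` can then be run to confirm credentials and cluster access.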
Restoration Procedure
Restore All Services
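A full restore with the Velero CLI is normally created from an existing backup. The sketch below builds that command as an argument list and only prints it unless `dry_run` is disabled. The backup and restore names are placeholders, and the helper functions are hypothetical wrappers rather than anything shipped with Obsrv.

```python
import subprocess


def full_restore_command(backup_name, restore_name=None):
    """Build `velero restore create [NAME] --from-backup BACKUP`."""
    cmd = ["velero", "restore", "create"]
    if restore_name:
        cmd.append(restore_name)
    cmd += ["--from-backup", backup_name]
    return cmd


def run(cmd, dry_run=True):
    """Print the command, or execute it when dry_run is False."""
    if dry_run:
        print(" ".join(cmd))
        return None
    return subprocess.run(cmd, check=True)
```

Running `velero backup get` first lists the backups available in the cluster, which is where a real value for `backup_name` would come from.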
Restore a Specific Service (e.g., PostgreSQL)
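To restore a single service, a Velero restore can be narrowed with `--include-namespaces` or a label selector. The sketch below only builds the command. The namespace `postgresql` and the label `app=postgresql` in the usage note are assumptions and need to match how the service is actually deployed.

```python
def scoped_restore_command(backup_name, namespaces=(), selector=None):
    """Build a `velero restore create` command limited to part of a backup."""
    cmd = ["velero", "restore", "create", "--from-backup", backup_name]
    if namespaces:
        # Comma-separated list of namespaces to restore.
        cmd += ["--include-namespaces", ",".join(namespaces)]
    if selector:
        # Kubernetes label selector, e.g. "app=postgresql" (hypothetical label).
        cmd += ["--selector", selector]
    return cmd
```

For example, `scoped_restore_command("daily-1", namespaces=["postgresql"])` limits the restore to resources from that one namespace.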
Verify Restoration
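One way to verify a restore is to read its phase from the JSON printed by `velero restore get <restore-name> -o json`. The sketch below parses that output. It assumes the phase is found under `status.phase` of the Restore object and treats only `Completed` as success.

```python
import json


def restore_phase(restore_json_text):
    """Extract status.phase from the JSON of a single Velero restore."""
    doc = json.loads(restore_json_text)
    return doc.get("status", {}).get("phase", "Unknown")


def is_restore_complete(restore_json_text):
    """True only when the restore finished without reported failures."""
    return restore_phase(restore_json_text) == "Completed"
```

If the phase is anything else, `velero restore describe <restore-name>` and `velero restore logs <restore-name>` give more detail on what was skipped or failed.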
Post-Restoration Checklist
Ensure all pods are running (PostgreSQL, Redis, Prometheus, etc.)
Verify data in all critical databases
Check that data processors (Kafka, Flink, Druid) resume from their last committed state
Run test queries to validate dataset access and accuracy
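The first checklist item can be partly automated by inspecting the output of `kubectl get pods --all-namespaces -o json`. The sketch below flags pods whose phase is neither `Running` nor `Succeeded`. It is a coarse check under that assumption: a pod in the `Running` phase can still have containers that are not ready.

```python
import json

HEALTHY_PHASES = {"Running", "Succeeded"}


def unhealthy_pods(pod_list):
    """Return (namespace, name, phase) for pods outside the healthy phases."""
    problems = []
    for item in pod_list.get("items", []):
        phase = item.get("status", {}).get("phase")
        if phase not in HEALTHY_PHASES:
            meta = item.get("metadata", {})
            problems.append((meta.get("namespace"), meta.get("name"), phase))
    return problems


def unhealthy_pods_from_text(kubectl_json_text):
    """Same check, starting from the raw JSON text printed by kubectl."""
    return unhealthy_pods(json.loads(kubectl_json_text))
```

An empty result covers only the pod-level item. The database, data-processor, and query checks still need to be done against the services themselves.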
Redis & PostgreSQL: Targeted Restoration
Use this procedure to restore only Redis and PostgreSQL when the S3 bucket paths are unchanged.
Pre-Restoration
Pause these services:
Flink Jobs
Web Console
Obsrv APIs
Command Service
Druid
Superset
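One common way to pause Kubernetes workloads like those listed above is to scale them to zero replicas and scale them back afterwards. The sketch below builds the `kubectl scale` commands. Every namespace, kind, and workload name passed in is a hypothetical placeholder, since the actual resource names depend on the deployment.

```python
def scale_commands(workloads, replicas):
    """Build one `kubectl scale` command per (namespace, kind, name) entry."""
    commands = []
    for namespace, kind, name in workloads:
        commands.append([
            "kubectl", "scale", f"{kind}/{name}",
            f"--replicas={replicas}", "-n", namespace,
        ])
    return commands


# Hypothetical workload names; replace with the real ones from the cluster.
EXAMPLE_WORKLOADS = [
    ("web-console", "deployment", "web-console"),
    ("superset", "deployment", "superset"),
]
```

`scale_commands(EXAMPLE_WORKLOADS, 0)` gives the pause commands. Recording the original replica counts beforehand (for example with `kubectl get deployment -n <namespace>`) makes it possible to scale back to the same values afterwards.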
PostgreSQL Restoration
1. Access the Pod
2. Pre-Check & Clean Old Data
3. Copy Backup File
4. Restore
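The four steps above can be sketched as a sequence of `kubectl` and `psql` commands. This is an illustration under several assumptions: the backup is a plain SQL dump (a custom-format dump would need `pg_restore` instead of `psql -f`), the pod name, namespace, user, and paths are placeholders, and authentication is already handled inside the pod. The cleanup of old data in step 2 is left out on purpose because it is destructive and depends on the deployment.

```python
def postgres_restore_plan(pod, namespace, local_backup, user="postgres",
                          database="postgres", remote_path="/tmp/backup.sql"):
    """Return labelled commands mirroring the numbered steps above."""
    exec_prefix = ["kubectl", "exec", pod, "-n", namespace, "--"]
    return [
        # 1. Interactive shell in the pod (optional when using the steps below).
        ("access pod", ["kubectl", "exec", "-it", pod, "-n", namespace, "--", "bash"]),
        # 2. Pre-check: list existing databases before touching anything.
        ("pre-check", exec_prefix + ["psql", "-U", user, "-l"]),
        # 3. Copy the backup file from the local machine into the pod.
        ("copy backup", ["kubectl", "cp", local_backup,
                         f"{namespace}/{pod}:{remote_path}"]),
        # 4. Replay the SQL dump.
        ("restore", exec_prefix + ["psql", "-U", user, "-d", database,
                                   "-f", remote_path]),
    ]
```

Each entry is a `(label, argv)` pair, so the plan can be printed for review before anything is executed against the database.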
Redis Restoration
1. Download and Decompress
2. Access the Pod
3. Disable Persistence Temporarily
4. Replace and Restart
5. Verify Keys
6. Revert Configs
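The Redis steps above can be sketched in the same way. The assumptions here are that the backup is a gzip-compressed RDB file, that Redis keeps its data in `/data/dump.rdb`, that the data directory survives a pod restart, and that deleting the pod causes its controller to recreate it. The bucket, key, pod, and namespace values are placeholders, and the original `save` and `appendonly` values should be recorded (for example with `redis-cli CONFIG GET save`) before they are changed.

```python
import gzip
import shutil


def download_command(bucket, key, local_path):
    """Step 1a: fetch the compressed backup with the AWS CLI."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", local_path]


def decompress_gzip(src_path, dst_path):
    """Step 1b: decompress a .gz backup into a plain RDB file."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def redis_restore_plan(pod, namespace, local_rdb, original_save,
                       original_appendonly="yes", data_dir="/data"):
    """Steps 2 to 6 as labelled kubectl/redis-cli commands."""
    cli = ["kubectl", "exec", pod, "-n", namespace, "--", "redis-cli"]
    return [
        # 3. Stop Redis from overwriting the restored file on shutdown.
        ("disable AOF", cli + ["CONFIG", "SET", "appendonly", "no"]),
        ("disable snapshots", cli + ["CONFIG", "SET", "save", ""]),
        # 4. Replace the dump file, then restart so Redis loads it.
        ("replace dump", ["kubectl", "cp", local_rdb,
                          f"{namespace}/{pod}:{data_dir}/dump.rdb"]),
        ("restart pod", ["kubectl", "delete", "pod", pod, "-n", namespace]),
        # 5. A non-zero key count suggests the data was loaded.
        ("verify keys", cli + ["DBSIZE"]),
        # 6. Put the persistence settings back to their recorded values.
        ("revert AOF", cli + ["CONFIG", "SET", "appendonly", original_appendonly]),
        ("revert snapshots", cli + ["CONFIG", "SET", "save", original_save]),
    ]
```

`CONFIG SET` changes apply only to the running process. If the Redis configuration file enables AOF, the restarted pod may load the append-only file instead of the copied RDB, so the configuration is worth checking before relying on this sequence.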
Final Steps After Restoration
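Once PostgreSQL and Redis are restored:
Resume the services paused before restoration (Flink Jobs, Web Console, Obsrv APIs, Command Service, Druid, Superset).
Validate all datasets and operational behavior through the Obsrv console.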
Druid Data Migration
Use this section to migrate Druid segment data from one S3 bucket to another.
Pre-Migration Checklist
Ensure all segments are copied to the target bucket.
Scale down Druid components:
Migration Execution
Deploy a temporary Python pod:
Inside the pod, run the migration script (using either the AWS CLI or a boto3 script) to copy the segment paths from the old bucket to the new bucket.
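A boto3-based copy could look like the sketch below. It is not the migration script referred to above, only an illustration of the idea: the functions take an S3 client as a parameter (for example one created with `boto3.client("s3")`), and the bucket names and prefix are placeholders. `missing_keys` supports the checklist item about confirming that every segment reached the target bucket.

```python
def iter_keys(s3, bucket, prefix=""):
    """Yield every object key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


def copy_prefix(s3, source_bucket, target_bucket, prefix=""):
    """Copy every object under a prefix, keeping the same keys."""
    copied = 0
    for key in iter_keys(s3, source_bucket, prefix):
        s3.copy({"Bucket": source_bucket, "Key": key}, target_bucket, key)
        copied += 1
    return copied


def missing_keys(s3, source_bucket, target_bucket, prefix=""):
    """Return source keys that are absent from the target bucket."""
    target = set(iter_keys(s3, target_bucket, prefix))
    return sorted(k for k in iter_keys(s3, source_bucket, prefix)
                  if k not in target)


# Example usage inside the temporary pod (requires `pip install boto3`):
#   import boto3
#   s3 = boto3.client("s3")
#   copy_prefix(s3, "old-bucket", "new-bucket", prefix="druid/segments/")
#   assert missing_keys(s3, "old-bucket", "new-bucket", "druid/segments/") == []
```

The AWS CLI alternative mentioned above is `aws s3 sync s3://old-bucket/<prefix> s3://new-bucket/<prefix>`, which copies only objects that are missing or different in the target.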
Important Notes
Always take snapshots before migration or restoration.
Confirm with stakeholders before starting restoration in a production environment.
Monitor resource usage post-restoration for anomalies.
