Data Backup and Restoration

Instructions for restoring Obsrv from backups

Date: Friday, 05.07.2024

Introduction

This document provides a complete end-to-end restoration playbook for Obsrv, including:

  • Restoration using Velero snapshots

  • Selective Redis & PostgreSQL restoration

  • Pausing and resuming Flink, Druid, and Superset

  • S3-based Druid segment migration

  • Python scripts used for updating druid_segments

It is intended for operational recovery, migration, DR rehearsals, and environment cloning.


1. Backup Storage Reference

The following AWS S3 buckets and locations store the backups:

| Service | Backup Location |
| --- | --- |
| PostgreSQL | backups-{building_block}-{env}-{account-id} |
| Denorm Redis | backups-{building_block}-{env}-{account-id} |
| Dedup Redis | backups-{building_block}-{env}-{account-id} |
| Dataset Events | {building_block}-{env}-{account_id} |
| Infra Terraform State | AWS_TERRAFORM_BACKEND_BUCKET_NAME |
| Velero Backups | velero-{building_block}-{env}-{account-id} |
| Flink Checkpoints | checkpoint-{building_block}-{env}-{account-id} |


2. Velero Restoration — Full Environment Restore

This procedure restores the entire Obsrv deployment from a Velero backup.

2.1 Prerequisites

  • AWS CLI installed and configured

  • Cluster kubeconfig available

  • Velero CLI installed

Commands (Install Velero CLI):
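A minimal sketch for Linux; the release version below is illustrative, so pick the version compatible with your cluster:

```bash
# Download the Velero CLI from GitHub releases (version is illustrative)
curl -fsSL -o velero.tar.gz \
  https://github.com/vmware-tanzu/velero/releases/download/v1.13.2/velero-v1.13.2-linux-amd64.tar.gz
tar -xzf velero.tar.gz
sudo mv velero-v1.13.2-linux-amd64/velero /usr/local/bin/velero
velero version --client-only
```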

2.2 Restore Workflow

Restore all services

Example:
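A sketch; substitute a backup name reported by velero backup get:

```bash
# List available backups, then restore everything from the chosen one
velero backup get
velero restore create obsrv-full-restore --from-backup <backup-name>
```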

Restore a specific namespace (Example: PostgreSQL)
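The namespace name below is illustrative; use the namespace your PostgreSQL release runs in:

```bash
# Restore only the PostgreSQL namespace from the backup
velero restore create postgres-restore \
  --from-backup <backup-name> \
  --include-namespaces postgresql
```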

Check Status
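```bash
# List restores and inspect the chosen one in detail
velero restore get
velero restore describe <restore-name> --details
```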

2.3 Post-Restore Validation

Perform the following checks (example commands after the list):

  • Pod status verification

  • PostgreSQL/Redis data checks

  • Flink/Druid processing resumption

  • Query validation from Superset/Obsrv APIs
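For example, the pod and data checks can be run as follows (namespaces are assumptions; placeholders as listed under Important Notes):

```bash
# Any pod not Running/Completed needs investigation
kubectl get pods -A | grep -vE 'Running|Completed'

# Spot-check restored PostgreSQL and Redis data
kubectl exec -it <pod-name> -n postgresql -- psql -U <username> -d <database-name> -c '\dt'
kubectl exec -it <pod-name> -n redis -- redis-cli DBSIZE
```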


3. Redis & PostgreSQL — Targeted Restoration

Use this procedure when bucket names and paths are unchanged and only a data rollback is required.

3.1 Pause Streaming and Query Services

Scale deployments to 0 replicas for:

  • Flink jobs

  • Druid

  • Superset

  • API services

  • Web Console

Examples:
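Workload names and namespaces below are assumptions; list your deployments first and adjust accordingly:

```bash
# Discover the actual workload names and namespaces first
kubectl get deployments,statefulsets -A

# Flink streaming jobs
kubectl scale deployment --all --replicas=0 -n flink
# Druid (StatefulSets, if deployed as such)
kubectl scale statefulset --all --replicas=0 -n druid-raw
# Superset
kubectl scale deployment --all --replicas=0 -n superset
# Obsrv API services and web console
kubectl scale deployment --all --replicas=0 -n dataset-api
kubectl scale deployment --all --replicas=0 -n web-console
```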


3.2 PostgreSQL Restore

Enter the Pod
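For example (placeholders as listed under Important Notes):

```bash
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
```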

Pre-Cleanup
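A minimal sketch, assuming the target database is dropped and recreated before the restore; confirm nothing else is connected first:

```bash
# Run inside the PostgreSQL pod: terminate open connections, then recreate the DB
psql -U <username> -d postgres -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE datname = '<database-name>';"
psql -U <username> -d postgres -c "DROP DATABASE IF EXISTS <database-name>;"
psql -U <username> -d postgres -c "CREATE DATABASE <database-name>;"
```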

Copy Backup File
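A sketch; the S3 prefix is an assumption, so use the bucket layout from section 1:

```bash
# From your workstation: fetch the dump from S3, then copy it into the pod
aws s3 cp s3://backups-{building_block}-{env}-{account-id}/postgresql/<backup-file> <local-file-path>
kubectl cp <local-file-path> <namespace>/<pod-name>:/tmp/<backup-file>
```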

Run Restore
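For a plain-SQL dump; use pg_restore instead if the dump is in custom format:

```bash
psql -U <username> -d <database-name> -f /tmp/<backup-file>
# For custom-format dumps:
# pg_restore -U <username> -d <database-name> /tmp/<backup-file>
```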


3.3 Redis Restore

Download + Decompress
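A sketch; the S3 prefix and compressed file name are assumptions:

```bash
aws s3 cp s3://backups-{building_block}-{env}-{account-id}/redis/<backup-file>.gz .
gunzip <backup-file>.gz   # produces the RDB file, e.g. dump.rdb
```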

Enter Redis Pod
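Copy the RDB file into the pod first, then open a shell:

```bash
kubectl cp dump.rdb <namespace>/<pod-name>:/tmp/dump.rdb
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh
```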

Disable AOF + Save
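Run inside the pod; this stops Redis from rewriting the file you are about to restore:

```bash
redis-cli CONFIG SET appendonly no
redis-cli CONFIG SET save ""
```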

Overwrite DB File
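A sketch, assuming the data directory is /data (verify with CONFIG GET dir). Redis loads the RDB on restart; the exact restart behaviour depends on how persistence is configured in your Redis chart:

```bash
redis-cli CONFIG GET dir          # confirm the data directory
cp /tmp/dump.rdb /data/dump.rdb   # overwrite the DB file
exit
kubectl delete pod <pod-name> -n <namespace>   # restart so Redis reloads the RDB
```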

Verification
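Key counts should match the source environment:

```bash
kubectl exec -it <pod-name> -n <namespace> -- redis-cli DBSIZE
kubectl exec -it <pod-name> -n <namespace> -- redis-cli INFO keyspace
```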

Re-enable Configs
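Restore the original persistence settings; the save parameters shown are illustrative defaults, so use the values from your configuration:

```bash
redis-cli CONFIG SET appendonly yes
redis-cli CONFIG SET save "900 1 300 10 60 10000"
redis-cli BGREWRITEAOF
```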


4. Resume Services

Once PostgreSQL and Redis are restored, scale the services paused in 3.1 back up:
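Replica counts and namespaces below are illustrative; use your original values:

```bash
kubectl scale deployment --all --replicas=1 -n flink
kubectl scale statefulset --all --replicas=1 -n druid-raw
kubectl scale deployment --all --replicas=1 -n superset
kubectl scale deployment --all --replicas=1 -n dataset-api
kubectl scale deployment --all --replicas=1 -n web-console
kubectl get pods -A   # confirm everything returns to Running
```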

Then verify in the Obsrv console that datasets are visible and the pipeline has resumed.


5. Druid Segment Migration — S3 Bucket to S3 Bucket

This section updates Druid's segment metadata in PostgreSQL so that segments are loaded from a new S3 bucket.

5.1 Preconditions

  • Segment files must already be copied into target bucket

  • Druid scaled down:
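For example (bucket names, the segment prefix, and workload names are assumptions):

```bash
# Copy segments between buckets, if not already done
aws s3 sync s3://<old-bucket>/druid/segments s3://<new-bucket>/druid/segments

# Scale Druid down
kubectl scale statefulset --all --replicas=0 -n druid-raw
```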


5.2 Create Python Pod

python_server.yaml:
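A minimal sketch of the pod manifest; the image and namespace are assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: python-server
  namespace: <namespace>
spec:
  containers:
    - name: python
      image: python:3.10
      command: ["sleep", "infinity"]
```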

Apply:
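```bash
kubectl apply -f python_server.yaml
kubectl get pod python-server -n <namespace>   # wait for Running
```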

Install dependency:
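The scripts below talk to PostgreSQL, so psycopg2 is assumed to be the required dependency:

```bash
kubectl exec -it python-server -n <namespace> -- pip install psycopg2-binary
```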


5.3 Migration Script

Copy script:
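```bash
kubectl cp druid-migrate.py <namespace>/python-server:/tmp/druid-migrate.py
```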

druid-migrate.py
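The original script is not reproduced here; the following is a minimal sketch of the idea: rewrite loadSpec.bucket in each druid_segments payload. Connection details, bucket names, and the bytea-encoded JSON payload format are assumptions, so verify them against your metadata store before running:

```python
# druid-migrate.py - illustrative sketch, not the original script.
# Rewrites loadSpec.bucket in druid_segments payloads from OLD_BUCKET to NEW_BUCKET.
import json

import psycopg2

OLD_BUCKET = "<old-bucket>"   # assumption: source bucket name
NEW_BUCKET = "<new-bucket>"   # assumption: target bucket name

conn = psycopg2.connect(
    host="<postgres-host>", dbname="<database-name>",
    user="<username>", password="<password>",
)
cur = conn.cursor()

# payload holds the segment descriptor JSON (assumed stored as bytea)
cur.execute("SELECT id, payload FROM druid_segments")
for seg_id, payload in cur.fetchall():
    spec = json.loads(bytes(payload).decode("utf-8"))
    load_spec = spec.get("loadSpec", {})
    if load_spec.get("bucket") == OLD_BUCKET:
        load_spec["bucket"] = NEW_BUCKET
        cur.execute(
            "UPDATE druid_segments SET payload = %s WHERE id = %s",
            (psycopg2.Binary(json.dumps(spec).encode("utf-8")), seg_id),
        )
        print(f"Updated segment {seg_id}")

conn.commit()
cur.close()
conn.close()
```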

Run:
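```bash
kubectl exec -it python-server -n <namespace> -- python /tmp/druid-migrate.py
```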

You should see logs like:
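With the sketch above, one line is printed per migrated segment; the exact format depends on the script's logging:

```
Updated segment <segment-id>
Updated segment <segment-id>
```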


5.4 Verification Script

Copy:
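```bash
kubectl cp data-verification.py <namespace>/python-server:/tmp/data-verification.py
```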

data-verification.py
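A sketch mirroring the migration script: count segments per bucket and confirm none still reference the old one. The same connection and payload-format assumptions apply:

```python
# data-verification.py - illustrative sketch, not the original script.
# Counts druid_segments payloads per S3 bucket after migration.
import json

import psycopg2

OLD_BUCKET = "<old-bucket>"
NEW_BUCKET = "<new-bucket>"

conn = psycopg2.connect(
    host="<postgres-host>", dbname="<database-name>",
    user="<username>", password="<password>",
)
cur = conn.cursor()
cur.execute("SELECT payload FROM druid_segments")

old_count = new_count = other = 0
for (payload,) in cur.fetchall():
    bucket = json.loads(bytes(payload).decode("utf-8")).get("loadSpec", {}).get("bucket")
    if bucket == OLD_BUCKET:
        old_count += 1
    elif bucket == NEW_BUCKET:
        new_count += 1
    else:
        other += 1

print(f"Segments on old bucket: {old_count}")
print(f"Segments on new bucket: {new_count}")
print(f"Segments elsewhere:     {other}")

cur.close()
conn.close()
```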

Expected Output:
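With the sketch above (run the same way as the migration script), a successful migration shows zero segments on the old bucket:

```
Segments on old bucket: 0
Segments on new bucket: <total-segment-count>
Segments elsewhere:     0
```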


5.5 Restart Druid
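Scale Druid back up, mirroring the scale-down in 5.1 (workload names and namespace are illustrative):

```bash
kubectl scale statefulset --all --replicas=1 -n druid-raw
kubectl get pods -n druid-raw -w
```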

Check the Historical pod logs and the Druid console; segments should load successfully.


Important Notes

  • Ensure that you have appropriate permissions and access rights to execute these commands within the Kubernetes environment.

  • Replace placeholders such as <pod-name>, <namespace>, <local-file-path>, <backup-file>, <username>, and <database-name> with the actual values relevant to your setup.

