Data Backup and Restoration
Instructions for restoring Obsrv from backups
Date: Friday, 05.07.2024
Introduction
This document provides a complete end-to-end restoration playbook for Obsrv, including:
- Restoration using Velero snapshots
- Selective Redis & PostgreSQL restoration
- Flink, Druid, Superset pauses/resumes
- S3-based Druid segment migration
- Python scripts used for updating druid_segments
It is intended for operational recovery, migration, DR rehearsals, and environment cloning.
1. Backup Storage Reference
The following locations hold backups across AWS S3 and supporting services:
| Service | Backup Location |
|---|---|
| PostgreSQL | backups-{building_block}-{env}-{account-id} |
| Denorm Redis | backups-{building_block}-{env}-{account-id} |
| Dedup Redis | backups-{building_block}-{env}-{account-id} |
| Dataset Events | {building_block}-{env}-{account_id} |
| Infra Terraform State | AWS_TERRAFORM_BACKEND_BUCKET_NAME |
| Velero Backups | velero-{building_block}-{env}-{account-id} |
| Flink Checkpoints | checkpoint-{building_block}-{env}-{account-id} |
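
To illustrate how the naming templates in the table expand, here is a minimal sketch that builds the bucket names for one environment. The service keys, the function name `bucket_name`, and the sample values (`obsrv`, `dev`, `123456789012`) are assumptions for illustration only; the templates themselves are copied from the table, with `{account-id}` written as `{account_id}` so that Python's `str.format` accepts it.

```python
# Templates copied from the Backup Storage Reference table.
# The dictionary keys are hypothetical labels, not names used by Obsrv.
BUCKET_TEMPLATES = {
    "postgresql": "backups-{building_block}-{env}-{account_id}",
    "denorm-redis": "backups-{building_block}-{env}-{account_id}",
    "dedup-redis": "backups-{building_block}-{env}-{account_id}",
    "dataset-events": "{building_block}-{env}-{account_id}",
    "velero": "velero-{building_block}-{env}-{account_id}",
    "flink-checkpoints": "checkpoint-{building_block}-{env}-{account_id}",
}


def bucket_name(service, building_block, env, account_id):
    """Expand the naming template for one service."""
    template = BUCKET_TEMPLATES[service]
    return template.format(
        building_block=building_block, env=env, account_id=account_id
    )


if __name__ == "__main__":
    # Sample values are made up; substitute those of your deployment.
    for service in BUCKET_TEMPLATES:
        print(service, "->", bucket_name(service, "obsrv", "dev", "123456789012"))
```

The Terraform state bucket is left out because the table gives it as a variable name (AWS_TERRAFORM_BACKEND_BUCKET_NAME) rather than a template.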
2. Velero Restoration — Full Environment Restore
This procedure restores the entire Obsrv deployment from a Velero backup.
2.1 Prerequisites
- AWS CLI installed and configured
- Cluster kubeconfig available
- Velero CLI installed
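
As a quick pre-flight check for the list above, the following sketch reports which of the three CLIs are missing from PATH. It assumes the binaries are named `aws`, `kubectl`, and `velero`; the function name `missing_tools` is hypothetical. It only checks that the executables exist, not that the AWS CLI is configured or that a kubeconfig is available.

```python
import shutil

# Assumed executable names for the prerequisites listed above.
REQUIRED_TOOLS = ("aws", "kubectl", "velero")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing CLIs:", ", ".join(missing))
    else:
        print("All required CLIs found on PATH")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal use calls `missing_tools()` with no arguments.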
Commands (Install Velero CLI):

    wget https://github.com/vmware-tanzu/velero/releases/download/v1.3.2/velero-v1.3.2-linux-amd64.tar.gz
    tar -xvf velero-v1.3.2-linux-amd64.tar.gz -C /tmp
    sudo mv /tmp/velero-v1.3.2-linux-amd64/velero /usr/local/bin
    velero version

2.2 Restore Workflow
Restore all services

    velero restore create --from-backup <backup-name>

Example:

    velero restore create --from-backup velero-obsrv-daily-backup-20240904133016-20240904190534

Restore a specific namespace (Example: PostgreSQL)

    velero restore create --from-backup <backup-name> --include-namespaces <namespace>

Check Status

    velero restore describe <restore-name>

2.3 Post-Restore Validation
Perform:
- Pod status verification
- PostgreSQL/Redis data checks
- Flink/Druid processing resumption
- Query validation from Superset/Obsrv APIs
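
The pod status check in the list above can be partly automated. The sketch below takes the parsed output of `kubectl get pods -A -o json` and lists pods that are not running with all containers ready. The function name `unhealthy_pods` and the sample data are assumptions for illustration; pods in the `Succeeded` phase (completed jobs) are treated as healthy, which may or may not suit your environment.

```python
def unhealthy_pods(pod_list):
    """Return "namespace/name" for pods that are not Running and fully ready.

    pod_list is the dict produced by json.loads() on the output of
    `kubectl get pods -A -o json`.
    """
    bad = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed jobs are not counted as failures
        containers = status.get("containerStatuses", [])
        ready = (
            phase == "Running"
            and bool(containers)
            and all(c.get("ready") for c in containers)
        )
        if not ready:
            bad.append(f"{meta['namespace']}/{meta['name']}")
    return bad


if __name__ == "__main__":
    # Made-up sample; in practice load the JSON printed by kubectl.
    sample = {
        "items": [
            {
                "metadata": {"namespace": "postgresql", "name": "obsrv-postgresql-0"},
                "status": {"phase": "Running", "containerStatuses": [{"ready": True}]},
            },
            {
                "metadata": {"namespace": "flink", "name": "example-pod"},
                "status": {"phase": "Pending"},
            },
        ]
    }
    print(unhealthy_pods(sample))
```

An empty result only means the pods report ready; the data checks and query validation in the list still need to be done separately.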
3. Redis & PostgreSQL — Targeted Restoration
Use this procedure when the buckets and paths have not changed and only a data rollback is required.
3.1 Pause Streaming and Query Services
Scale deployments to 0 replicas for:
- Flink jobs
- Druid
- Superset
- API services
- Web Console
Examples:

    kubectl scale deployment --all --replicas=0 -n druid-raw
    kubectl scale deployment --all --replicas=0 -n flink
    kubectl scale deployment --all --replicas=0 -n superset
    kubectl scale deployment --all --replicas=0 -n dataset-api
    kubectl scale deployment --all --replicas=0 -n web-console

3.2 PostgreSQL Restore
Enter the Pod

    kubectl exec -it <pod-name> -n <namespace> -- /bin/bash

Pre-Cleanup
Run these statements from a psql session inside the pod:

    drop database druid_raw;
    drop database obsrv;
    drop database superset;
    create database druid_raw;
    create database obsrv;
    create database superset;

Copy Backup File
Run this from the machine that holds the backup file, not from inside the pod:

    kubectl cp ./backup.sql postgresql/obsrv-postgresql-0:/tmp/db.sql

Run Restore

    psql -U postgres -f /tmp/db.sql

3.3 Redis Restore
Download + Decompress

    bzip2 -dk fulldb-{dd-mm-yyyy}.rdb.bz2

Enter Redis Pod

    kubectl exec -it obsrv-<instance>-redis-master-0 -n redis -- sh

Disable AOF + Save

    redis-cli
    config get save
    config set appendonly no
    config set save ""
    SAVE

Overwrite DB File

    kubectl cp ./dump.rdb redis/obsrv-<instance>-redis-master-0:/data/dump.rdb
    kubectl delete pod obsrv-<instance>-redis-master-0 -n redis

Verification

    kubectl exec -it obsrv-<instance>-redis-master-0 -n redis -- sh -c 'redis-cli info'

Re-enable Configs

    redis-cli
    config set appendonly yes
    config set save "3600 1 300 100 60 10000"
    SAVE

4. Resume Services
Once PostgreSQL and Redis are restored:

    kubectl scale deployment --all --replicas=1 -n flink
    kubectl scale deployment --all --replicas=1 -n druid-raw
    kubectl scale deployment --all --replicas=1 -n superset
    kubectl scale deployment --all --replicas=1 -n dataset-api
    kubectl scale deployment --all --replicas=1 -n web-console

Verify in the Obsrv console that the datasets are present and the pipeline resumes.
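
Because the pause and resume steps run the same command over the same namespaces, they can be generated from one list. This is a minimal sketch, assuming the five namespaces used in the kubectl examples of this playbook; the function name `scale_commands` is hypothetical. It only builds and prints the commands, so nothing is changed in the cluster unless you choose to execute them.

```python
# Namespaces taken from the kubectl examples in this playbook.
NAMESPACES = ["flink", "druid-raw", "superset", "dataset-api", "web-console"]


def scale_commands(replicas, namespaces=NAMESPACES):
    """Build one `kubectl scale` argument list per namespace."""
    return [
        ["kubectl", "scale", "deployment", "--all",
         f"--replicas={replicas}", "-n", namespace]
        for namespace in namespaces
    ]


if __name__ == "__main__":
    # Print the resume commands; use scale_commands(0) for the pause step.
    for command in scale_commands(1):
        print(" ".join(command))
    # To run them instead of printing, each list could be passed to
    # subprocess.run(command, check=True).
```

Note that `--all --replicas=1` sets every deployment in a namespace to one replica, so any deployment that ran with more replicas before the pause would need to be scaled back separately.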
5. Druid Segment Migration — S3 Bucket to S3 Bucket
This section updates the segment metadata in PostgreSQL so that Druid loads segments from a new S3 bucket.
5.1 Preconditions
- Segment files must already be copied into the target bucket
- Druid scaled down:

    kubectl scale deployment --all --replicas=0 -n druid-raw

5.2 Create Python Pod
python_server.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: python-pod
      labels:
        app: python-app
    spec:
      containers:
        - name: python-container
          image: python:3.9
          command: ["sleep", "3600"]

Apply:

    kubectl apply -f python_server.yaml -n postgresql

Install dependency:

    pip install psycopg2

5.3 Migration Script
Copy script:

    kubectl cp ./druid-migrate.py postgresql/python-pod:/tmp/druid-migrate.py

druid-migrate.py

    import json

    import psycopg2

    # Connect to the Druid metadata database (fill in the password).
    conn = psycopg2.connect(
        host="obsrv-postgresql-hl.postgresql.svc.cluster.local",
        port=5432,
        database="druid_raw",
        user="druid_raw",
        password="",
    )
    cur = conn.cursor()
    cur.execute("SELECT * FROM druid_segments")

    for row in cur.fetchall():
        # row[0] is the segment id, row[8] is the JSON payload stored as bytes.
        payload = json.loads(row[8].tobytes())
        loadSpec = payload['loadSpec']
        print("\nloadSpec before:", json.dumps(loadSpec))
        loadSpec['bucket'] = "new-bucket-name"
        print("loadSpec after:", json.dumps(loadSpec))
        payload['loadSpec'] = loadSpec
        sql = "UPDATE druid_segments SET payload=%s WHERE id=%s"
        val = (memoryview(json.dumps(payload).encode()), row[0])
        cur.execute(sql, val)

    # Nothing is persisted until the commit succeeds.
    conn.commit()
    conn.close()
    print("\nMigration Completed.")

Run:

    python /tmp/druid-migrate.py

You should see logs like:

    loadSpec before: {"type":"s3_zip","bucket":"old-bucket"...}
    loadSpec after: {"type":"s3_zip","bucket":"new-bucket"...}

5.4 Verification Script
Copy:

    kubectl cp ./data-verification.py postgresql/python-pod:/tmp/data-verification.py

data-verification.py

    import json

    import psycopg2

    conn = psycopg2.connect(
        host="obsrv-postgresql-hl.postgresql.svc.cluster.local",
        port=5432,
        database="druid_raw",
        user="druid_raw",
        password="",
    )
    cur = conn.cursor()
    cur.execute("SELECT * FROM druid_segments LIMIT 1")
    row = cur.fetchone()
    print(json.loads(row[8].tobytes()))

Expected Output:

    'loadSpec': {'bucket': 'new-bucket-name', ...}

5.5 Restart Druid

    kubectl scale deployment --all --replicas=1 -n druid-raw

Check the Historical logs and the Druid console; segments should load successfully.
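
The payload rewrite from section 5.3 can also be exercised without a database, which is useful for a dry run before touching druid_segments. The sketch below isolates the bucket change as a pure function and adds a counter for payloads that still point elsewhere. The function names and the sample payload (including its `dataSource` and `key` values) are made up for illustration; only the `loadSpec`, `type`, and `bucket` fields are taken from the logs shown above.

```python
import json


def rewrite_bucket(payload_bytes, new_bucket):
    """Return the payload with loadSpec.bucket replaced, encoded as bytes."""
    payload = json.loads(payload_bytes)
    payload["loadSpec"]["bucket"] = new_bucket
    return json.dumps(payload).encode()


def segments_not_migrated(payloads, new_bucket):
    """Count payloads whose loadSpec.bucket is not the new bucket."""
    return sum(
        1
        for raw in payloads
        if json.loads(raw)["loadSpec"].get("bucket") != new_bucket
    )


if __name__ == "__main__":
    # Hypothetical payload shaped like the log lines in section 5.3.
    sample = json.dumps(
        {
            "dataSource": "example",
            "loadSpec": {"type": "s3_zip", "bucket": "old-bucket", "key": "example/index.zip"},
        }
    ).encode()
    migrated = rewrite_bucket(sample, "new-bucket-name")
    print(json.loads(migrated)["loadSpec"])
    print("not migrated:", segments_not_migrated([sample, migrated], "new-bucket-name"))
```

Unlike the verification script, which inspects a single row, a counter like `segments_not_migrated` could be fed every payload from druid_segments to confirm that no segment was skipped.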