This document describes my backup strategy for the homelab.
Overview
Noah's Ark is the homelab's overarching backup and data-protection strategy. The name reflects the core principle: before any disaster strikes, everything critical must already be safely aboard. This document defines storage mediums, backup services, tier assignments, schedules, and retention policies.
Storage Mediums
Three physical storage mediums are used across the backup infrastructure. Each has a defined role and is not interchangeable.
NVMe (Ephemeral / Fast Scratch)
NVMe drives are used exclusively for operating system volumes, VM boot disks, and short-lived cache. They are explicitly not a backup target. TrueNAS and other storage VMs may have NVMe visibility for performance purposes, but no long-term backup data is written here.
- ✅ OS & boot volumes
- ✅ Cache and write-intent logs
- ❌ Long-term backup data
- ❌ Replicated datasets
HDD (Primary Backup Storage)
Spinning hard drives are the primary medium for all backup workloads. The current deployment is a 2×4 TB external HDD enclosure attached to the Proxmox Backup Server VM.
- ✅ PBS datastores
- ✅ TrueNAS bulk datasets
- ✅ Cold archive copies
- Higher capacity-to-cost ratio makes this the default for retention-heavy tiers
S3 (Offsite Object Storage)
S3 is the offsite layer of the backup strategy, providing geographical separation from the physical homelab. It holds encrypted, versioned remote copies synced from the local backup services.
- ✅ Offsite DR copy (Gold and Platinum tiers)
- ✅ Immutable/versioned bucket policies for ransomware resistance
- ✅ Lifecycle rules to tier cold data automatically
- ❌ Not a primary backup destination — always a sync target from local
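The remote itself only needs to be defined once in rclone. A minimal sketch, assuming an AWS-compatible provider and the remote name s3 used by the Gold-tier sync command later in this document (the provider, auth method, and region are placeholders):

# Define the "s3" remote used by the sync jobs (values are illustrative)
rclone config create s3 s3 provider AWS env_auth true region eu-central-1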
Storage Services
Proxmox Backup Server (PBS)
PBS runs as a VM on Proxmox, managed with HA between pve-1 and pve-2 via Ceph. The VM has:
- 32 GB NVMe (OS/config disk — Ceph-backed for HA migration)
- 2×4 TB HDD via external enclosure (backup datastore — direct-attached to pve-2)
⚠️ Known HA Limitation — Datastore Availability After Failover
Problem: The HDD enclosure is physically connected to pve-2. If pve-2 fails and the PBS VM migrates to pve-1, the VM starts successfully but the /datastore mount is unavailable because the drives are no longer accessible. PBS enters a degraded state.
Assessed options:
| Option | Feasibility | Notes |
|---|---|---|
| Accept manual recovery | ✅ Practical | Drive reconnection to pve-1 or manual pve-2 restart; suitable if RTO > 1 hour |
| Expose drives via iSCSI from pve-2 | ✅ Recommended | pve-2 runs a lightweight iSCSI target (e.g. tgt or scst); the PBS VM mounts it over the network — works regardless of which hypervisor runs the VM |
| Move datastore to Ceph RBD | ⚠️ With caveats | Solves availability but writes backup data into the same Ceph cluster being backed up — not ideal, defeats the independence principle |
| Dedicated physical backup node | ✅ Long-term ideal | Removes the HA dependency entirely; PBS runs bare-metal on a dedicated machine with local drives |
Current recommendation: Implement the iSCSI approach as a medium-term fix. Export the HDD enclosure from pve-2 using tgt, connect the PBS VM via an iSCSI initiator, and mount it as a block device. This makes the datastore network-addressable and removes the physical attachment constraint.
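A minimal sketch of the moving parts, assuming tgt on pve-2 and open-iscsi inside the PBS VM (the IQN, device path, subnet, and hostname are placeholders):

# On pve-2 -- export the enclosure as an iSCSI LUN (/etc/tgt/conf.d/pbs-datastore.conf)
<target iqn.2024-01.lan.pve-2:pbs-datastore>
    backing-store /dev/sdX          # HDD enclosure block device
    initiator-address 10.0.0.0/24   # restrict access to the storage network
</target>
# then: systemctl restart tgt

# On the PBS VM -- discover, log in, and make the session persistent across reboots
iscsiadm -m discovery -t sendtargets -p pve-2.lan
iscsiadm -m node -T iqn.2024-01.lan.pve-2:pbs-datastore -p pve-2.lan --login
iscsiadm -m node -T iqn.2024-01.lan.pve-2:pbs-datastore --op update -n node.startup -v automatic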
TODO — Validate iSCSI failover
- Set up tgt on pve-2 exposing /dev/sdX as an iSCSI LUN
- Configure the PBS VM with the open-iscsi initiator
- Test PBS VM live migration to pve-1 with iSCSI session persistence
- Confirm the datastore mounts cleanly post-migration
- Document a reconnection runbook for the manual-recovery fallback
PBS Datastore Layout (Planned)
/mnt/datastore/
├── primary/ # Main backup target for all tiers
└── archive/ # Cold copies, infrequently pruned
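Once the block device is mounted at /mnt/datastore, registering the two datastores is a one-liner each. A sketch (the garbage-collection schedule is an assumption):

# Register the planned datastores on the PBS VM
proxmox-backup-manager datastore create primary /mnt/datastore/primary
proxmox-backup-manager datastore create archive /mnt/datastore/archive

# Illustrative: run garbage collection on the main datastore weekly
proxmox-backup-manager datastore update primary --gc-schedule 'sat 01:00'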
TrueNAS (VM on Proxmox)
TrueNAS runs as a VM and provides ZFS-backed network storage for the homelab. Its role in the backup strategy is:
- SMB/NFS share target for file-level backups from workstations and containers
- ZFS snapshot source — periodic snapshots replicated to PBS or S3
- Staging area for large datasets before S3 upload
TrueNAS is itself a backup source as well as a storage service — its ZFS datasets must be included in backup tier assignments.
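TrueNAS's built-in periodic snapshot and replication tasks cover the snapshot-source role from the UI; under the hood they amount to roughly the following (pool, dataset, snapshot names, and the target host are placeholders):

# Recursive snapshot of a dataset (what a periodic snapshot task does)
zfs snapshot -r tank/documents@auto-2025-01-01_0300

# Incremental send of that snapshot to another box (what a replication task does)
zfs send -R -i tank/documents@auto-2024-12-31_0300 tank/documents@auto-2025-01-01_0300 \
  | ssh backup-host zfs receive -F backup/documents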
Note
TrueNAS VMs access NVMe storage for ARC/L2ARC caching only. Pool vdevs use HDD.
Backup Tier System
The tier system assigns a protection level to each service, VM, or dataset. Higher tiers mean more frequent backups, longer retention, more redundancy, and verified restores. Assign tiers based on criticality and acceptable data-loss window.
🥉 Bronze — Local Snapshot, Best-Effort
Use for: Non-critical VMs, dev/test environments, disposable workloads.
RPO (Recovery Point Objective): Up to 7 days
RTO (Recovery Time Objective): Best-effort, no SLA
Storage: PBS local datastore only
Offsite: None
| Setting | Value |
|---|---|
| Schedule | Weekly (Sunday 02:00) |
| Retention — Keep Last | 2 |
| Retention — Keep Weekly | 4 |
| Retention — Keep Monthly | 1 |
| Encryption | No |
| Verify job | No |
| Restore tested | No |
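These retention rows map directly onto PBS prune options; as a sanity check, the same policy can be dry-run by hand (the repository, hostname, and backup group are placeholders — in practice the same values go into the datastore's prune job):

# Bronze retention expressed as PBS prune options (dry run, nothing is deleted)
proxmox-backup-client prune vm/105 \
  --repository backup@pbs@pbs.lan:primary \
  --keep-last 2 --keep-weekly 4 --keep-monthly 1 \
  --dry-run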
Suitable for:
- Scratch VMs and containers
- Build/CI agents that can be reprovisioned from code
- Dev databases with no production data
🥈 Silver — Daily Local Backup, Short Retention
Use for: Standard homelab services — important but recoverable within a day.
RPO: 24 hours
RTO: < 4 hours
Storage: PBS local datastore + TrueNAS ZFS snapshots
Offsite: None
| Setting | Value |
|---|---|
| Schedule | Daily (03:00) |
| Retention — Keep Last | 7 |
| Retention — Keep Weekly | 4 |
| Retention — Keep Monthly | 2 |
| Encryption | Optional |
| Verify job | Weekly |
| Restore tested | Quarterly (manual spot check) |
Suitable for:
- Standard self-hosted services (dashboards, monitoring stacks)
- Personal media servers
- Non-production databases
- TrueNAS datasets containing media/documents
🥇 Gold — Daily Local + S3 Offsite, Extended Retention
Use for: Important services with irreplaceable or hard-to-recreate data.
RPO: 24 hours
RTO: < 2 hours (local); < 8 hours (from S3)
Storage: PBS local datastore + S3 sync
Offsite: ✅ S3 (encrypted, versioned)
| Setting | Value |
|---|---|
| Schedule | Daily (03:30) |
| Retention — Keep Last | 14 |
| Retention — Keep Weekly | 8 |
| Retention — Keep Monthly | 6 |
| Retention — Keep Yearly | 1 |
| Encryption | Required (PBS client-side encryption) |
| S3 sync | Daily after backup job completes |
| S3 bucket policy | Versioning enabled, 30-day object lock |
| Verify job | Weekly |
| Restore tested | Monthly (automated or manual) |
S3 Sync Method:
# Using proxmox-backup-client or rclone for datastore sync
rclone sync /mnt/datastore/primary s3:homelab-backups/primary \
--s3-server-side-encryption AES256 \
--transfers 4 \
--log-file /var/log/rclone-sync.log
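One way to run this daily (also listed in the outstanding TODOs) is a systemd service/timer pair on the PBS VM; the unit names and the 04:30 start time are assumptions, picked to land after the 03:30 backup job:

# /etc/systemd/system/s3-sync.service
[Unit]
Description=Sync PBS primary datastore to S3

[Service]
Type=oneshot
ExecStart=/usr/bin/rclone sync /mnt/datastore/primary s3:homelab-backups/primary \
  --s3-server-side-encryption AES256 --transfers 4 --log-file /var/log/rclone-sync.log

# /etc/systemd/system/s3-sync.timer
[Unit]
Description=Daily S3 sync after the Gold backup window

[Timer]
OnCalendar=*-*-* 04:30:00
Persistent=true

[Install]
WantedBy=timers.target

# Enable with: systemctl enable --now s3-sync.timer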
Suitable for:
- PBS VM itself (meta-backup of the backup service)
- TrueNAS configuration and critical datasets
- Identity/auth services (e.g. Authentik, LLDAP)
- Home automation state (Home Assistant)
- Network config (MikroTik export backups)
💎 Platinum — Frequent Local + Offsite + Verified, Maximum Retention
Use for: Critical services where data loss or extended downtime is unacceptable.
RPO: 4 hours
RTO: < 1 hour (local)
Storage: PBS local datastore + TrueNAS ZFS replication + S3 offsite
Offsite: ✅ S3 (encrypted, immutable object lock)
| Setting | Value |
|---|---|
| Schedule | Every 6 hours (00:00 / 06:00 / 12:00 / 18:00) |
| Retention — Keep Last | 28 |
| Retention — Keep Weekly | 12 |
| Retention — Keep Monthly | 12 |
| Retention — Keep Yearly | 3 |
| Encryption | Required (PBS client-side, AES-256) |
| S3 sync | After every backup job |
| S3 bucket policy | Versioning + Object Lock (Compliance mode, 90 days) |
| Verify job | After every sync |
| Restore tested | Monthly automated restore drill |
Restore Drill Procedure:
- Spin up an isolated VLAN or test namespace on Proxmox
- Restore latest backup from PBS to test VM
- Validate service health (HTTP check, DB query, or equivalent)
- Log result in homelab runbook with timestamp
- Destroy test VM and clean up
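A scripted sketch of this drill, assuming a PBS-backed Proxmox storage named pbs, source VMID 101, test VMID 9901, an isolated bridge vmbr9, and a hypothetical HTTP health endpoint (all of these are placeholders):

#!/usr/bin/env bash
# Monthly restore drill sketch -- all IDs, names, and addresses are placeholders
set -euo pipefail

SRC_VMID=101
TEST_VMID=9901
STORAGE=pbs

# 1. Pick the newest backup volume for the source VM on the PBS-backed storage
LATEST=$(pvesm list "$STORAGE" --vmid "$SRC_VMID" | awk 'NR>1 {print $1}' | sort | tail -n 1)

# 2. Restore it to an isolated test VM (add --storage if the original storage is unavailable)
qmrestore "$LATEST" "$TEST_VMID" --unique 1
qm set "$TEST_VMID" --net0 virtio,bridge=vmbr9   # test-only bridge/VLAN
qm start "$TEST_VMID"

# 3. Validate service health, log the result, then clean up
sleep 120
if curl -fsS "http://10.99.0.10:8080/health" > /dev/null; then
  echo "$(date -Is) restore drill OK   (vm $SRC_VMID)" >> /var/log/restore-drills.log
else
  echo "$(date -Is) restore drill FAIL (vm $SRC_VMID)" >> /var/log/restore-drills.log
fi
qm stop "$TEST_VMID"
qm destroy "$TEST_VMID" --purge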
Suitable for:
- Password manager (Vaultwarden)
- Certificate authority and PKI data
- Any VM storing financial records or irreplaceable personal documents
- DNS/DHCP config that underpins the entire network
- Wedding and personal photo archives
Tier Assignment Register
Work in progress
Populate this table as services are onboarded. Each service must have an explicit tier — unassigned means unprotected.
| Service / Dataset | Type | Tier | Notes |
|---|---|---|---|
| Vaultwarden | VM | 💎 Platinum | Password manager — zero tolerance for loss |
| Home Assistant | VM | 🥇 Gold | State + automations irreplaceable |
| Authentik | VM | 🥇 Gold | Auth provider for all services |
| TrueNAS (config) | VM | 🥇 Gold | Pool config and dataset structure |
| TrueNAS (media) | Dataset | 🥈 Silver | Re-downloadable, low urgency |
| MikroTik exports | File | 🥇 Gold | Network config would be unrecoverable without them |
| DNS (AdGuard/etc.) | VM | 🥈 Silver | Quickly reconfigurable |
| PBS VM itself | VM | 🥇 Gold | Back up the backup server |
| Dev/scratch VMs | VM | 🥉 Bronze | Disposable |
| Monitoring stack | VM | 🥈 Silver | Recoverable from config-as-code |
3-2-1 Compliance by Tier
A useful sanity check — the classic 3-2-1 rule states: 3 copies of data, on 2 different media types, with 1 offsite.
| Tier | Copies | Media Types | Offsite | 3-2-1 Compliant |
|---|---|---|---|---|
| 🥉 Bronze | 1 | HDD | ❌ | ❌ |
| 🥈 Silver | 2 (PBS + ZFS snap) | HDD | ❌ | Partial |
| 🥇 Gold | 3 (PBS + ZFS + S3) | HDD + Object | ✅ | ✅ |
| 💎 Platinum | 3+ (PBS + ZFS + S3) | HDD + Object | ✅ | ✅ |
Note on Bronze
Bronze is intentionally non-compliant. It is only appropriate for truly disposable workloads. If a Bronze service becomes important, it must be re-tiered.
Key Outstanding TODOs
- Validate iSCSI datastore approach for PBS HA failover (see PBS section)
- Assign tiers to all running VMs and containers
- Configure S3 bucket with versioning + object lock for Gold/Platinum (see the sketch at the end of this list)
- Implement the rclone sync job and schedule it via a systemd timer or cron
- Write an automated restore drill script for the Platinum tier
- Document manual recovery runbook for PBS datastore unavailability
- Review PBS encryption key backup — keys must be stored independently of the PBS datastore (e.g. printed, in Vaultwarden, and in S3)
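Two of these items reduce to a handful of commands. For the Gold-tier bucket, a sketch assuming AWS S3 and the homelab-backups bucket name from the sync command above (object lock must be enabled at bucket creation; the retention mode is an assumption, and the Platinum bucket would use COMPLIANCE mode with 90 days instead):

# Create the bucket with object lock support, enable versioning, set a 30-day default retention
# (outside us-east-1, add --create-bucket-configuration LocationConstraint=<region>)
aws s3api create-bucket --bucket homelab-backups --object-lock-enabled-for-bucket
aws s3api put-bucket-versioning --bucket homelab-backups \
  --versioning-configuration Status=Enabled
aws s3api put-object-lock-configuration --bucket homelab-backups \
  --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"GOVERNANCE","Days":30}}}'

For the encryption-key item, proxmox-backup-client can emit a printable copy of the client key for offline storage:

# Printable (text + QR) copy of the PBS client encryption key -- store it outside the datastore
proxmox-backup-client key paperkey > pbs-encryption-paperkey.txt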