MailArchive v2 — Self-Hosted Gmail Archiving on Kubernetes
Sync every Gmail label to .eml files on a persistent volume, search across the lot with full-text indexing, and export the whole archive whenever you need it — all from a three-pane web UI that feels like a real email client.
Google Takeout gives you a one-time dump. IMAP clients give you a live view but nothing persisted. MailArchive sits in between: it continuously syncs your Gmail to plain .eml files on a Kubernetes PVC, builds a searchable Whoosh index across everything, and wraps it in a three-pane web UI with virtual scrolling, attachment downloads, and bulk export. Your email lives on your own storage, in an open format, importable into any client.
Version 2 adds full-text search, attachment downloads, a persistent disk-backed header cache, and bulk zip export — the pieces that turn a sync daemon into something you'd actually use day-to-day.
How it works
The backend is a FastAPI app running an IMAP sync engine, a Whoosh indexer, and an APScheduler task that fires every 24 hours by default (configurable via SYNC_INTERVAL_HOURS). The frontend is a React SPA served by nginx. Both run as separate deployments on Kubernetes, sharing a 50Gi PVC where everything lands: the .eml files, the search index, the sync state, the header caches, and any export zips in progress.
```text
Browser
   │
   ▼
React SPA (nginx)
   │
   ▼
FastAPI backend
   ├── IMAP sync engine (incremental, per-folder)
   ├── Whoosh full-text index
   └── APScheduler (24-hour cycle)
   │
   ▼
PVC 50Gi (/mail/)
   ├── .sync_state.json
   ├── .index/            ← Whoosh index
   ├── INBOX/
   │   ├── .cache.json    ← persistent header cache
   │   └── *.eml
   ├── Work/
   └── Personal/
```
The three-pane layout gives you a folder list on the left, an email list top-right (with virtual scrolling for large folders), and the rendered email detail below. HTML emails render in a sandboxed iframe; plaintext falls back gracefully.
Sync behaviour
On first run, the backend detects no .sync_state.json and kicks off a full sync of all Gmail labels automatically. For large mailboxes this takes a while — watch the logs. After that, only emails with UIDs higher than the last seen UID per folder are fetched. Progress is saved after each folder completes, so a pod restart mid-sync doesn't lose work.
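A minimal sketch of that incremental scheme, using only the standard library's `imaplib`. It assumes `.sync_state.json` is a flat map from folder name to the highest UID fetched and that each message lands as `{uid}.eml`; both layouts are illustrative guesses rather than the backend's actual format, and `sync_folder` and `sync_all` are hypothetical names. One IMAP detail is handled explicitly: a `UID n:*` search always matches the highest UID in the mailbox even when that UID is below `n`, so results are filtered again on the client.

```python
import imaplib
import json
import os
from pathlib import Path


def load_state(state_path: Path) -> dict[str, int]:
    """Return {folder: highest UID fetched}; a missing file means 'sync everything'."""
    if not state_path.exists():
        return {}
    return json.loads(state_path.read_text())


def save_state(state_path: Path, state: dict[str, int]) -> None:
    """Write to a temp file, then rename, so a crash never leaves half a JSON file."""
    tmp = state_path.with_name(state_path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, state_path)


def sync_folder(conn: imaplib.IMAP4, folder: str, mail_root: Path,
                state: dict[str, int]) -> int:
    """Download messages with UID above the stored high-water mark; return the count."""
    last_uid = state.get(folder, 0)
    typ, _ = conn.select(f'"{folder}"', readonly=True)  # EXAMINE: leaves \Seen flags alone
    if typ != "OK":
        raise RuntimeError(f"cannot select {folder!r}")
    typ, data = conn.uid("SEARCH", None, f"UID {last_uid + 1}:*")
    # "n:*" always matches the highest UID, even when it is below n, so filter again.
    uids = sorted(int(u) for u in data[0].split() if int(u) > last_uid)
    out_dir = mail_root / folder
    out_dir.mkdir(parents=True, exist_ok=True)
    for uid in uids:
        typ, msg_data = conn.uid("FETCH", str(uid), "(RFC822)")
        raw = msg_data[0][1]  # msg_data[0] is (response line, raw message bytes)
        (out_dir / f"{uid}.eml").write_bytes(raw)
    if uids:
        state[folder] = uids[-1]
    return len(uids)


def sync_all(conn: imaplib.IMAP4, folders: list[str], mail_root: Path) -> None:
    """Sync each folder in turn, checkpointing the state file after every one."""
    state_path = mail_root / ".sync_state.json"
    state = load_state(state_path)
    for folder in folders:
        sync_folder(conn, folder, mail_root, state)
        save_state(state_path, state)
```

Because the high-water mark only advances after a folder's files are written, a crash mid-folder just re-downloads (and overwrites) that folder's new messages on the next run. A sturdier version would also compare each folder's `UIDVALIDITY`, since a changed value means the stored UIDs no longer apply.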
You can also trigger syncs manually: Sync Now in the top bar hits POST /api/sync for a full run, or hover over any folder in the sidebar and click the ⟳ icon to sync just that folder immediately. The UI polls status every 3 seconds and updates folder counts as labels complete, so you can browse normally while a sync runs in the background.
To force a full re-sync from scratch, delete /mail/.sync_state.json from the PVC and restart the pod.
Full-text search
Search is powered by a Whoosh index stored at /mail/.index/ on the PVC. On first startup a background thread walks all existing .eml files and indexes anything not already in the index — a progress bar in the top bar shows percentage complete while this runs. New emails are indexed immediately as they're downloaded during sync. The index persists across pod restarts; it only rebuilds missing entries.
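The backfill pass can be outlined with the standard library's `email` package. This is a sketch under stated assumptions: `backfill` and `extract_fields` are hypothetical names, the set of already-indexed paths is passed in rather than read from Whoosh, and the index write itself is left to an `add` callback (in a Whoosh-based version, that is where an index writer would receive the fields). The `progress` callback stands in for whatever feeds the top-bar percentage.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path
from typing import Callable


def extract_fields(path: Path) -> dict[str, str]:
    """Pull the searchable fields (subject, from, to, body) out of one .eml file."""
    with path.open("rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    try:
        body = body_part.get_content() if body_part is not None else ""
    except LookupError:  # unknown charset: fall back to indexing headers only
        body = ""
    return {
        "path": path.as_posix(),
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "body": body,
    }


def backfill(mail_root: Path, indexed: set[str],
             add: Callable[[dict], None],
             progress: Callable[[int, int], None] = lambda done, total: None) -> int:
    """Hand every .eml under mail_root that is not yet in `indexed` to `add`."""
    missing = sorted(p for p in mail_root.rglob("*.eml") if p.as_posix() not in indexed)
    for done, path in enumerate(missing, start=1):
        add(extract_fields(path))
        progress(done, len(missing))  # e.g. drives a "done / total" progress bar
    return len(missing)
```

Keying the "already indexed" check on the file path is what makes the pass resumable: a restart only re-walks the directory tree and parses files it has not seen.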
Search covers subject (2x boost), from (1.5x boost), to, and the full email body. Whoosh query syntax is supported:
```text
from:amazon receipt
subject:"order confirmation"
holiday OR travel
"exact phrase"
```
Results appear inline in the email list pane with source folder, date, from, subject, and a body snippet. Hit Escape or the ✕ button to return to normal folder browsing.
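When calling the search endpoint directly (for example through `kubectl port-forward`), the query has to be URL-encoded, because the Whoosh syntax above relies on colons, quotes, and spaces. A small sketch; `search_url` and `search` are hypothetical helpers, and the JSON returned is whatever shape the backend produces.

```python
import json
import urllib.request
from urllib.parse import urlencode


def search_url(query: str, base: str = "http://localhost:8000") -> str:
    """Build a /api/search URL; urlencode escapes ':' and '"' and turns spaces into '+'."""
    return f"{base}/api/search?{urlencode({'q': query})}"


def search(query: str, base: str = "http://localhost:8000"):
    """Run a search and return the decoded JSON body."""
    with urllib.request.urlopen(search_url(query, base)) as resp:
        return json.load(resp)
```

For example, `search_url('subject:"order confirmation"')` yields `http://localhost:8000/api/search?q=subject%3A%22order+confirmation%22`.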
Caching
Each folder keeps a .cache.json alongside its .eml files containing the pre-sorted header list used to populate the email list pane. It's built on first access, then loaded from disk on all subsequent accesses — including after pod restarts. Sync invalidates it automatically when new emails arrive. This means folders with thousands of emails open instantly after the first visit, without scanning .eml headers every time.
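A minimal sketch of that cache lifecycle (build on a miss, serve from disk afterwards, delete to invalidate), assuming each cached row holds the filename, the decoded From and Subject, and a UTC timestamp used for newest-first sorting. The row layout and the names `load_headers` and `invalidate` are illustrative, not the backend's actual schema.

```python
import json
import os
from email.header import decode_header, make_header
from email.parser import BytesHeaderParser
from email.utils import mktime_tz, parsedate_tz
from pathlib import Path

CACHE_NAME = ".cache.json"


def _header_row(path: Path) -> dict:
    with path.open("rb") as f:
        hdrs = BytesHeaderParser().parse(f)  # parses headers only; body is not decoded
    parsed = parsedate_tz(hdrs.get("Date"))  # None when Date is missing or malformed
    return {
        "filename": path.name,
        "from": str(make_header(decode_header(hdrs.get("From", "")))),
        "subject": str(make_header(decode_header(hdrs.get("Subject", "")))),
        "ts": mktime_tz(parsed) if parsed else 0,  # UTC epoch seconds
    }


def load_headers(folder: Path) -> list[dict]:
    """Serve the pre-sorted header list from .cache.json, building it on a miss."""
    cache = folder / CACHE_NAME
    if cache.exists():
        return json.loads(cache.read_text())
    rows = [_header_row(p) for p in folder.glob("*.eml")]
    rows.sort(key=lambda r: r["ts"], reverse=True)  # newest first
    tmp = folder / (CACHE_NAME + ".tmp")
    tmp.write_text(json.dumps(rows))
    os.replace(tmp, cache)  # atomic: readers never see a half-written cache
    return rows


def invalidate(folder: Path) -> None:
    """Called after sync writes new .eml files; the next access rebuilds the cache."""
    (folder / CACHE_NAME).unlink(missing_ok=True)
```

Writing through a temporary file and `os.replace` means a pod killed mid-write never leaves a truncated `.cache.json` behind.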
Bulk export
The export button has three states:
- Export All — click to start building the zip on the PVC in the background
- Preparing X% — progress bar polls every 2 seconds while the zip builds
- Download Ready — click to stream the completed zip to your browser
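The build behind those three states can be sketched with `zipfile` and a worker thread. Assumptions: the `progress` dict is a hypothetical stand-in for whatever `/api/export/status` reports, `out_dir` is a hypothetical location on the PVC, and only `.eml` files are included. The archive name uses the same timestamped `mailarchive-export-YYYYMMDD-HHMMSS.zip` pattern as the example archive in this section.

```python
import os
import threading
import time
import zipfile
from pathlib import Path

progress = {"state": "idle", "percent": 0, "path": None}  # hypothetical status shape
_lock = threading.Lock()


def build_export(mail_root: Path, out_dir: Path) -> Path:
    """Zip every .eml under mail_root, keeping the folder layout, and report progress."""
    files = sorted(mail_root.rglob("*.eml"))
    out_dir.mkdir(parents=True, exist_ok=True)
    for old in out_dir.glob("mailarchive-export-*.zip"):  # a new build clears the old one
        old.unlink()
    target = out_dir / time.strftime("mailarchive-export-%Y%m%d-%H%M%S.zip")
    partial = out_dir / (target.name + ".part")
    with _lock:
        progress.update(state="preparing", percent=0, path=None)
    with zipfile.ZipFile(partial, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, path in enumerate(files, start=1):
            zf.write(path, arcname=path.relative_to(mail_root).as_posix())
            with _lock:
                progress["percent"] = i * 100 // len(files)
    os.replace(partial, target)  # only a complete zip ever gets the final name
    with _lock:
        progress.update(state="ready", percent=100, path=str(target))
    return target


def start_export(mail_root: Path, out_dir: Path) -> threading.Thread:
    """Run build_export in the background so the request handler returns at once."""
    worker = threading.Thread(target=build_export, args=(mail_root, out_dir), daemon=True)
    worker.start()
    return worker
```

Building into a `.part` file and renaming at the end means the download link can never serve a half-written zip.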
The zip preserves the folder structure and is stored as a temp file on the PVC until the next export build clears it. You can close the browser and come back — the download link stays valid.
```text
mailarchive-export-20260407-120000.zip
  INBOX/
    1_abc123.eml
  Work/
    ...
  Personal/
    ...
```
This format imports directly into Thunderbird (via ImportExportTools NG), Apple Mail, or any other client that can import .eml files. For migrating to a new provider, a short Python script using imaplib's `IMAP4.append()` can push each .eml back into the correct folder on the destination server.
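A sketch of such a migration script, assuming the export zip has been unpacked so that each directory name is the destination mailbox name (nested directories become `Parent/Child`, which assumes the server uses `/` as its hierarchy delimiter). `push_export` is a hypothetical name; the calls are the standard library's `IMAP4.create` and `IMAP4.append`. Each message's `Date` header is passed as the internal date so the destination sorts mail by its original date rather than by upload time.

```python
import imaplib
from email.parser import BytesHeaderParser
from email.utils import mktime_tz, parsedate_tz
from pathlib import Path


def push_export(conn: imaplib.IMAP4, export_root: Path) -> int:
    """APPEND every .eml under export_root into the same-named mailbox on conn."""
    created: set[str] = set()
    count = 0
    for path in sorted(export_root.rglob("*.eml")):
        mailbox = path.parent.relative_to(export_root).as_posix()
        quoted = '"%s"' % mailbox  # imaplib sends mailbox names verbatim, so quote them
        if mailbox not in created:
            conn.create(quoted)  # a server NO (e.g. already exists) is returned, not raised
            created.add(mailbox)
        raw = path.read_bytes()
        parsed = parsedate_tz(BytesHeaderParser().parsebytes(raw).get("Date"))
        when = mktime_tz(parsed) if parsed else None  # None lets the server use "now"
        typ, data = conn.append(quoted, None, when, raw)
        if typ != "OK":
            raise RuntimeError(f"APPEND to {mailbox} failed: {data!r}")
        count += 1
    return count
```

In use, open a connection with `imaplib.IMAP4_SSL(host)`, call `login(user, password)`, then pass it to `push_export` along with the unpacked export directory.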
Gmail setup
You need a Gmail App Password before deploying — standard IMAP with your Google password won't work:
- Google Account → Security → enable 2-Step Verification
- Security → App passwords → create one named "MailArchive"
- Copy the 16-character password
Deployment
1. Build and push images
```bash
# Backend
cd backend/
docker build -t your-registry/mailarchive-backend:latest .
docker push your-registry/mailarchive-backend:latest

# Frontend
cd ../frontend/
docker build -t your-registry/mailarchive-frontend:latest .
docker push your-registry/mailarchive-frontend:latest
```
2. Configure credentials
Edit k8s/secret.yaml:
```yaml
stringData:
  IMAP_USER: "yourname@gmail.com"
  IMAP_PASS: "xxxx xxxx xxxx xxxx"
```
Update the image fields in both deployment manifests, and set your ingress hostname in k8s/frontend-deployment.yaml.
3. Deploy
```bash
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/secret.yaml
kubectl apply -f k8s/pvc.yaml
kubectl apply -f k8s/backend-deployment.yaml
kubectl apply -f k8s/frontend-deployment.yaml
```
4. Verify
```bash
kubectl get pods -n mailarchive
kubectl logs -n mailarchive -l component=backend -f

# Check sync and search index status
kubectl port-forward -n mailarchive svc/mailarchive-backend 8000:8000
curl http://localhost:8000/api/status
curl http://localhost:8000/api/search/status
```
Configuration
| Variable | Default | Description |
|---|---|---|
| `IMAP_HOST` | `imap.gmail.com` | IMAP server hostname |
| `IMAP_PORT` | `993` | IMAP SSL port |
| `IMAP_USER` | (from secret) | Gmail address |
| `IMAP_PASS` | (from secret) | 16-character app password |
| `MAIL_ROOT` | `/mail` | PVC mount path |
| `SYNC_INTERVAL_HOURS` | `24` | Hours between automatic syncs |
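One way to read these variables with the documented defaults is a small frozen dataclass. The `Settings` class and `load_settings` function are hypothetical, not the backend's actual code. `IMAP_USER` and `IMAP_PASS` have no default, so the sketch fails fast when either is missing instead of failing later at IMAP login.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    imap_host: str
    imap_port: int
    imap_user: str
    imap_pass: str
    mail_root: Path
    sync_interval_hours: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the configuration table's variables, applying its defaults."""
    missing = [k for k in ("IMAP_USER", "IMAP_PASS") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required setting(s): {', '.join(missing)}")
    return Settings(
        imap_host=env.get("IMAP_HOST", "imap.gmail.com"),
        imap_port=int(env.get("IMAP_PORT", "993")),
        imap_user=env["IMAP_USER"],
        imap_pass=env["IMAP_PASS"],
        mail_root=Path(env.get("MAIL_ROOT", "/mail")),
        sync_interval_hours=int(env.get("SYNC_INTERVAL_HOURS", "24")),
    )
```

The interval would then feed the scheduler; with APScheduler 3.x that could be something like `scheduler.add_job(run_sync, "interval", hours=settings.sync_interval_hours)`, where `run_sync` is a hypothetical sync entry point.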
Updating without losing state
All state lives on the PVC — sync progress, search index, and header caches all survive pod restarts. To update the image safely:
```bash
kubectl scale deployment mailarchive-backend -n mailarchive --replicas=0
# push new image
kubectl scale deployment mailarchive-backend -n mailarchive --replicas=1
```
The new pod reads .sync_state.json from the PVC and picks up where it left off. No sync runs at startup when the state file exists, so the next automatic sync comes from the scheduler; use Sync Now if you want one immediately.
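The deployment uses `strategy: Recreate` rather than `RollingUpdate` because a `ReadWriteOnce` volume can be mounted read-write by only one node at a time. A rolling update starts the new pod before stopping the old one, so if the new pod lands on a different node it cannot attach the volume and the rollout stalls.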
Expanding the PVC
If your mailbox outgrows 50Gi and your StorageClass supports online expansion (its `allowVolumeExpansion` is `true`), request a larger size:
```bash
kubectl patch pvc mailarchive-pvc -n mailarchive \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```
Local development
```bash
# Backend
cd backend/
pip install -r requirements.txt
IMAP_USER=you@gmail.com IMAP_PASS="xxxx xxxx xxxx xxxx" MAIL_ROOT=./mail python main.py

# Frontend (separate terminal)
cd frontend/
npm install
npm run dev   # Vite proxies /api calls to localhost:8000
```
API reference
| Endpoint | Method | Description |
|---|---|---|
| `/api/status` | GET | Sync status, last run, emails downloaded, index status |
| `/api/sync` | POST | Trigger a full sync of all folders |
| `/api/sync/folder/{folder}` | POST | Trigger an immediate sync of a single folder |
| `/api/folders` | GET | List all folders with email counts |
| `/api/emails/{folder}` | GET | List emails in a folder |
| `/api/email/{folder}/{filename}` | GET | Full email detail: body, headers, attachments |
| `/api/email/{folder}/{filename}/download` | GET | Download the raw .eml file |
| `/api/email/{folder}/{filename}/attachment/{index}` | GET | Download a specific attachment by index |
| `/api/search?q={query}` | GET | Full-text search across all folders |
| `/api/search/status` | GET | Search index document count and status |
| `/api/export/build` | POST | Start building the export zip in the background |
| `/api/export/status` | GET | Export build progress |
| `/api/export/download` | GET | Download the completed export zip |
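The three export endpoints chain into a simple scripted backup. A sketch using only `urllib`, assuming the backend is reachable on `localhost:8000` (as with the port-forward in the Verify step) and, importantly, assuming the status payload carries a boolean `ready` field; that field name is a hypothetical placeholder, so check the real `/api/export/status` response before relying on it. `export_to_file` is likewise a hypothetical helper. The 2-second poll interval mirrors the UI.

```python
import json
import shutil
import time
import urllib.request

BASE = "http://localhost:8000"  # e.g. the kubectl port-forward to the backend service


def _call(method: str, path: str, base: str = BASE) -> bytes:
    data = b"" if method == "POST" else None  # empty body for the POST trigger
    req = urllib.request.Request(base + path, data=data, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def export_to_file(dest: str, base: str = BASE, poll_seconds: float = 2.0,
                   timeout: float = 3600.0) -> None:
    """POST /api/export/build, poll /api/export/status, then stream /api/export/download."""
    _call("POST", "/api/export/build", base)
    deadline = time.monotonic() + timeout
    while True:
        status = json.loads(_call("GET", "/api/export/status", base))
        if status.get("ready"):  # hypothetical field name; adjust to the real payload
            break
        if time.monotonic() > deadline:
            raise TimeoutError("export did not finish in time")
        time.sleep(poll_seconds)
    with urllib.request.urlopen(base + "/api/export/download") as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk instead of holding the zip in memory
```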