Issue #44: feat: Implement document storage strategy (Local volume vs S3/MinIO)
- State:
CLOSED
- Milestone:
Jalon 2: Conformité Légale Belge 📋 (Milestone 2: Belgian Legal Compliance)
- Labels:
phase:vps, track:infrastructure, priority:high
- Assignees:
Unassigned
- Created:
2025-10-27
- Updated:
2025-11-13
- URL:
Description
## Context
KoproGo needs a production-ready document storage solution for:
- Meeting minutes (PDF)
- Financial statements
- Invoices
- Contracts
- Work quotes
- Owner documents
**Current implementation:**
- Backend FileStorage service uses local filesystem (`backend/src/infrastructure/storage/file_storage.rs`)
- Files stored in `/uploads/building-id/filename`
- Max file size: 50MB
- Works in development but needs production strategy
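The layout and size limit described above can be captured in two small helpers. This is only an illustrative sketch: `storage_key`, `check_size`, and `MAX_FILE_SIZE_BYTES` are hypothetical names, not taken from `file_storage.rs`, and 50MB is assumed to mean 50 × 1024 × 1024 bytes.

```rust
/// Upload limit from the list above (50MB), assumed to be 50 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES: u64 = 50 * 1024 * 1024;

/// Builds the relative key `building-id/filename`.
/// The building id is treated as an opaque string (assumption).
fn storage_key(building_id: &str, filename: &str) -> String {
    format!("{building_id}/{filename}")
}

/// Rejects payloads larger than the limit with a descriptive message.
fn check_size(len_bytes: u64) -> Result<(), String> {
    if len_bytes > MAX_FILE_SIZE_BYTES {
        Err(format!("file is {len_bytes} bytes, limit is {MAX_FILE_SIZE_BYTES}"))
    } else {
        Ok(())
    }
}
```

Keeping the key relative (no leading `/uploads`) means the same key could later be reused unchanged as an S3 object key.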
## Problem
**Local filesystem limitations:**
- Single point of failure (VPS disk)
- Backup complexity (need to backup file volumes separately)
- Scaling limitations (disk space on single VPS)
- No built-in redundancy
- Docker volume management overhead
**Production requirements:**
- Reliable storage with backups
- Encrypted at rest
- Accessible from multiple instances (future K3s/K8s)
- Cost-effective for MVP phase
## Options Analysis
### Option 1: Local Docker Volume (Simple)
**Architecture:**
```yaml
# docker-compose.yml
volumes:
  backend_uploads:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/koprogo/uploads  # Mounted on encrypted LUKS volume
```
**Pros:**
- ✅ Simple setup (no external service)
- ✅ Low cost (included in VPS)
- ✅ Fast access (local disk)
- ✅ No egress costs
- ✅ GDPR compliant (data stays in EU)
**Cons:**
- ❌ Single point of failure
- ❌ Manual backup required (via cron + GPG + S3)
- ❌ Limited by VPS disk size (40GB for d2-2)
- ❌ Migration complexity when scaling to K3s
**Cost:** €0/month (included in VPS)
---
### Option 2: MinIO (Self-Hosted S3-Compatible)
**Architecture:**
```yaml
# docker-compose.yml
services:
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
    ports:
      - "9000:9000"
      - "9001:9001"  # Console
volumes:
  minio_data:
    driver: local
```
**Backend integration:**
```rust
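// Use the aws-sdk-s3 crate, pointed at the MinIO endpoint with path-style addressing
let shared = aws_config::load_from_env().await;
let config = aws_sdk_s3::config::Builder::from(&shared)
    .endpoint_url("http://minio:9000").force_path_style(true).build();
let client = aws_sdk_s3::Client::from_conf(config);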
```
**Pros:**
- ✅ S3-compatible API (easy migration to cloud S3 later)
- ✅ Built-in versioning
- ✅ Web console for management
- ✅ Erasure coding for redundancy (if multi-disk)
- ✅ Encryption at rest support
- ✅ Stays on VPS (no external dependency)
**Cons:**
- ❌ Additional resource overhead (~200MB RAM)
- ❌ Still single-server (no HA on single VPS)
- ❌ Complexity vs local filesystem
- ❌ Still needs backup strategy
**Cost:** €0/month (self-hosted on VPS)
**RAM impact:** ~200MB (10% of 2GB VPS)
---
### Option 3: External S3 (Scaleway/OVH Object Storage)
**Backend integration:**
```rust
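// Use aws-sdk-s3 with a Scaleway/OVH endpoint
let shared = aws_config::load_from_env().await; // region and credentials from the environment
let config = aws_sdk_s3::config::Builder::from(&shared)
    .endpoint_url("https://s3.fr-par.scw.cloud")
    .build();
let client = aws_sdk_s3::Client::from_conf(config);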
```
**Providers:**
| Provider | Storage Cost | Transfer Out | Requests |
|----------|-------------|--------------|----------|
| **Scaleway** | €0.01/GB/month | €0.01/GB | Free (first 75GB) |
| **OVH** | €0.011/GB/month | Free | €0.0001/10k |
| **Backblaze B2** | €0.005/GB/month | €0.01/GB (first 3x free) | Free |
**Example cost (100GB storage, 10GB transfer/month):**
- Scaleway: €1.10/month
- OVH: €1.10/month
- Backblaze B2: €0.50/month
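The example figures follow from the table rates: stored GB times the storage price, plus billable egress GB times the transfer price. Below is a minimal sketch of that arithmetic. It assumes request fees are negligible at this scale and reads "first 3x free" as free egress up to three times the stored volume; `Pricing` and `monthly_cost` are illustrative names.

```rust
/// Per-GB prices in euros, copied from the provider table above.
struct Pricing {
    storage_per_gb: f64,
    egress_per_gb: f64,
    /// Free egress as a multiple of stored GB (3.0 models the B2 "first 3x free" rule).
    free_egress_ratio: f64,
}

const SCALEWAY: Pricing = Pricing { storage_per_gb: 0.01, egress_per_gb: 0.01, free_egress_ratio: 0.0 };
const OVH: Pricing = Pricing { storage_per_gb: 0.011, egress_per_gb: 0.0, free_egress_ratio: 0.0 };
const BACKBLAZE_B2: Pricing = Pricing { storage_per_gb: 0.005, egress_per_gb: 0.01, free_egress_ratio: 3.0 };

/// Monthly cost in euros for `stored_gb` of storage and `egress_gb` of downloads.
/// Request fees are ignored (assumption).
fn monthly_cost(p: &Pricing, stored_gb: f64, egress_gb: f64) -> f64 {
    let free_gb = stored_gb * p.free_egress_ratio;
    let billable_egress = (egress_gb - free_gb).max(0.0);
    stored_gb * p.storage_per_gb + billable_egress * p.egress_per_gb
}
```

With 100GB stored and 10GB downloaded, `monthly_cost` gives roughly 1.10 for Scaleway and OVH and 0.50 for Backblaze B2, matching the list above.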
**Pros:**
- ✅ High availability (99.9% SLA)
- ✅ Automatic backups/replication
- ✅ Unlimited scalability
- ✅ Offloads VPS resources
- ✅ Built-in encryption at rest
- ✅ Easy K3s/K8s integration (same S3 backend)
- ✅ GDPR compliant (EU regions available)
**Cons:**
- ❌ External dependency (network required)
- ❌ Monthly cost (starts low, grows with usage)
- ❌ Egress costs for downloads
- ❌ Slightly higher latency vs local disk
**Cost:** ~€1-2/month initially (MVP scale)
---
## Recommendation: **Hybrid Approach** (Best of Both Worlds)
### Phase 1 (MVP - Q4 2025): MinIO on VPS
- Self-hosted MinIO container
- S3-compatible API from day 1
- Backup to external S3 (Backblaze B2 cheapest)
- Encryption at rest via LUKS volume
### Phase 2 (Production - Q1 2026): Migrate to External S3
- Switch to Scaleway/OVH Object Storage
- Same S3 API (minimal code changes)
- Better HA and redundancy
- Offload VPS resources
**Migration path:**
```bash
# Sync MinIO to external S3
mc mirror minio/koprogo-documents s3/koprogo-documents
# Update backend env vars (S3 endpoint + credentials)
# No code changes needed (same AWS S3 SDK)
```
---
## Implementation Plan
### 1. Backend Storage Abstraction
**Create trait:** `backend/src/application/ports/object_storage.rs`
```rust
use async_trait::async_trait;

/// Storage port implemented by the local filesystem and S3-compatible backends.
#[async_trait]
pub trait ObjectStorage: Send + Sync {
    async fn upload(&self, key: &str, data: Vec<u8>) -> Result<String, String>;
    async fn download(&self, key: &str) -> Result<Vec<u8>, String>;
    async fn delete(&self, key: &str) -> Result<(), String>;
    async fn exists(&self, key: &str) -> Result<bool, String>;
    async fn list(&self, prefix: &str) -> Result<Vec<String>, String>;
}
```
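Code that depends on this trait can be exercised without MinIO or a disk by using an in-memory stand-in. The sketch below mirrors the five operations but is deliberately synchronous so that it compiles with the standard library alone; a real test double would implement the async `ObjectStorage` trait instead. `InMemoryStorage` is a hypothetical name.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical in-memory stand-in mirroring the ObjectStorage operations.
/// Synchronous on purpose so it needs no async runtime.
#[derive(Default)]
struct InMemoryStorage {
    objects: Mutex<BTreeMap<String, Vec<u8>>>,
}

impl InMemoryStorage {
    fn upload(&self, key: &str, data: Vec<u8>) -> Result<String, String> {
        let mut objects = self.objects.lock().map_err(|e| e.to_string())?;
        objects.insert(key.to_string(), data);
        Ok(key.to_string())
    }

    fn download(&self, key: &str) -> Result<Vec<u8>, String> {
        let objects = self.objects.lock().map_err(|e| e.to_string())?;
        objects.get(key).cloned().ok_or_else(|| format!("not found: {key}"))
    }

    fn delete(&self, key: &str) -> Result<(), String> {
        let mut objects = self.objects.lock().map_err(|e| e.to_string())?;
        objects.remove(key);
        Ok(())
    }

    fn exists(&self, key: &str) -> Result<bool, String> {
        let objects = self.objects.lock().map_err(|e| e.to_string())?;
        Ok(objects.contains_key(key))
    }

    /// Returns keys starting with `prefix`, in sorted order.
    fn list(&self, prefix: &str) -> Result<Vec<String>, String> {
        let objects = self.objects.lock().map_err(|e| e.to_string())?;
        Ok(objects.keys().filter(|k| k.starts_with(prefix)).cloned().collect())
    }
}
```

Prefix listing with `building-id/` as the prefix behaves like an S3 prefix query, which is why a sorted map is used here.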
**Implementations:**
`backend/src/infrastructure/storage/local_storage.rs` (existing):
```rust
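use std::path::PathBuf;

use async_trait::async_trait;

pub struct LocalStorage {
    base_path: PathBuf,
}

#[async_trait]
impl ObjectStorage for LocalStorage {
    async fn upload(&self, key: &str, data: Vec<u8>) -> Result<String, String> {
        let path = self.base_path.join(key);
        // Keys look like `building-id/filename`, so create the building directory first.
        if let Some(parent) = path.parent() {
            tokio::fs::create_dir_all(parent).await.map_err(|e| e.to_string())?;
        }
        tokio::fs::write(path, data).await.map_err(|e| e.to_string())?;
        Ok(key.to_string())
    }
    // ...
}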
```
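Because `PathBuf::join` replaces the base when its argument is absolute, and `..` segments are not resolved, a key taken from user input could point outside `base_path`. One possible guard is sketched below; `resolve_key` is a hypothetical helper and not part of the existing service.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `key` onto `base`, rejecting keys that are empty or that contain
/// anything other than plain path segments (no root, no `..`, no drive prefix).
fn resolve_key(base: &Path, key: &str) -> Result<PathBuf, String> {
    if key.is_empty() {
        return Err("empty key".to_string());
    }
    let relative = Path::new(key);
    for component in relative.components() {
        match component {
            Component::Normal(_) => {}
            _ => return Err(format!("invalid key: {key}")),
        }
    }
    Ok(base.join(relative))
}
```

`LocalStorage` could call such a helper in place of `self.base_path.join(key)`; S3 keys are flat strings, so the S3 backend does not need the same check.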
`backend/src/infrastructure/storage/s3_storage.rs` (new):
```rust
use async_trait::async_trait;
use aws_sdk_s3::Client;

pub struct S3Storage {
    client: Client,
    bucket: String,
}

#[async_trait]
impl ObjectStorage for S3Storage {
    async fn upload(&self, key: &str, data: Vec<u8>) -> Result<String, String> {
        self.client
            .put_object()
            .bucket(&self.bucket)
            .key(key)
            .body(data.into()) // Vec<u8> converts into a ByteStream
            .send()
            .await
            .map_err(|e| e.to_string())?;
        Ok(key.to_string())
    }
    // ...
}
```
### 2. Configuration (Environment Variables)
`backend/.env`:
```bash
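# Storage backend: "local" or "s3"
STORAGE_BACKEND=s3
# S3 configuration (MinIO or external S3)
# Comments are kept on their own lines: some env-file parsers treat trailing comments as part of the value
# MinIO on the Docker network:
S3_ENDPOINT=http://minio:9000
# Scaleway:
# S3_ENDPOINT=https://s3.fr-par.scw.cloud
S3_BUCKET=koprogo-documents
S3_REGION=fr-par
# minioadmin/minioadmin are MinIO's default credentials: development only
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
# Path-style addressing is required for MinIO
S3_USE_PATH_STYLE=true
# Local storage fallback
LOCAL_STORAGE_PATH=/uploads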
```
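Reading these variables in one place makes a missing value fail at startup with a clear message. The following is a sketch under assumptions: `S3Settings` and `load_s3_settings` are hypothetical names, and `S3_USE_PATH_STYLE` is treated as optional with a default of `false`. The environment is passed in as a lookup function so the logic can be tested without touching the process environment.

```rust
/// Values of the S3_* variables listed above.
struct S3Settings {
    endpoint: String,
    bucket: String,
    region: String,
    access_key: String,
    secret_key: String,
    use_path_style: bool,
}

/// Loads the settings through `lookup`, which maps a variable name to its value.
/// In the backend this could be `|k| std::env::var(k).ok()`.
fn load_s3_settings(lookup: impl Fn(&str) -> Option<String>) -> Result<S3Settings, String> {
    let required = |name: &str| lookup(name).ok_or_else(|| format!("missing env var {name}"));
    Ok(S3Settings {
        endpoint: required("S3_ENDPOINT")?,
        bucket: required("S3_BUCKET")?,
        region: required("S3_REGION")?,
        access_key: required("S3_ACCESS_KEY")?,
        secret_key: required("S3_SECRET_KEY")?,
        // Optional flag, defaulting to false when unset (assumption).
        use_path_style: lookup("S3_USE_PATH_STYLE").map(|v| v == "true").unwrap_or(false),
    })
}
```

The struct intentionally does not derive `Debug`, so the secret key cannot end up in logs by accident.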
### 3. Docker Compose - MinIO
**Add to:** `deploy/production/docker-compose.yml`
```yaml
services:
  minio:
    image: minio/minio:RELEASE.2024-10-13T13-34-11Z
    command: server /data --console-address ":9001"
    volumes:
      - minio_data:/data
    environment:
      MINIO_ROOT_USER: ${MINIO_ROOT_USER}
      MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD}
      MINIO_SERVER_URL: http://minio:9000
    networks:
      - koprogo-network
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 10s
      retries: 3
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.minio-console.rule=Host(`minio.${DOMAIN}`)"
      - "traefik.http.services.minio-console.loadbalancer.server.port=9001"
  # MinIO client for initial bucket creation
  minio-init:
    image: minio/mc:latest
    depends_on:
      - minio
    # The bucket is kept private (no anonymous download), matching the Security checklist
    entrypoint: >
      /bin/sh -c "
      mc alias set minio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD};
      mc mb minio/koprogo-documents --ignore-existing;
      mc anonymous set none minio/koprogo-documents;
      exit 0;
      "
    networks:
      - koprogo-network
volumes:
  minio_data:
    driver: local
```
### 4. Backend Cargo Dependencies
`backend/Cargo.toml`:
```toml
[dependencies]
aws-config = "1.1.0"
aws-sdk-s3 = "1.10.0"
```
### 5. Dependency Injection
`backend/src/main.rs`:
```rust
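let storage: Arc<dyn ObjectStorage> = match env::var("STORAGE_BACKEND").as_deref() {
    Ok("s3") => {
        // Build the client from the S3_* variables so MinIO and external S3 share one code path.
        let var = |name: &str| env::var(name).unwrap_or_else(|_| panic!("{name} must be set"));
        let credentials = aws_sdk_s3::config::Credentials::new(
            var("S3_ACCESS_KEY"), var("S3_SECRET_KEY"), None, None, "env");
        let shared = aws_config::load_from_env().await;
        let config = aws_sdk_s3::config::Builder::from(&shared)
            .endpoint_url(var("S3_ENDPOINT"))
            .region(aws_sdk_s3::config::Region::new(var("S3_REGION")))
            .credentials_provider(credentials)
            .force_path_style(env::var("S3_USE_PATH_STYLE").as_deref() == Ok("true"))
            .build();
        let s3_client = aws_sdk_s3::Client::from_conf(config);
        Arc::new(S3Storage::new(s3_client, var("S3_BUCKET")))
    }
    _ => {
        let path = env::var("LOCAL_STORAGE_PATH").expect("LOCAL_STORAGE_PATH must be set");
        Arc::new(LocalStorage::new(path))
    }
};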
```
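The `match` above falls back to local storage for any value other than `s3`, so a typo such as `S3` would silently select the local disk. A stricter variant, sketched here as one possible design and not as the project's actual behavior, keeps the unset default but rejects unknown values. `StorageBackend` and `parse_backend` are hypothetical names.

```rust
/// Backends selectable through STORAGE_BACKEND.
#[derive(Debug, PartialEq)]
enum StorageBackend {
    Local,
    S3,
}

/// Maps the raw STORAGE_BACKEND value to a backend.
/// Unset or empty means local; an unrecognised value is an error, not a fallback.
fn parse_backend(raw: Option<&str>) -> Result<StorageBackend, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("local") => Ok(StorageBackend::Local),
        Some("s3") => Ok(StorageBackend::S3),
        Some(other) => Err(format!("unknown STORAGE_BACKEND: {other}")),
    }
}
```

In `main.rs` this could be called as `parse_backend(env::var("STORAGE_BACKEND").ok().as_deref())`, with the resulting enum driving the construction of the `Arc<dyn ObjectStorage>`.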
### 6. Backup Strategy (MinIO → External S3)
**Add to backup script:**
`infrastructure/ansible/templates/backup-minio.sh.j2`:
```bash
#!/bin/bash
set -euo pipefail
# Backup MinIO to Backblaze B2
mc alias set minio http://localhost:9000 "${MINIO_ROOT_USER}" "${MINIO_ROOT_PASSWORD}"
mc alias set b2 https://s3.us-west-000.backblazeb2.com "${B2_KEY_ID}" "${B2_APPLICATION_KEY}"
# Mirror MinIO to B2
mc mirror --overwrite minio/koprogo-documents b2/koprogo-backups/documents
# Encrypt the object listing and store it under /backups
# GPG_RECIPIENT is an assumed variable: gpg --encrypt needs a recipient to run unattended
mc ls --recursive --json minio/koprogo-documents | gzip \
  | gpg --encrypt --recipient "${GPG_RECIPIENT}" \
  > "/backups/minio-metadata-$(date +%Y%m%d).json.gz.gpg"
```
Cron: daily at 03:00 UTC (`0 3 * * *`, assuming the host clock runs on UTC)
### 7. Monitoring
**Prometheus metrics for MinIO:**
```yaml
scrape_configs:
  - job_name: 'minio'
    metrics_path: /minio/v2/metrics/cluster
    static_configs:
      - targets: ['minio:9000']
```
**Grafana dashboard:** MinIO Overview (ID: 13502)
---
## Testing & Validation
- [ ] LocalStorage implementation works (existing)
- [ ] S3Storage implementation works (MinIO)
- [ ] Upload/download via MinIO successful
- [ ] Switch between local and S3 via env var
- [ ] MinIO console accessible (minio.domain.com)
- [ ] Backup script syncs MinIO → B2
- [ ] Encryption at rest (LUKS volume under MinIO)
- [ ] Performance acceptable (<100ms upload for 1MB file)
- [ ] Migration script tested (MinIO → external S3)
## Security
- [ ] MinIO credentials strong (min 20 chars)
- [ ] MinIO console behind Traefik auth
- [ ] S3 bucket not publicly readable (except specific files)
- [ ] Encryption at rest (LUKS volume)
- [ ] Backup encryption (GPG)
## Documentation
- [ ] Update CLAUDE.md with storage architecture
- [ ] Document MinIO setup and configuration
- [ ] Create migration guide (local → MinIO → external S3)
- [ ] Document backup and restore procedures
## Acceptance Criteria
- [ ] ObjectStorage trait abstraction implemented
- [ ] LocalStorage implementation complete
- [ ] S3Storage implementation complete (MinIO compatible)
- [ ] MinIO container in docker-compose
- [ ] Environment-based storage backend selection
- [ ] MinIO backup to external S3 (Backblaze B2)
- [ ] Monitoring integrated (Prometheus + Grafana)
- [ ] Documentation complete
- [ ] Migration path validated
## Cost Analysis (MVP Phase)
**Option 1 (Local Only):**
- Storage: €0 (VPS disk)
- Backup: €0.50/month (Backblaze B2, 100GB)
- **Total: €0.50/month**
**Option 2 (MinIO + B2 Backup):**
- MinIO: €0 (self-hosted)
- Backup: €0.50/month (Backblaze B2, 100GB)
- RAM overhead: 200MB (10% of VPS)
- **Total: €0.50/month + RAM overhead**
**Option 3 (External S3 Only):**
- Scaleway S3: €1.10/month (100GB + 10GB transfer)
- No separate backup budgeted (relies on the provider's built-in redundancy)
- **Total: €1.10/month**
**Recommendation for MVP:** **Option 2 (MinIO + B2)** - Best migration path
## Effort Estimate
**Medium** (2 days)
- Day 1: ObjectStorage trait + S3Storage implementation + MinIO docker-compose
- Day 2: Backup script + monitoring + testing + documentation
## Related
- Depends on: Issue #39 (encryption at rest - LUKS volume for MinIO)
- Depends on: Issue #40 (encrypted backups - B2 integration)
- Enables: File upload UI issue (needs storage backend)
- Future: Migrate to external S3 in Q1 2026
## References
- MinIO: https://min.io/docs/minio/linux/index.html
- AWS S3 SDK Rust: https://docs.rs/aws-sdk-s3/
- Backblaze B2: https://www.backblaze.com/b2/cloud-storage.html
- Scaleway Object Storage: https://www.scaleway.com/en/object-storage/