Issue #43: feat(infra): Advanced security hardening (fail2ban, WAF, IDS)

State:: OPEN
Milestone:: Jalon 1: Sécurité & GDPR 🔒
Labels:: phase:vps, track:infrastructure, priority:high
Assignees:: Unassigned
Created:: 2025-10-27
Updated:: 2025-11-17
Description

## Context

Current security implementation (60% complete):
- ✅ UFW firewall (ports 22, 80, 443)
- ✅ fail2ban installed (default config)
- ✅ Non-root Docker containers
- ✅ Traefik security headers (HSTS, X-Frame-Options, etc.)
- ✅ Rate limiting (100 req/min per IP)

**Missing critical security layers:**
- ❌ Custom fail2ban jails for application-specific attacks
- ❌ Web Application Firewall (WAF) rules
- ❌ Intrusion Detection System (IDS)
- ❌ SSH hardening (key-only, 2FA optional)
- ❌ Security audit automation

## Objective

Harden VPS security for production deployment with defense-in-depth strategy.

## Implementation Plan

### 1. fail2ban Custom Jails

**Current:** fail2ban installed with default jails (SSH only)

**Add custom jails:**

**File:** `infrastructure/ansible/templates/fail2ban-koprogo.conf.j2`

```ini
# /etc/fail2ban/jail.d/koprogo.conf

[sshd]
enabled = true
port = ssh
logpath = /var/log/auth.log
maxretry = 3
bantime = 3600
findtime = 600

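[traefik-auth]
enabled = true
port = http,https
logpath = /var/log/traefik/access.log
maxretry = 5
bantime = 1800
findtime = 300
filter = traefik-auth
# If Traefik runs in Docker with published ports, forwarded traffic bypasses the
# INPUT chain, so bans may need to target DOCKER-USER (applies to every HTTP jail here)
# chain = DOCKER-USER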

[traefik-badbots]
enabled = true
port = http,https
logpath = /var/log/traefik/access.log
maxretry = 2
bantime = 86400
findtime = 600
filter = traefik-badbots

[koprogo-api-abuse]
enabled = true
port = http,https
logpath = /var/log/koprogo/backend.log
maxretry = 20
bantime = 3600
findtime = 60
filter = koprogo-api-abuse
```
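The interplay of `maxretry` and `findtime` can be checked with a small model before tuning the jails. The sketch below is a simplified approximation of the ban condition (a host is banned once `maxretry` failures fall inside one `findtime` window); the function name `would_ban` is hypothetical and fail2ban's real bookkeeping is more involved.

```bash
# would_ban MAXRETRY FINDTIME
# Reads failure timestamps (epoch seconds, ascending, one per line) on stdin
# and reports whether a sliding window of FINDTIME seconds ever holds
# MAXRETRY failures. Simplified model, not fail2ban's actual implementation.
would_ban() {
  awk -v maxretry="$1" -v findtime="$2" '
    {
      t[NR] = $1 + 0
      # Forget failures that are older than the findtime window.
      while (t[NR] - t[first + 1] > findtime + 0) first++
      if (NR - first >= maxretry + 0) {
        print "ban at " t[NR]
        banned = 1
        exit
      }
    }
    END { if (!banned) print "no ban" }
  '
}

# Example with the traefik-auth values (maxretry = 5, findtime = 300):
#   printf '%s\n' 0 60 120 180 240 | would_ban 5 300
```

With failures spaced 100 seconds apart, only four ever share a 300-second window, so the same jail would not ban that client.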

**Create filters:**

`/etc/fail2ban/filter.d/traefik-auth.conf`:
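```ini
# Assumes Traefik's default Common Log Format, where the client IP is the first field
# and the status code follows the quoted request line.
[Definition]
failregex = ^<HOST> \S+ \S+ [^"]*"[^"]*" 40[13] 
ignoreregex =
```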

`/etc/fail2ban/filter.d/traefik-badbots.conf`:
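```ini
# Assumes Traefik's default Common Log Format; the user agent is the third quoted field.
# Matching only that field avoids banning clients that merely request paths such as /robots.txt.
[Definition]
failregex = ^<HOST> \S+ \S+ [^"]*"[^"]*" \S+ \S+ "[^"]*" "[^"]*(?:bot|crawler|spider)[^"]*"
ignoreregex = (?:[Gg]ooglebot|[Bb]ingbot|[Ss]lackbot)
```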

`/etc/fail2ban/filter.d/koprogo-api-abuse.conf`:
```
[Definition]
failregex = Rate limit exceeded for IP: <HOST>
            Authentication failed for IP: <HOST>
ignoreregex =
```
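Before deploying, a filter can be sanity-checked against sample log lines. The sketch below is only an approximation: it expands `<HOST>` to an IPv4 pattern and counts matching lines with `grep -E`, which is not how fail2ban itself matches (no IPv6, no date handling, Python regex syntax). The function name `count_filter_matches` is hypothetical.

```bash
# count_filter_matches FAILREGEX LOGFILE
# Prints how many lines of LOGFILE match FAILREGEX after <HOST> is expanded
# to an IPv4 pattern. The body runs in a subshell so variables stay local.
count_filter_matches() (
  failregex="$1"
  logfile="$2"
  host_re='([0-9]{1,3}[.]){3}[0-9]{1,3}'
  expanded=$(printf '%s\n' "$failregex" | sed "s/<HOST>/$host_re/g")
  # grep exits 1 when the count is 0; the count is still printed.
  grep -Ec -- "$expanded" "$logfile" || true
)

# Example:
#   count_filter_matches 'Rate limit exceeded for IP: <HOST>' /var/log/koprogo/backend.log
```

On a host with fail2ban installed, `fail2ban-regex /var/log/koprogo/backend.log /etc/fail2ban/filter.d/koprogo-api-abuse.conf` runs the real matcher and should be treated as the authoritative check.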

**Deploy via Ansible:**
```yaml
- name: Deploy fail2ban custom configuration
  template:
    src: fail2ban-koprogo.conf.j2
    dest: /etc/fail2ban/jail.d/koprogo.conf
  notify: restart fail2ban

- name: Deploy fail2ban filters
  copy:
    src: "{{ item }}"
    dest: "/etc/fail2ban/filter.d/{{ item }}"
    owner: root
    group: root
    mode: "0644"
  loop:
    - traefik-auth.conf
    - traefik-badbots.conf
    - koprogo-api-abuse.conf
  notify: restart fail2ban
```

### 2. Web Application Firewall (WAF) - Traefik Plugin

**Option A: Traefik CrowdSec Plugin** (Recommended)

Install CrowdSec bouncer for Traefik:
```bash
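# Install CrowdSec
curl -s https://packagecloud.io/install/repositories/crowdsec/crowdsec/script.deb.sh | sudo bash
sudo apt install crowdsec crowdsec-firewall-bouncer-iptables

# Install Traefik bouncer
# NOTE: confirm this package name against the CrowdSec repository before relying on it;
# the docker-compose labels in this section configure the bouncer as a Traefik plugin
sudo apt install crowdsec-traefik-bouncer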
```

**docker-compose.yml update:**
```yaml
services:
  traefik:
    labels:
      - "traefik.http.middlewares.crowdsec.plugin.bouncer.enabled=true"
      - "traefik.http.middlewares.crowdsec.plugin.bouncer.crowdseclapikey=${CROWDSEC_API_KEY}"
```

**Benefits:**
- Shared threat intelligence from CrowdSec community
- Automatic IP reputation blocking
- Behavioral analysis
- Easy integration with Traefik

**Option B: ModSecurity WAF Rules** (Alternative)

Use OWASP Core Rule Set (CRS) with custom Traefik integration.

**Deploy Option A (CrowdSec) via Ansible:**
```yaml
- name: Install CrowdSec
  apt:
    name:
      - crowdsec
      - crowdsec-firewall-bouncer-iptables
      - crowdsec-traefik-bouncer
    state: present

- name: Configure CrowdSec Traefik bouncer
  template:
    src: crowdsec-traefik-bouncer.yml.j2
    dest: /etc/crowdsec/bouncers/traefik-bouncer.yml
```

### 3. Intrusion Detection System (IDS)

**Install Suricata** (lightweight IDS/IPS):

```bash
apt install suricata
```

**Enable Suricata rules:**
```yaml
# /etc/suricata/suricata.yaml
rule-files:
  - suricata.rules
  - emerging-threats.rules
  - local.rules

# Custom rules for KoproGo: these lines belong in /etc/suricata/rules/local.rules, not in suricata.yaml
alert http any any -> any any (msg:"SQL Injection Attempt"; content:"SELECT"; nocase; content:"FROM"; nocase; sid:1000001; rev:1;)
alert http any any -> any any (msg:"XSS Attempt"; content:"<script"; nocase; sid:1000002; rev:1;)
alert http any any -> any any (msg:"Path Traversal"; content:"../"; sid:1000003; rev:1;)
```

**Monitor Suricata alerts:**
- Logs: `/var/log/suricata/fast.log`
- Integration with monitoring stack (Loki)
- Alert on critical events via Prometheus
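Until the Loki integration exists, the alert log can be summarized by hand. The sketch below assumes Suricata's usual `fast.log` line layout (`timestamp [**] [gid:sid:rev] msg [**] [Classification: ...] ...`) and counts alerts per signature; the function name `count_suricata_alerts` is hypothetical.

```bash
# count_suricata_alerts
# stdin:  fast.log lines
# stdout: "<count> <sid> <msg>", most frequent signature first
count_suricata_alerts() {
  # Keep only "<sid> <msg>" from each alert line, then tally identical pairs.
  sed -n 's/^.*\[\*\*\] \[[0-9]*:\([0-9]*\):[0-9]*\] \(.*\) \[\*\*\].*$/\1 \2/p' |
    awk '{ seen[$0]++ } END { for (k in seen) print seen[k], k }' |
    sort -rn
}

# Example:
#   count_suricata_alerts < /var/log/suricata/fast.log
```

Lines that do not follow the assumed layout are skipped silently, so a count of zero can mean either "no alerts" or "different log format".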

**Ansible task:**
```yaml
- name: Install Suricata IDS
  apt:
    name: suricata
    state: present

- name: Enable Suricata service
  systemd:
    name: suricata
    enabled: yes
    state: started

- name: Deploy custom Suricata rules
  template:
    src: suricata-local.rules.j2
    dest: /etc/suricata/rules/local.rules
  notify: reload suricata
```

### 4. SSH Hardening

**Current:** SSH enabled on port 22, password authentication allowed

**Harden SSH configuration:**

`/etc/ssh/sshd_config` updates:
```
# Disable password authentication (key-only)
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin prohibit-password

# Disable empty passwords
PermitEmptyPasswords no

# Limit authentication attempts
MaxAuthTries 3

# Reduce login grace time
LoginGraceTime 30

# SSH protocol 2 is the only protocol supported by OpenSSH 7.4 and later,
# so the deprecated "Protocol 2" directive is not needed

# Disable X11 forwarding
X11Forwarding no

# Enable strict mode
StrictModes yes

# Log verbosity
LogLevel VERBOSE

# Optional: Change SSH port (security through obscurity)
# Port 2222
```
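After a reload, the effective settings can be verified from the output of `sshd -T`, which prints the running configuration as lowercase `key value` pairs. The sketch below checks a subset of the directives listed above; the function name `check_sshd_hardening` is hypothetical, and `PermitRootLogin` is left out because `sshd -T` may report it under an alias.

```bash
# check_sshd_hardening
# stdin:  output of `sshd -T`
# stdout: one FAIL or MISSING line per problem; exit status 1 if any was found
check_sshd_hardening() {
  awk '
    BEGIN {
      want["passwordauthentication"] = "no"
      want["permitemptypasswords"]   = "no"
      want["x11forwarding"]          = "no"
      want["maxauthtries"]           = "3"
      want["logingracetime"]         = "30"
    }
    ($1 in want) {
      seen[$1] = 1
      if ($2 != want[$1]) {
        print "FAIL " $1 "=" $2 " (want " want[$1] ")"
        bad = 1
      }
    }
    END {
      for (k in want) if (!(k in seen)) { print "MISSING " k; bad = 1 }
      exit bad + 0
    }
  '
}

# Example:
#   sudo sshd -T | check_sshd_hardening && echo "sshd hardening OK"
```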

**2FA via Google Authenticator (Optional):**
```bash
apt install libpam-google-authenticator
```

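Update `/etc/pam.d/sshd`:
```
auth required pam_google_authenticator.so
```

For sshd to ask for the code in addition to the key, `/etc/ssh/sshd_config` also needs `UsePAM yes`, `KbdInteractiveAuthentication yes` (`ChallengeResponseAuthentication yes` on older OpenSSH releases), and `AuthenticationMethods publickey,keyboard-interactive`. On Debian/Ubuntu, the `@include common-auth` line in `/etc/pam.d/sshd` typically has to be commented out as well, otherwise PAM also prompts for the account password.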

**Ansible task:**
```yaml
- name: Harden SSH configuration
  lineinfile:
    path: /etc/ssh/sshd_config
    # Also match the commented-out defaults shipped in sshd_config
    regexp: '^#?\s*{{ item.key }}\s'
    line: "{{ item.key }} {{ item.value }}"
    # Refuse to write a configuration that sshd cannot parse
    validate: /usr/sbin/sshd -t -f %s
  loop:
    - { key: PasswordAuthentication, value: "no" }
    - { key: PermitRootLogin, value: prohibit-password }
    - { key: MaxAuthTries, value: "3" }
    - { key: LoginGraceTime, value: "30" }
  notify: restart sshd
```

### 5. Security Audit Automation

**Install Lynis** (security auditing tool):
```bash
apt install lynis
```

**Schedule weekly security audits:**
```bash
# Cron job: every Sunday at 3am
0 3 * * 0 /usr/bin/lynis audit system --cronjob | tee /var/log/lynis/audit-$(date +\%Y\%m\%d).log
```

**Parse Lynis results and alert on issues:**
- Integration with monitoring stack
- Alert if the Lynis hardening index drops below a threshold (e.g., 75/100)
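
One possible shape for the parsing step is sketched below. It assumes the audit writes its machine-readable report to `/var/log/lynis-report.dat` with a `hardening_index=<n>` line, and it emits the value in Prometheus text format. The metric name `lynis_hardening_index` and both function names are hypothetical.

```bash
# lynis_hardening_metric REPORT
# Prints the hardening index from a Lynis report file as a Prometheus gauge.
# The body runs in a subshell so variables stay local.
lynis_hardening_metric() (
  report="$1"
  score=$(sed -n 's/^hardening_index=\([0-9][0-9]*\)$/\1/p' "$report" | tail -n 1)
  if [ -z "$score" ]; then
    echo "hardening_index not found in $report" >&2
    exit 1
  fi
  printf '# HELP lynis_hardening_index Lynis hardening index (0-100).\n'
  printf '# TYPE lynis_hardening_index gauge\n'
  printf 'lynis_hardening_index %s\n' "$score"
)

# lynis_score_ok REPORT THRESHOLD
# Succeeds only when the index is present and at or above THRESHOLD,
# so a missing report is treated as a failure rather than ignored.
lynis_score_ok() (
  score=$(lynis_hardening_metric "$1" | awk '$1 == "lynis_hardening_index" { print $2 }')
  [ -n "$score" ] && [ "$score" -ge "$2" ]
)

# Example:
#   lynis_score_ok /var/log/lynis-report.dat 75 || echo "Lynis index below 75" >&2
```

If node_exporter's textfile collector is part of the monitoring stack (an assumption), redirecting the output of `lynis_hardening_metric` into a `.prom` file in its textfile directory would make the score available to Prometheus and Grafana.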

**Ansible task:**
```yaml
- name: Install Lynis security auditing
  apt:
    name: lynis
    state: present

- name: Create Lynis log directory
  file:
    path: /var/log/lynis
    state: directory
    mode: "0750"

- name: Schedule weekly security audit
  cron:
    name: "Lynis security audit"
    minute: "0"
    hour: "3"
    weekday: "0"
    job: "/usr/bin/lynis audit system --cronjob | tee /var/log/lynis/audit-$(date +\\%Y\\%m\\%d).log"
```

### 6. Additional Security Measures

**A. Automatic Security Updates:**
```bash
apt install unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
```

**B. Rootkit Detection (rkhunter):**
```bash
apt install rkhunter
rkhunter --update
rkhunter --propupd
```

Schedule daily scans:
```bash
# Cron: daily at 4am
0 4 * * * /usr/bin/rkhunter --check --skip-keypress --report-warnings-only
```

**C. File Integrity Monitoring (AIDE):**
```bash
apt install aide
aideinit
```

**D. Kernel Hardening (sysctl):**

`/etc/sysctl.d/99-koprogo-hardening.conf`:
```
# IP forwarding: Docker's bridge networking needs this enabled, so keep it at 1 on this host
net.ipv4.ip_forward = 1

# SYN Cookies
net.ipv4.tcp_syncookies = 1

# Ignore ICMP redirects
net.ipv4.conf.all.accept_redirects = 0
net.ipv6.conf.all.accept_redirects = 0

# Ignore source routed packets
net.ipv4.conf.all.accept_source_route = 0

# Log Martians
net.ipv4.conf.all.log_martians = 1

# Disable IPv6 (if not used)
net.ipv6.conf.all.disable_ipv6 = 1
```

Apply with: `sysctl --system` (plain `sysctl -p` only reads `/etc/sysctl.conf`, not files under `/etc/sysctl.d/`)
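
Once the settings are loaded, the live kernel values can be compared against the file. The sketch below reads each `key = value` line and checks it against the matching file under `/proc/sys`; the function name `check_sysctl_conf` and the optional second argument (an alternative root, useful for dry runs) are hypothetical. It only handles single-value keys and does not handle interface names that contain dots.

```bash
# check_sysctl_conf CONF [ROOT]
# Compares every "key = value" line in CONF with the live value under ROOT
# (default /proc/sys). Prints one line per problem; exit status 1 if any.
check_sysctl_conf() {
  awk -F= -v root="${2:-/proc/sys}" '
    /^[ \t]*(#|$)/ { next }            # skip comments and blank lines
    {
      key = $1; want = $2
      gsub(/[ \t]/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", want)
      file = key
      gsub(/\./, "/", file)             # net.ipv4.x -> net/ipv4/x
      path = root "/" file
      actual = ""
      if ((getline actual < path) < 0) {
        print "UNREADABLE " key
        bad = 1
      } else if (actual != want) {
        print "MISMATCH " key ": " actual " (want " want ")"
        bad = 1
      }
      close(path)
    }
    END { exit bad + 0 }
  ' "$1"
}

# Example:
#   check_sysctl_conf /etc/sysctl.d/99-koprogo-hardening.conf
```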

## Testing & Validation

- [ ] fail2ban custom jails block IPs after threshold
- [ ] CrowdSec blocks known malicious IPs
- [ ] Suricata detects SQL injection/XSS attempts (test with dummy payloads)
- [ ] SSH login requires key only (password fails)
- [ ] Lynis hardening index > 75/100
- [ ] rkhunter detects no rootkits
- [ ] AIDE baseline established
- [ ] Security updates applied automatically

## Monitoring Integration

- [ ] fail2ban metrics in Prometheus (`fail2ban_exporter`)
- [ ] Suricata alerts in Loki
- [ ] Lynis score tracked in Grafana
- [ ] CrowdSec dashboard integrated
- [ ] Alert on security score degradation

## Documentation

- [ ] Update `infrastructure/README.md` with security procedures
- [ ] Document fail2ban jail configurations
- [ ] Create incident response playbook
- [ ] Document SSH key management
- [ ] Update CLAUDE.md with security posture

## Acceptance Criteria

- [ ] fail2ban custom jails active (SSH, Traefik, API abuse)
- [ ] CrowdSec WAF protecting Traefik endpoints
- [ ] Suricata IDS monitoring network traffic
- [ ] SSH hardened (key-only, reduced login grace time)
- [ ] Weekly Lynis audits scheduled
- [ ] Daily rkhunter scans scheduled
- [ ] AIDE file integrity monitoring active
- [ ] Kernel hardened via sysctl
- [ ] Automatic security updates enabled
- [ ] Monitoring dashboards show security metrics
- [ ] Documentation complete

## Resource Impact

- CrowdSec: ~50MB RAM
- Suricata: ~100MB RAM
- Lynis/rkhunter: Cron jobs, minimal overhead
- **Total: ~150MB RAM additional**

## Effort Estimate

**Medium** (2 days)
- Day 1: fail2ban jails + CrowdSec WAF + SSH hardening
- Day 2: Suricata IDS + security audit tools + testing

## Related

- Supports: Production security posture
- Integrates with: Issue #41 (monitoring stack)
- Complements: Issue #39 (encryption at rest)

## References

- fail2ban: https://www.fail2ban.org/
- CrowdSec: https://www.crowdsec.net/
- Suricata: https://suricata.io/
- Lynis: https://cisofy.com/lynis/
- OWASP WAF: https://owasp.org/www-project-modsecurity-core-rule-set/