Applies to: any new Linux server intended for production where you will store, process, or transmit personal data. That includes metadata (IP addresses, user agents, login timestamps), which counts under GDPR whether or not your application has a “users” table.
Why this matters
GDPR Article 25 — Data protection by design and by default — is the legal text that turns “we should think about privacy” into “you must demonstrate that you thought about privacy, on paper, before you started processing.” It applies whether or not you have a website with a signup form. An Nginx access log with full client IPs is processing personal data.
The economic argument is simpler: privacy decisions made before the server exists cost little more than the time to write them down. The same decisions made after launch cost a retention-policy rewrite, a backup re-key, a logging refactor, a privacy policy update, and (if you get unlucky) a supervisory authority enquiry.
This guide lists the decisions you should make and write down before you
run apt install on a new server. Everything here is design-level, not
configuration-level — the implementation lives in the per-component
guides this one links to.
1. Pick your processors before you sign anything
Every external service that processes personal data on your behalf is a processor under GDPR Article 28. You need a Data Processing Agreement (DPA) with each one, and you need to know which jurisdiction each operates in, before you commit to it. Doing this after deployment means renegotiating contracts or migrating providers.
| Role | Default choice | What to check |
|---|---|---|
| Hosting | An EEA-based VPS provider | Where the physical server sits (region, not just billing entity); whether the provider’s own DPA is published; whether disks are encrypted on the hypervisor |
| TLS CA | Let’s Encrypt | Validates domains only — does not process visitor personal data, so no DPA required for them. Document this in your RoPA anyway. |
| CDN / WAF | Optional — adds a processor | If used, the CDN sees every request and every IP. DPA required; check default retention of their logs. |
| Backup target | Same region as hosting | Off-site backups that cross borders trigger the Chapter V transfer rules (Article 44 onward). Default to in-EEA targets. |
| Email transactional | EEA-hosted SMTP or EEA endpoint of an SES-style provider | Most email providers retain message bodies for days by default — check their retention controls. |
| Monitoring / APM | Self-hosted or EEA-hosted SaaS | Application performance tools see request URLs and sometimes user IDs. Treat as processors. |
| Analytics | Privacy-first (cookieless, no IP storage) | Privacy-first tools sidestep most cookie-consent and DPA complexity. |
| DNS | Any reputable provider | DNS queries are metadata, but a DPA is still appropriate for the registrar/host. |
Write the list down. This list is the start of your record of processing activities (RoPA).
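One way to keep that list checkable is to hold it as structured data and let a script point out the gaps. The following is a minimal sketch, not a RoPA format: the `Processor` fields, the `open_issues` helper, and the provider names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Processor:
    role: str                            # e.g. "Hosting", "CDN / WAF"
    provider: str                        # hypothetical provider name
    in_eea: bool                         # where the data physically sits
    handles_personal_data: bool
    dpa_reference: Optional[str] = None  # link or ID of the executed DPA


def open_issues(processors: List[Processor]) -> List[str]:
    """Return one human-readable finding per gap in the processor list."""
    findings = []
    for p in processors:
        if not p.handles_personal_data:
            continue  # still listed in the register, but no DPA needed
        if p.dpa_reference is None:
            findings.append(f"{p.role} ({p.provider}): no DPA on file")
        if not p.in_eea:
            findings.append(f"{p.role} ({p.provider}): data leaves the EEA")
    return findings


# Hypothetical entries for illustration only.
REGISTER = [
    Processor("Hosting", "ExampleHost", in_eea=True,
              handles_personal_data=True, dpa_reference="DPA-001"),
    Processor("TLS CA", "ExampleCA", in_eea=False,
              handles_personal_data=False),
    Processor("Monitoring / APM", "ExampleAPM", in_eea=False,
              handles_personal_data=True),
]

for finding in open_issues(REGISTER):
    print(finding)
```

Running a check like this before signing a contract, rather than after deployment, is the point of this section; the real register still belongs in your privacy documentation.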
2. Decide what you will and will not log
The single most common privacy-by-design failure is “we’ll just turn on the default logs and figure out retention later.” Default web server logs capture full client IP, User-Agent, referrer, and full request path: a combination that is often enough to single out an individual user. IP addresses are treated as personal data under GDPR (Recital 30 lists them as online identifiers; see also CJEU C-582/14, Breyer, decided under the earlier Data Protection Directive).
Decide for each log stream:
- Do we need it at all? Most error logs: yes. Per-request access logs at the edge: usually no — your application logs already cover what you need.
- What do we strip? Common choices: zero the last octet of IPv4, last 64 bits of IPv6, drop the User-Agent entirely, drop query strings.
- What do we hash? If you need stable per-session aggregation without identifying users, hash the IP with a salt that rotates daily. You get “unique visitors per day” without retaining identifying data overnight.
An example of a minimal Nginx log_format you might settle on:
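# Privacy-preserving access log: masks the last IPv4 octet, keeps only the
# first 64 bits of IPv6, and drops User-Agent and query strings entirely.
map $remote_addr $ip_masked {
    # IPv4: keep the first three octets, zero the last.
    "~^(?<v4>\d+\.\d+\.\d+)\.\d+$" "$v4.0";
    # IPv6: keep the first four groups (the /64 prefix). Addresses that are
    # "::"-compressed inside the first four groups fall through to default.
    "~^(?<v6>[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+):" "$v6::";
    default "0.0.0.0";
}

log_format minimal '$ip_masked - [$time_iso8601] '
                   '"$request_method $uri $server_protocol" '
                   '$status $body_bytes_sent $request_time';

access_log /var/log/nginx/access.log minimal;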
This is one example, not the only correct answer — the point is that the log format is a decision, written down, justified.
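The daily-rotating salted hash mentioned in the list above can be sketched at the application layer as follows. This is an illustration under stated assumptions, not a recommended implementation: the `DailyVisitorHasher` class is hypothetical, it keeps the salt in process memory only (so a restart or a second worker process would split the day's counts), and the resulting ID is still pseudonymous data for as long as the salt exists.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class DailyVisitorHasher:
    """Turn client IPs into per-day visitor IDs using a rotating salt."""

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        # A fresh random salt replaces the old one when the day changes.
        # The old salt is never stored, so yesterday's IDs cannot be
        # recomputed from an IP address afterwards.
        if day != self._salt_day:
            self._salt_day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_id(self, ip: str, day: Optional[date] = None) -> str:
        day = day or date.today()
        mac = hmac.new(self._salt_for(day), ip.encode("utf-8"), hashlib.sha256)
        # A truncated digest is enough for counting unique visitors.
        return mac.hexdigest()[:16]


hasher = DailyVisitorHasher()
same_day = {hasher.visitor_id(ip) for ip in ("203.0.113.7", "203.0.113.7", "203.0.113.8")}
print(len(same_day))  # two distinct visitors
```

Using a keyed hash (HMAC) rather than a bare hash matters here: the IPv4 space is small enough that unsalted hashes can be reversed by enumeration.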
3. Decide retention before you collect
Default retention in many Linux components is “rotate when the file gets big” — i.e. effectively forever for a quiet server. GDPR requires you to retain personal data for no longer than necessary (Article 5(1)(e)). Pick a number for each data class and configure to it.
A defensible starting set of defaults — adjust to your real needs:
| Data class | Default retention | Why |
|---|---|---|
| Web access logs | 30 days | Operational debugging window; longer rarely justified |
| Error logs | 90 days | Slow-burn bugs sometimes take that long to surface |
| Auth / sudo / audit logs | 12 months | Investigations may reach back further than ops debugging |
| Backups (daily rotation) | 30 days | Recovery window for accidental deletion / corruption |
| Backups (monthly archives) | 12 months | Only if you have a documented reason — otherwise drop them |
| Application sessions | 14–30 days, hard expiry | “Stay signed in forever” is a retention liability |
| Email server logs | 30 days | Mail delivery debugging window |
Configure logrotate, your backup tool’s retention policy, and your
session store’s TTL to these numbers on day one. A retention policy that
exists only on paper is not a retention policy.
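A small audit script can confirm that the configured retention is actually being enforced. The sketch below assumes file modification time is a fair proxy for the age of the data; the directory paths in `RETENTION_DAYS` are hypothetical, and the day counts simply mirror the table above.

```python
import time
from pathlib import Path
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86_400

# Hypothetical directories; adjust to wherever your logs really live.
RETENTION_DAYS: Dict[str, int] = {
    "/var/log/nginx": 30,
    "/var/log/myapp/errors": 90,
}


def files_past_retention(directory: str, max_age_days: int,
                         now: Optional[float] = None) -> List[Path]:
    """List files under `directory` last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    root = Path(directory)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def audit(policy: Dict[str, int],
          now: Optional[float] = None) -> Dict[str, List[Path]]:
    """Return only the directories that hold files older than allowed."""
    report = {}
    for directory, days in policy.items():
        stale = files_past_retention(directory, days, now)
        if stale:
            report[directory] = stale
    return report
```

Calling `audit(RETENTION_DAYS)` from a scheduled job and alerting on a non-empty result turns the paper policy into something that fails loudly when rotation silently stops working.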
4. Encrypt in transit, at rest, and in backups
These three are different problems with different solutions.
In transit is the easiest: TLS everywhere, internal and external, no
exceptions. See /guides/nginx-tls-2026/ for
the public-facing edge; PostgreSQL, Redis, and SSH all support TLS or
equivalent and should use it.
At rest at the VPS layer means:
- Hypervisor-level disk encryption (provided by most reputable VPS hosts — check their docs and document it in your RoPA).
- LUKS inside the VM if you need defence against the host operator themselves; rare but real for high-sensitivity workloads.
- Application-layer encryption for the truly sensitive columns (credentials, tokens, ID document data) — separate problem, separate guide.
In backups is the one people forget. A backup tool that does not
encrypt by default writes your entire database — including personal data
— to wherever the backup target lives. Use tools that encrypt by default
(restic, borgbackup) and store the encryption passphrase somewhere
that survives the loss of the production server: a password manager you
trust, not an environment variable on the server you’re backing up.
5. Least privilege from day zero
Privacy obligations include limiting who inside your organisation can access personal data (Article 5(1)(f), Article 32(1)(b)). The infrastructure decisions:
- No shared accounts. Every human gets their own SSH key and their own audit trail. See /guides/ssh-hardening/.
- Application database user has only the privileges the application needs. Never SUPERUSER. Never own the database (see /guides/postgresql-hardening/).
- Operators access via sudo, not as root. sudo logging is on by default; keep it that way and route it to your audit log retention bucket.
- Production secrets (passphrases, API keys, signing keys) live in a secrets manager, not in .env files in /var/www.
6. Make the data deletion-ready
Article 17 — the right to erasure — is implemented mostly at the application layer, but the infrastructure decisions you make now determine how clean that implementation can be.
- Backups. When a user requests deletion, the data persists in your backups until the retention window expires. This is generally accepted by supervisory authorities if your backup retention is short and documented, and if the deletion is re-applied whenever a backup is restored. Long backup retention turns Article 17 from a one-line response into a multi-week ticket.
- Cascading deletes. Decide your foreign-key strategy now — orphaned records bearing personal data are a common Article 17 failure mode.
- Log retention shorter than data retention. Otherwise logs about the user persist after the user has been deleted from the application.
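The cascading-delete and backup points above can be sketched together. The example uses SQLite only so that it is self-contained; the `users` and `addresses` tables, the `open_db` and `reapply_erasures` helpers, and the idea of an erasure ledger (a list of already-erased user IDs kept outside the backup) are assumptions for illustration.

```python
import sqlite3
from typing import Iterable

# Hypothetical schema: child rows carrying personal data are removed
# automatically when the owning user row is deleted.
SCHEMA = """
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL
);
CREATE TABLE addresses (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    street  TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite ignores foreign keys unless this is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    return conn


def reapply_erasures(conn: sqlite3.Connection,
                     erased_user_ids: Iterable[int]) -> None:
    """After a restore, delete every user recorded in the erasure ledger."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM users WHERE id = ?",
            [(uid,) for uid in erased_user_ids],
        )


conn = open_db()
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'a@example.org')")
conn.execute("INSERT INTO addresses VALUES (10, 1, '1 Example Street')")
reapply_erasures(conn, [1])
print(conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0])
```

The ledger should hold nothing but the identifiers needed to repeat the deletion, and it needs its own retention period: once the last backup that could contain the user has expired, the ledger entry can go too.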
7. Be able to detect a breach before you have to notify
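Article 33 gives you 72 hours from becoming aware of a breach to notify the supervisory authority; Article 34 adds a duty to tell the affected individuals without undue delay when the breach is likely to result in a high risk to them. Awareness is the trigger, so the question is: would you become aware?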
Minimum infrastructure-level signals to capture and alert on:
- Authentication failures (SSH, sudo, application admin), with thresholds.
- File integrity on critical paths (/etc, application binaries): aide or tripwire.
- Outbound connections to unexpected destinations: at minimum an egress firewall that logs blocks.
- Privileged database access outside expected hours.
You do not need a SIEM on day one. You need at least one alert channel that a human reads, fed by at least the above signals.
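The "with thresholds" part of the first signal can be as simple as a sliding-window counter per source. The sketch below is one possible shape, assuming you already parse failure events out of your auth logs; the `FailureThreshold` class and the limit of five failures in ten minutes are illustrative choices, not recommended values.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict


class FailureThreshold:
    """Flag a source once it reaches `limit` failures within `window`."""

    def __init__(self, limit: int, window: timedelta) -> None:
        self.limit = limit
        self.window = window
        self._events: Dict[str, Deque[datetime]] = defaultdict(deque)

    def record(self, source: str, when: datetime) -> bool:
        """Record one failure; return True if the threshold is reached."""
        events = self._events[source]
        events.append(when)
        # Drop failures that have aged out of the window.
        while events and when - events[0] > self.window:
            events.popleft()
        return len(events) >= self.limit


detector = FailureThreshold(limit=5, window=timedelta(minutes=10))
start = datetime(2026, 1, 1, 12, 0, 0)
for i in range(5):
    alert = detector.record("203.0.113.7", start + timedelta(seconds=30 * i))
print(alert)  # the fifth failure inside ten minutes trips the threshold
```

Whatever the thresholds end up being, the return value has to reach the alert channel a human actually reads; a counter that only writes to another log does not create awareness.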
8. Document everything in this guide as you do it
The decisions above only count as “by design” if they are written down. Three artefacts, kept current:
- Record of Processing Activities (RoPA) — Article 30. Lists every processing activity, every category of personal data, every processor, every retention period. The processor table in section 1 is the start of this.
- Processor list with DPA references — one row per external service from section 1, with a link to the executed DPA.
- Data Protection Impact Assessment (DPIA) — Article 35. Required only for high-risk processing; if you’re unsure, do one anyway — the exercise is cheap and the result is evidence either way.
These artefacts live outside this repository (typically in a privacy register or a controlled-access document store), but the triggers to update them are infrastructure changes. Treat any production change to the processor list, the data model, or the retention policies as triggering a documentation update.
Gotchas
“We don’t collect personal data”
You almost certainly do. The IP address logged by Nginx is personal data under settled EU case law. The session cookie tied to a user account is personal data. The User-Agent is, in combination, often personal data. The question is not whether you collect any — it is which, how little, and for how long.
A CDN does not absolve you of processor obligations
Putting Cloudflare or a similar CDN in front of your application does not make Cloudflare the controller. You remain the controller; Cloudflare is your processor, and you need a DPA with them. The CDN’s own log retention is a question to answer in your RoPA, not a question to ignore.
An EU-only customer base does not mean EU-only processors
Even if every user is in the EEA, a US-based monitoring SaaS that ingests your application logs is a cross-border transfer of personal data, subject to the Chapter V transfer rules (Article 44 onward) and (currently) the EU–US Data Privacy Framework or SCCs. Choose processors deliberately or restructure to avoid the transfer.
Backups that outlive the policy you publish
A privacy policy stating “we retain access logs for 30 days” is contradicted by daily backups of the log volume kept for 90 days. If backups contain personal data — they usually do — their retention is the effective retention period for that data. Either shorten the backups or lengthen the policy. The two numbers must agree.
What this guide deliberately does not cover
- Cookie consent and UX — application-layer; separate guide planned.
- DPIA methodology — out of scope for an infrastructure guide; a good starting point is the supervisory authority guidance for your jurisdiction.
- International transfer mechanisms (SCCs, TIAs) — legal-team territory; signpost only.
- Application-layer encryption schemes — separate guide planned.