Privacy by Design for a New Server Build

Applies to: any new Linux server intended for production where you will store, process, or transit personal data — including the metadata kind (IP addresses, user agents, login timestamps), which counts under GDPR whether or not your application has a “users” table.

Why this matters

GDPR Article 25 — Data protection by design and by default — is the legal text that turns “we should think about privacy” into “you must demonstrate that you thought about privacy, on paper, before you started processing.” It applies whether or not you have a website with a signup form. An Nginx access log with full client IPs is processing personal data.

The economic argument is simpler: privacy decisions made before the server exists cost nothing. The same decisions made after launch cost a retention-policy rewrite, a backup re-key, a logging refactor, a privacy policy update, and (if you got unlucky) a supervisory authority enquiry.

This guide lists the decisions you should make and write down before you run apt install on a new server. Everything here is design-level, not configuration-level — the implementation lives in the per-component guides this one links to.

1. Pick your processors before you sign anything

Every external service that touches the server is a processor under GDPR Article 28. You need a Data Processing Agreement (DPA) with each one, and you need to know which jurisdiction they operate in, before you commit to them. Doing this after deployment means renegotiating contracts or migrating providers.

Role	Default choice	What to check
Hosting	An EEA-based VPS provider	Where the physical server sits (region, not just billing entity); whether the provider’s own DPA is published; whether disks are encrypted on the hypervisor
TLS CA	Let’s Encrypt	Validates domains only — does not process visitor personal data, so no DPA required for them. Document this in your RoPA anyway.
CDN / WAF	Optional — adds a processor	If used, the CDN sees every request and every IP. DPA required; check default retention of their logs.
Backup target	Same region as hosting	Off-site backups crossing borders trigger the transfer obligations of GDPR Chapter V (Articles 44–50). Default to in-EEA targets.
Transactional email	EEA-hosted SMTP or EEA endpoint of an SES-style provider	Most email providers retain message bodies for days by default, so check their retention controls.
Monitoring / APM	Self-hosted or EEA-hosted SaaS	Application performance tools see request URLs and sometimes user IDs. Treat as processors.
Analytics	Privacy-first (cookieless, no IP storage)	Privacy-first tools sidestep most cookie-consent and DPA complexity.
DNS	Any reputable provider	DNS queries are metadata, but a DPA is still appropriate for the registrar/host.

Write the list down. This list is the start of your record of processing activities (RoPA).
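
One way to keep that list honest is to hold it as data and check it mechanically. The following is a minimal sketch, not a prescribed format: the `Processor` fields, the provider names, and the DPA reference are all hypothetical placeholders, and the region is reduced to a simple "EEA" or not-EEA label.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Processor:
    role: str                         # row label from the table above
    name: str                         # hypothetical provider name
    region: str                       # where data is physically processed
    dpa_reference: Optional[str]      # ID or link of the executed DPA, if any
    handles_personal_data: bool = True


def open_issues(processors: List[Processor]) -> List[str]:
    """List the gaps to close before committing to a provider."""
    issues = []
    for p in processors:
        if not p.handles_personal_data:
            continue  # still recorded in the register, but nothing to chase
        if not p.dpa_reference:
            issues.append(f"{p.role}: no executed DPA on file for {p.name}")
        if p.region != "EEA":
            issues.append(
                f"{p.role}: {p.name} processes data in {p.region}; "
                "check the transfer mechanism"
            )
    return issues


# Illustrative register; every entry here is made up.
REGISTER = [
    Processor("Hosting", "ExampleHost", "EEA", "DPA-001"),
    Processor("TLS CA", "Let's Encrypt", "US", None, handles_personal_data=False),
    Processor("Monitoring", "ExampleAPM", "US", None),
]

for issue in open_issues(REGISTER):
    print(issue)
```

Running a check like this in CI is one option for making a missing DPA visible before deployment rather than after.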

2. Decide what you will and will not log

The single most common privacy-by-design failure is “we’ll just turn on the default logs and figure out retention later.” Default web server logs capture the full client IP, User-Agent, referrer, and full request path. Combined, these form a fingerprint that can single out many users. Under GDPR, IP addresses are personal data (CJEU C-582/14, Breyer).

Decide for each log stream:

Do we need it at all? Most error logs: yes. Per-request access logs at the edge: usually no — your application logs already cover what you need.
What do we strip? Common choices: zero the last octet of IPv4, last 64 bits of IPv6, drop the User-Agent entirely, drop query strings.
What do we hash? If you need stable per-visitor aggregation within a day without identifying users, hash the IP with a salt that rotates daily and is discarded on rotation. You get “unique visitors per day” without retaining identifying data overnight.
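
The daily-rotating hash can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the class name `DailyVisitorHasher` is hypothetical, the key lives only in process memory (so a restart also rotates it), and the token is truncated to 16 hex characters purely for compactness.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class DailyVisitorHasher:
    """Turn IPs into per-day tokens using a random key replaced each day."""

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._key: bytes = b""

    def _key_for(self, today: date) -> bytes:
        if today != self._day:
            # The day changed: overwrite the old key so earlier tokens
            # can no longer be recomputed from a known IP address.
            self._key = secrets.token_bytes(32)
            self._day = today
        return self._key

    def token(self, ip: str, today: Optional[date] = None) -> str:
        today = today or date.today()
        digest = hmac.new(self._key_for(today), ip.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]


hasher = DailyVisitorHasher()
print(hasher.token("203.0.113.7"))
```

A keyed HMAC is used rather than a bare hash because the IPv4 space is small enough to enumerate; without a secret key anyone could rebuild the mapping. While the day's key still exists the tokens are pseudonymous rather than anonymous, so they remain personal data until the key is gone.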

An example of a minimal Nginx log_format you might settle on:

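# Privacy-preserving access log: masks the last IPv4 octet and the
# last 64 bits of IPv6, drops User-Agent and query strings entirely.
map $remote_addr $ip_masked {
    # IPv4: keep the first three octets.
    "~^(?<a>\d+\.\d+\.\d+)\.\d+$"        "$a.0";
    # IPv6: keep the first four groups when all four are written out.
    # Addresses compressed with "::" inside the first 64 bits do not
    # match and fall through to the default, which over-masks them.
    "~^(?<a>[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+:[0-9a-fA-F]+):" "$a::";
    default                              "0.0.0.0";
}

# $uri is the normalised path without the query string.
log_format minimal '$ip_masked - [$time_iso8601] '
                   '"$request_method $uri $server_protocol" '
                   '$status $body_bytes_sent $request_time';

access_log /var/log/nginx/access.log minimal;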

This is one example, not the only correct answer — the point is that the log format is a decision, written down, justified.
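
Where logs are post-processed outside the web server, the masking rule from the list above (last octet of IPv4, last 64 bits of IPv6) can be applied with a real address parser, which handles compressed IPv6 notation that regular expressions tend to get wrong. A minimal sketch using Python's standard `ipaddress` module; the function name `mask_ip` and the "0.0.0.0" fallback are choices made for this example.

```python
import ipaddress


def mask_ip(raw: str) -> str:
    """Zero the host part: /24 for IPv4, /64 for IPv6."""
    try:
        addr = ipaddress.ip_address(raw)
    except ValueError:
        # Unparseable input: log a placeholder rather than the raw value.
        return "0.0.0.0"
    prefix = 24 if addr.version == 4 else 64
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


print(mask_ip("203.0.113.77"))
print(mask_ip("2001:db8:1234:5678:9abc:def0:1234:5678"))
```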

3. Decide retention before you collect

Default retention in many Linux components is “rotate when the file gets big” — i.e. effectively forever for a quiet server. GDPR requires you to retain personal data for no longer than necessary (Article 5(1)(e)). Pick a number for each data class and configure to it.

A defensible starting set of defaults — adjust to your real needs:

Data class	Default retention	Why
Web access logs	30 days	Operational debugging window; longer rarely justified
Error logs	90 days	Slow-burn bugs sometimes take that long to surface
Auth / sudo / audit logs	12 months	Investigations may reach back further than ops debugging
Backups (daily rotation)	30 days	Recovery window for accidental deletion / corruption
Backups (monthly archives)	12 months	Only if you have a documented reason — otherwise drop them
Application sessions	14–30 days, hard expiry	“Stay signed in forever” is a retention liability
Email server logs	30 days	Mail delivery debugging window

Configure logrotate, your backup tool’s retention policy, and your session store’s TTL to these numbers on day one. A retention policy that exists only on paper is not a retention policy.
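
To check that the configured numbers are actually being enforced, a periodic audit can list files that have outlived their class's retention. The sketch below is a dry run that only reports; it assumes file modification time is a fair proxy for the age of the data, and the `RETENTION_DAYS` keys are hypothetical labels for the rows of the table above.

```python
import time
from pathlib import Path
from typing import List, Optional

# Retention per data class, in days, mirroring the table above.
RETENTION_DAYS = {
    "web_access_logs": 30,
    "error_logs": 90,
    "auth_audit_logs": 365,
    "email_logs": 30,
}


def expired_files(directory: str, data_class: str,
                  now: Optional[float] = None) -> List[Path]:
    """Return files in `directory` older than the retention for `data_class`."""
    now = time.time() if now is None else now
    cutoff = now - RETENTION_DAYS[data_class] * 86400
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

An empty result for every data class is the evidence that the policy exists somewhere other than on paper. Deletion itself is better left to logrotate or the backup tool's own pruning.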

4. Encrypt in transit, at rest, and in backups

These three are different problems with different solutions.

In transit is the easiest: TLS everywhere, internal and external, no exceptions. See /guides/nginx-tls-2026/ for the public-facing edge; PostgreSQL, Redis, and SSH all support TLS or equivalent and should use it.

At rest at the VPS layer means:

Hypervisor-level disk encryption (provided by most reputable VPS hosts — check their docs and document it in your RoPA).
LUKS inside the VM if you need defence against the host operator themselves; rare but real for high-sensitivity workloads.
Application-layer encryption for the truly sensitive columns (credentials, tokens, ID document data) — separate problem, separate guide.

In backups is the one people forget. A backup tool that does not encrypt by default writes your entire database, personal data included, to wherever the backup target lives. Use tools with built-in encryption (restic always encrypts its repositories; borgbackup asks you to choose an encryption mode when the repository is created, so choose one other than none) and store the encryption passphrase somewhere that survives the loss of the production server: a password manager you trust, not an environment variable on the server you’re backing up.

5. Least privilege from day zero

Privacy obligations include limiting who inside your organisation can access personal data (Article 5(1)(f), Article 32(1)(b)). The infrastructure decisions:

No shared accounts. Every human gets their own SSH key and their own audit trail. See /guides/ssh-hardening/.
Application database user has only the privileges the application needs. Never SUPERUSER. Never own the database — see /guides/postgresql-hardening/.
Operators access via sudo, not as root. sudo logging is on by default; keep it that way and route it to your audit log retention bucket.
Production secrets (passphrases, API keys, signing keys) live in a secrets manager, not in .env files in /var/www.
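
The last rule is easy to audit. As a rough sketch, the function below walks a directory tree and reports files named `.env` (or `.env.something`) together with whether they are world-readable. The function name and the filename convention are assumptions for illustration; real deployments may keep secrets under other names.

```python
import os
import stat
from typing import List, Tuple


def find_secret_files(root: str) -> List[Tuple[str, bool]]:
    """Return (path, world_readable) for every .env-style file under root."""
    findings = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename == ".env" or filename.startswith(".env."):
                path = os.path.join(dirpath, filename)
                mode = os.stat(path).st_mode
                findings.append((path, bool(mode & stat.S_IROTH)))
    return sorted(findings)


if __name__ == "__main__":
    for path, world_readable in find_secret_files("/var/www"):
        flag = "WORLD-READABLE" if world_readable else "present"
        print(f"{flag}: {path}")
```

Any hit is a prompt to move the value into the secrets manager; a world-readable hit is the more urgent one.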

6. Make the data deletion-ready

Article 17 — the right to erasure — is implemented mostly at the application layer, but the infrastructure decisions you make now determine how clean that implementation can be.

Backups. When a user requests deletion, the data persists in your backups until the retention window expires. This is generally accepted by supervisory authorities if your backup retention is short and documented, and if the deletion is re-applied whenever a backup is restored. Long backup retention turns Article 17 from a one-line response into a multi-week ticket.
Cascading deletes. Decide your foreign-key strategy now — orphaned records bearing personal data are a common Article 17 failure mode.
Log retention shorter than data retention. Otherwise logs about the user persist after the user has been deleted from the application.
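
The cascading-delete decision can be illustrated with a small, self-contained example. It uses SQLite from Python's standard library only so that it runs anywhere; the `users` and `addresses` tables are hypothetical, and the same `ON DELETE CASCADE` clause is standard SQL that PostgreSQL also supports.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when enabled on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (
            id    INTEGER PRIMARY KEY,
            email TEXT NOT NULL
        );
        CREATE TABLE addresses (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            street  TEXT NOT NULL
        );
    """)
    conn.executemany("INSERT INTO users (id, email) VALUES (?, ?)",
                     [(1, "a@example.com"), (2, "b@example.com")])
    conn.executemany("INSERT INTO addresses (user_id, street) VALUES (?, ?)",
                     [(1, "1 Example Street"), (2, "2 Example Street")])
    conn.commit()
    return conn


def erase_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Delete the user row; dependent rows follow via the cascade."""
    conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    conn.commit()


def count_addresses(conn: sqlite3.Connection, user_id: int) -> int:
    row = conn.execute("SELECT COUNT(*) FROM addresses WHERE user_id = ?",
                       (user_id,)).fetchone()
    return row[0]
```

Without the cascade (or an explicit delete per table), the address row would survive as an orphan still carrying personal data.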

7. Be able to detect a breach before you have to notify

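Article 33 gives you 72 hours from awareness to notify the supervisory authority; Article 34 adds a duty to tell the affected data subjects without undue delay when the breach is likely to result in a high risk to them. Awareness is the trigger, so the question is: would you become aware?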

Minimum infrastructure-level signals to capture and alert on:

Authentication failures (SSH, sudo, application admin), with thresholds.
File integrity on critical paths (/etc, application binaries) — aide or tripwire.
Outbound connections to unexpected destinations — at minimum an egress firewall that logs blocks.
Privileged database access outside expected hours.

You do not need a SIEM on day one. You need at least one alert channel that a human reads, fed by at least the above signals.
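
As an illustration of the first signal, the sketch below counts failed SSH password attempts per source address and returns those at or above a threshold. It assumes the common OpenSSH message shape ("Failed password for ... from ADDRESS port N"); log formats vary between versions and distributions, so treat the pattern as a starting point. The function name and default threshold are arbitrary.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Matches e.g. "Failed password for invalid user admin from 203.0.113.5 port 40022 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def sources_over_threshold(log_lines: Iterable[str],
                           threshold: int = 5) -> Dict[str, int]:
    """Return {source_address: failures} for sources at or above threshold."""
    counts: Counter = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return {src: n for src, n in counts.items() if n >= threshold}
```

The output is what would be sent to the alert channel. Note that the alert itself contains IP addresses, so it belongs under the same retention rules as the auth logs it was derived from.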

8. Document everything in this guide as you do it

The decisions above only count as “by design” if they are written down. Three artefacts, kept current:

Record of Processing Activities (RoPA) — Article 30. Lists every processing activity, every category of personal data, every processor, every retention period. The processor table in section 1 is the start of this.
Processor list with DPA references — one row per external service from section 1, with a link to the executed DPA.
Data Protection Impact Assessment (DPIA) — Article 35. Required only for high-risk processing; if you’re unsure, do one anyway — the exercise is cheap and the result is evidence either way.

These artefacts live outside this repository (typically in a privacy register or a controlled-access document store), but the triggers to update them are infrastructure changes. Treat any production change to the processor list, the data model, or the retention policies as triggering a documentation update.

Gotchas

“We don’t collect personal data”

You almost certainly do. The IP address logged by Nginx is personal data under settled EU case law. The session cookie tied to a user account is personal data. The User-Agent is, in combination, often personal data. The question is not whether you collect any — it is which, how little, and for how long.

A CDN does not absolve you of processor obligations

Putting Cloudflare or a similar CDN in front of your application does not make Cloudflare the controller. You remain the controller; Cloudflare is your processor, and you need a DPA with them. The CDN’s own log retention is a question to answer in your RoPA, not a question to ignore.

EU-only customers do not mean EU-only processors

Even if every user is in the EEA, a US-based monitoring SaaS that ingests your application logs is a cross-border transfer of personal data, subject to Article 44 and (currently) the EU–US Data Privacy Framework or SCCs. Choose processors deliberately or restructure to avoid the transfer.

Backups that outlive the policy you publish

A privacy policy stating “we retain access logs for 30 days” is contradicted by daily backups of the log volume kept for 90 days. If backups contain personal data (they usually do), a record captured by a backup on its last live day survives until that backup expires, so the effective retention is the live retention plus the backup retention. Either exclude the data from backups, shorten the backups, or lengthen the published period. The published number must cover the real one.
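
The arithmetic behind this gotcha, as a minimal sketch. It assumes the worst case: a record is captured by a backup taken on the last day it exists on the live system, and that backup is then kept for its full retention. Function names are hypothetical.

```python
def worst_case_retention_days(live_days: int, backup_days: int) -> int:
    """Longest time a record can exist anywhere, live copy or backup."""
    return live_days + backup_days


def policy_holds(published_days: int, live_days: int, backup_days: int) -> bool:
    """True if the published retention covers the real worst case."""
    return published_days >= worst_case_retention_days(live_days, backup_days)


# Policy says 30 days; logs live 30 days and are backed up for 90.
print(worst_case_retention_days(30, 90))  # 120
print(policy_holds(30, 30, 90))           # False
```

If the log volume is excluded from backups, `backup_days` is 0 for that data class and the published 30 days holds.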

What this guide deliberately does not cover

Cookie consent and UX — application-layer; separate guide planned.
DPIA methodology — out of scope for an infrastructure guide; a good starting point is the supervisory authority guidance for your jurisdiction.
International transfer mechanisms (SCCs, TIAs) — legal-team territory; signpost only.
Application-layer encryption schemes — separate guide planned.
