Who this is for: developers in their first year or two of writing production code, and the engineers and team leads mentoring them. If you operate the infrastructure rather than write the application, the companion piece is privacy-by-design-server-build.
Introduction
Most developers learn privacy backwards. They ship a feature; later someone in legal or security flags an issue; later still they retrofit the fix. By that point the data model is baked in, the API contract is in production, and the retrofit costs more than doing the right thing would have cost on day one.
There is a better way to learn it. Privacy is not a compliance task you tick off after the feature exists — it is a way of thinking that shapes what gets built in the first place. The earlier you build the habit, the more naturally it surfaces in your work, and the less it ever costs you.
This article is about that habit. Read it once in your first month; read it again a year later when the patterns will mean more.
1. What “privacy by design” actually means
The phrase comes from Ann Cavoukian’s seven foundational principles, written in the 1990s and later reflected in GDPR Article 25 (data protection by design and by default). The legal text around them is dry. The ideas underneath are not.
Translated to what they mean for the person writing code:
- Be proactive, not reactive. Anticipate privacy problems before they become incidents.
- Privacy is the default. A user who does nothing should still have the maximum reasonable privacy. They should have to opt in to anything that erodes it.
- Build it in. Privacy is not a wrapper around a system — it is woven through the design.
- Aim for both. Privacy and functionality are not a zero-sum trade. If the design feels like one, the design is wrong.
- Protect the whole lifecycle. Data should be safe from collection through deletion, not just while it sits in your database.
- Be transparent. People should be able to verify what you are doing with their data.
- Respect the user. Their interests come first — particularly where they conflict with yours.
These translate into specific habits at the keyboard. We will cover those next.
2. Where interns first go wrong
Four patterns account for most early-career mistakes. None of them are stupid. All of them are easy to make and easy to fix once you have seen them.
Debug logging captures personal data
You are debugging an issue, so you add a log line:
logger.info(f"User authenticated: {user.email} from {request.remote_addr}")
This is fine on your laptop. It is a data-protection issue the moment it ships to production. Email and IP address are both personal data under GDPR; together they fingerprint the individual cleanly. Your application log retention is probably 30 days or longer — you have just made a privacy decision without thinking about it.
The fix is to log identifiers, not identities:
logger.info(f"User authenticated: id={user.id}")
The information you actually need for debugging is “which user, which session”. An internal ID gives you that without retaining the email or IP address.
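As a safety net for the cases where an identity slips into a log message anyway, a logging filter can mask anything that looks like an email address before it is written. The sketch below is a minimal illustration using Python's standard logging module. The names `redact_emails` and `RedactEmailFilter` are assumptions made for this example, and the pattern is deliberately simple rather than a complete email matcher; it does not cover IP addresses.

```python
import logging
import re

# Deliberately simple pattern: enough to catch typical addresses in log text.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL_PATTERN.sub("[redacted-email]", text)


class RedactEmailFilter(logging.Filter):
    """Hypothetical filter that masks email addresses in log messages."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message with its arguments first, then scrub the result.
        record.msg = redact_emails(record.getMessage())
        record.args = None
        return True  # keep the record, now carrying the scrubbed message
```

Attaching it is one line, for example `logger.addFilter(RedactEmailFilter())`. A filter like this catches mistakes; it does not replace the habit of logging the internal ID in the first place.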
API responses returning more than needed
You write a /users/me endpoint. The fastest path is return user.to_dict(), which serialises every column. Now the front-end has access to fields you never intended to expose: hashed passwords, internal flags, audit timestamps, sometimes other people’s data if a join was sloppy.
The discipline is to define the response shape explicitly:
return {
    "id": user.id,
    "display_name": user.display_name,
    "email": user.email,  # because this endpoint specifically needs it
}
If a new field is needed, it becomes a deliberate decision — not a side-effect of an ORM convenience.
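One way to keep that decision deliberate is to build the response from a named allowlist instead of from the whole object. The following is a minimal sketch, not a prescribed pattern: the `User` dataclass stands in for an ORM model, and `ME_RESPONSE_FIELDS` and `serialize_me` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical stand-in for an ORM model; a real model will have more columns.
@dataclass
class User:
    id: int
    display_name: str
    email: str
    password_hash: str
    is_flagged: bool


# The only fields the /users/me response is allowed to contain.
ME_RESPONSE_FIELDS = ("id", "display_name", "email")


def serialize_me(user: User) -> dict:
    """Build the response from the allowlist, never from the whole object."""
    return {field: getattr(user, field) for field in ME_RESPONSE_FIELDS}
```

Adding a column to the model leaves the response unchanged; exposing it requires editing the allowlist, which shows up in review as its own line in the diff.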
Schemas that don’t think about minimisation
The signup form asks for name, email, date of birth, address, phone number, and “anything else?” because the product team thought it might be useful. None of those fields have an explicit purpose. Some of them are never read after registration.
For each field, ask: why is this here? When will it be used? If you cannot answer both, drop it. You can always add a field later. You cannot, in any practical sense, retroactively un-collect data you never needed.
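The two questions can be written down next to the schema so that a missing answer is visible. This sketch assumes a simple in-code register; the `FieldPurpose` type, the `SIGNUP_FIELDS` entries, and the `unjustified_fields` helper are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldPurpose:
    why: str        # the reason the field is collected
    used_when: str  # the point in the product where it is read


# Hypothetical register for a signup form; an empty string means nobody could answer.
SIGNUP_FIELDS = {
    "email": FieldPurpose("account login and password reset", "every sign-in"),
    "display_name": FieldPurpose("shown to the user in the UI", "page rendering"),
    "date_of_birth": FieldPurpose("", ""),
    "phone_number": FieldPurpose("might be useful", ""),
}


def unjustified_fields(fields: dict[str, FieldPurpose]) -> list[str]:
    """Return the names of fields that do not answer both questions."""
    return sorted(
        name
        for name, purpose in fields.items()
        if not (purpose.why.strip() and purpose.used_when.strip())
    )
```

Here `unjustified_fields(SIGNUP_FIELDS)` returns the two fields that would be dropped from the form. The check cannot judge whether an answer is good, only whether one exists, so "might be useful" still needs a human to reject it.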
Test data using real PII
You are reproducing a bug, so you copy a row from production into your dev database. Or you import a CSV of real customer data into staging because the synthetic data does not exercise the edge case. Now real personal data lives in environments with weaker access controls and often no encryption at rest, and on developer laptops that are less hardened than the production servers.
The right reflex is hand-written synthetic fixtures first, generated data when those are insufficient (Faker and equivalents are excellent), and a documented, time-limited exception with named ownership when real data really is the only option.
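Generated test users do not need much machinery. The sketch below uses only the standard library as a stand-in for Faker, so it runs anywhere; the name lists and the `make_synthetic_users` helper are assumptions made for this example. Addresses use example.com, a domain reserved for documentation, so a stray test email cannot reach a real person.

```python
import random

# Small hypothetical name pools; a library such as Faker offers far more variety.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
LAST_NAMES = ["Lovelace", "Hopper", "Turing", "Dijkstra", "Liskov"]


def make_synthetic_user(rng: random.Random) -> dict:
    """Build one fake user record from the given random source."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # A random hex suffix makes address collisions unlikely in small sets.
    token = f"{rng.getrandbits(32):08x}"
    return {
        "display_name": f"{first} {last}",
        "email": f"{first.lower()}.{last.lower()}.{token}@example.com",
    }


def make_synthetic_users(count: int, seed: int = 0) -> list[dict]:
    """Generate a reproducible list of fake users for dev and staging."""
    rng = random.Random(seed)  # a fixed seed keeps bug reproductions stable
    return [make_synthetic_user(rng) for _ in range(count)]
```

Seeding the generator means the same call produces the same users every time, which matters when the point of the data is to reproduce a bug.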
3. Real decisions, real consequences
Take a simple example: a user registration endpoint for a small SaaS.
The “we’ll fix it later” version
@app.post("/register")
def register():
    data = request.json
    user = User(
        email=data["email"],
        password=hash_password(data["password"]),
        full_name=data["full_name"],
        date_of_birth=data["date_of_birth"],
        phone_number=data["phone_number"],
        address=data["address"],
    )
    db.session.add(user)
    db.session.commit()

    logger.info(f"New user: {user.email} ({user.full_name}), {user.address}")

    return {"user": user.to_dict()}
Walking the problems:
- Collects six personal-data fields without justification.
- Logs three of them in plaintext, where they persist for the log retention window.
- Returns the full user object, including the password hash if to_dict() is not filtered.
The privacy-first version
@app.post("/register")
def register():
    data = request.json
    user = User(
        email=data["email"],
        password=hash_password(data["password"]),
        display_name=data.get("display_name"),  # optional
    )
    db.session.add(user)
    db.session.commit()

    logger.info(f"New user registered: id={user.id}")

    return {
        "id": user.id,
        "display_name": user.display_name,
    }
Same business outcome — a registered user. Different decisions about data:
- Only fields the product actually needs at registration.
- Logs an internal identifier, not the user’s identity.
- Response is the minimum the caller needs.
If six months later the product needs date of birth for a regulatory check on a specific flow, you collect it then, for that purpose, with a clear lawful basis. That is how privacy by design works in practice: every field has a reason, and the absence of a reason is itself a decision.
4. The four questions every developer should ask
Before any feature touches personal data, four questions:
- Do I actually need this data? If you can ship the feature without it, you should. “It might be useful later” is not a justification.
- Where will it be stored, and for how long? Most personal data has an implicit infinite retention by default. Make it explicit.
- Who can access it, and why? If everyone in engineering can read it, you have an audit problem waiting to happen.
- What happens if this is breached? The breach scenario is the design constraint. If the answer is “we have to notify every affected user under GDPR Article 34”, you might design the system differently.
Make those questions part of how you read your own code. After a while you stop having to ask them consciously — the answers become part of how you write the code in the first place.
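The second question, retention, is the easiest of the four to turn into code. The following is a minimal sketch under stated assumptions: the categories and periods in `RETENTION` are invented for illustration (real values are a policy decision), and `is_expired` is a hypothetical helper that a scheduled clean-up job might call.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods per data category; not recommendations.
RETENTION = {
    "application_logs": timedelta(days=30),
    "password_reset_tokens": timedelta(hours=1),
    "closed_account_profiles": timedelta(days=90),
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True when a record has outlived the retention period for its category."""
    # An unknown category raises KeyError on purpose:
    # no data should be stored without a stated retention period.
    return now - created_at > RETENTION[category]
```

For example, with `now = datetime.now(timezone.utc)`, a log record created 31 days earlier is reported as expired and one created 29 days earlier is not. Passing `now` in as an argument keeps the function easy to test.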
5. Building the habit
Three practices turn the ideas above from intention into reflex.
Privacy review as part of every PR
Add it to the pull-request template. Two lines:
Does this change introduce, persist, expose, or transmit personal data?
If yes, has the minimum-necessary principle been applied?
You will answer “no” most of the time. That is the point. When you do answer “yes”, you will catch decisions early that would be expensive to reverse later.
Talk about data decisions with your team
The first time you push back on a product spec because it asks for fields the feature does not need, the conversation is uncomfortable. The fifth time, it is normal. By the tenth, the spec arrives with the privacy thinking already done.
This is the habit-forming part: cultural change is downstream of consistent small behaviour. You do not need to be senior to start it.
Notice that it makes you a better engineer
Code that minimises data is, almost always, clearer code. Smaller schemas. Tighter APIs. Less debugging surface. Privacy-aware engineering correlates with engineering maturity — not because compliance is virtuous, but because the same discipline that asks “do I need this field?” also asks “do I need this abstraction?”.
Conclusion
Confidence in this work comes from understanding why each decision matters. Privacy by design is professional craft — the same kind of skill as writing readable code or designing reliable systems. It is not bureaucratic overhead.
A useful shorthand for what “good” looks like, end-to-end across this site, is practical, explainable, compliant — in that order. The same three checks apply to a single function:
- Practical: does it serve the actual user need, with no more data than required?
- Explainable: if a reviewer asks “why does this field exist?”, do you have an answer?
- Compliant: would the answer also satisfy a privacy officer or an auditor?
Bake those three checks into how you read code. The rest follows.
Where to go next
- privacy-by-design-server-build: the infrastructure-side companion to this article, for when you move from “what fields to collect” to “what server to put them on”.
- postgresql-hardening and redis-hardening: the database tier that has to honour the minimisation decisions you make at the application layer.