
How to be compliant with GDPR while keeping your data

On May 25, 2018, the General Data Protection Regulation (GDPR) will become enforceable in the EU. Its goal is to give citizens and residents back control over their personal data. They can ask you to

  • give them access to their data (right of access)
  • delete all their data (right to erasure)
  • export their data (right to data portability)

If you don’t honor these rights, the penalties can be pretty high:

… a fine up to €20 million or up to 4% of the annual worldwide turnover of the preceding financial year in case of an enterprise …


The problems

You may be thinking that you will just delete all of a user’s records from your good old MySQL instance whenever they demand it. But things have changed, especially software architectures and their requirements for data.

Event stores cannot forget

If you use an event store, you now have to think about how to forget a user. Since all events are immutable, you cannot just delete some of them: your current state is derived by replaying those events, so removing some of them would make that state inconsistent.

Think about a bank where each transaction is an entry in your event store:

  • A received 100 USD from B
  • C received 50 USD from A

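If you delete all of B’s data, A has just become insolvent: assuming A started with nothing, replaying the remaining events shows A paying 50 USD to C without ever having received the 100 USD from B. So this is a no-go.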
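To make the problem concrete, here is a minimal Python sketch of an event-sourced ledger. It assumes a hypothetical `Transfer` event, balances that start at zero, and a cash deposit modeled as a transfer without a sender. It replays the log to derive balances, then “forgets” B the naive way by dropping every event that mentions B.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    """Immutable event: `receiver` got `amount` USD from `sender` (None = cash deposit)."""
    sender: Optional[str]
    receiver: str
    amount: int


def replay(events):
    """Rebuild the current balances by replaying the whole event log from zero."""
    balances = defaultdict(int)
    for event in events:
        if event.sender is not None:
            balances[event.sender] -= event.amount
        balances[event.receiver] += event.amount
    return dict(balances)


def naive_forget(events, user):
    """Naive erasure: drop every event in which `user` appears."""
    return [e for e in events if user not in (e.sender, e.receiver)]


log = [
    Transfer(sender=None, receiver="B", amount=100),  # B deposits 100 USD
    Transfer(sender="B", receiver="A", amount=100),   # A received 100 USD from B
    Transfer(sender="A", receiver="C", amount=50),    # C received 50 USD from A
]

print(replay(log))                     # {'B': 0, 'A': 50, 'C': 50}
print(replay(naive_forget(log, "B")))  # {'A': -50, 'C': 50}
```

After the naive deletion, replaying the log leaves A at -50 USD, which is exactly the inconsistency described above.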

I need my data for …

Let’s say you’re using machine learning and you train your models on your users’ data. You can’t risk losing a lot of data, because that data is what your models are built on.

Your executives need reports on aggregated numbers, such as the number of users over the last year or the number of purchases in the last three months.

Even after you delete all the data of one user, your numbers must not diverge from the last report. You can’t suddenly have lost purchases, and regenerating an older report must not show fewer users than it did before.

The solution

I first heard about this idea at Berlin Buzzwords 2017, in Lars Albertsson’s talk “Protecting Privacy in Practice”.


Slide 20 of his talk

Instead of deleting entries or trying to encrypt every message in your system, he suggests encrypting the user’s ID with a per-user key.
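A minimal sketch of the idea, assuming an in-memory dict as the key store and hypothetical names (`key_store`, `pseudonym_for`). One swap to state plainly: instead of encrypting the ID with a cipher, it derives a keyed hash (HMAC-SHA256) from Python’s standard library. That hash is deterministic, so all events of one user share the same pseudonym and can still be grouped, and it can only be reproduced while the user’s key exists.

```python
import hashlib
import hmac
import secrets

# Hypothetical key store: user ID -> secret per-user key.
# In a real system this would live apart from the event data, access-controlled.
key_store = {}


def pseudonym_for(user_id):
    """Return a stable pseudonym for `user_id`, creating the user's key on first use."""
    if user_id not in key_store:
        key_store[user_id] = secrets.token_bytes(32)
    key = key_store[user_id]
    # Keyed hash (HMAC-SHA256) standing in for encryption: the same key and ID
    # always give the same pseudonym, so one user's events can still be grouped.
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Events carry only the pseudonym, never the raw user ID.
events = [
    {"user": pseudonym_for("alice@example.com"), "purchase": 30},
    {"user": pseudonym_for("bob@example.com"), "purchase": 12},
    {"user": pseudonym_for("alice@example.com"), "purchase": 5},
]

print(events[0]["user"] == events[2]["user"])  # True: same user, same pseudonym
print(events[0]["user"] == events[1]["user"])  # False: different users
```

With a real cipher you would need deterministic encryption to get the same grouping property; a randomized scheme would give every event of the same user a different ciphertext.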

Encrypting only the link between your data and the user has several benefits:

  1. You can forget a user by throwing away their encryption key.
  2. When a user deletes their data, you can hand them the encryption key on the last screen.
  3. Without this key, you can no longer link your data to an actual user: you have anonymized it.
  4. You can still use this data to generate reports, train your models, etc.
  5. The user can even come back by providing the key you gave them, and you can link the data back to them.
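The five points can be sketched as one small class. Assumptions: `PseudonymVault`, `forget` and `relink` are hypothetical names, the key store is an in-memory dict, and a keyed hash (HMAC-SHA256 from the standard library) stands in for reversible encryption. A returning user is relinked by recomputing the hash with the key they bring back, not by decrypting anything.

```python
import hashlib
import hmac
import secrets


class PseudonymVault:
    """Hypothetical per-user key store; only pseudonyms reach the event log."""

    def __init__(self):
        self._keys = {}  # user ID -> secret 32-byte key

    @staticmethod
    def _derive(key, user_id):
        # Assumption: a keyed hash instead of the encryption described in the talk.
        return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

    def pseudonym(self, user_id):
        """Pseudonym for a known user; creates a fresh key for a new one."""
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        return self._derive(self._keys[user_id], user_id)

    def forget(self, user_id):
        """Points 1 and 2: delete our copy of the key and return it for the user."""
        return self._keys.pop(user_id).hex()

    def relink(self, user_id, key_hex):
        """Point 5: the returning user hands the key back and the link is restored."""
        key = bytes.fromhex(key_hex)
        self._keys[user_id] = key
        return self._derive(key, user_id)


vault = PseudonymVault()
log = [
    {"user": vault.pseudonym("alice"), "purchase": 30},
    {"user": vault.pseudonym("bob"), "purchase": 12},
    {"user": vault.pseudonym("alice"), "purchase": 5},
]
alice_pseudonym = log[0]["user"]

key_for_alice = vault.forget("alice")  # shown to Alice on the last screen

# Point 3: the vault no longer holds a key that maps "alice" to her events.
# Point 4: nothing was deleted from the log, so the reports still add up.
print(sum(e["purchase"] for e in log))  # 47
print(len({e["user"] for e in log}))    # 2

# Point 5: Alice comes back with her key and is linked to her old events again.
print(vault.relink("alice", key_for_alice) == alice_pseudonym)  # True
```

The event log stays append-only the whole time; only the small key store ever has entries removed.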

There is more to the talk than this, of course. It’s about throwing away as much sensitive user data as possible in each pipeline: to compute the average user age for your site, you don’t need to carry around everyone’s bank account number.
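As a hedged illustration of minimizing data per pipeline, the hypothetical `to_age_pipeline` step below projects each record down to the one field the aggregation needs before the average is computed. The record layout (`name`, `iban`, `age`) and the placeholder values are made up for the example.

```python
# Hypothetical raw user records carrying far more data than the report needs.
users = [
    {"name": "Alice", "iban": "DE00 0000 0000 0000 0000 01", "age": 34},
    {"name": "Bob", "iban": "DE00 0000 0000 0000 0000 02", "age": 27},
    {"name": "Carol", "iban": "DE00 0000 0000 0000 0000 03", "age": 41},
]


def to_age_pipeline(records):
    """Keep only the field this pipeline needs; names and IBANs never enter it."""
    return [{"age": r["age"]} for r in records]


ages_only = to_age_pipeline(users)
average_age = sum(r["age"] for r in ages_only) / len(ages_only)
print(average_age)  # 34.0
```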

Also keep in mind that anonymisation alone is sometimes not enough. Even if you reduce a user’s data to just her age, when you see the age 117 you can be pretty sure the user is Nabi Tajima.
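One rough way to spot this, sketched under the assumption that age is the only field left: count how many records share each value and flag any value held by fewer than `k` records, because such a value narrows the candidates down to a very small group. The threshold `k` and the function name `rare_values` are illustrative choices (this is the idea behind k-anonymity), not something taken from the talk.

```python
from collections import Counter


def rare_values(values, k=2):
    """Return the values shared by fewer than `k` records, sorted ascending."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n < k)


ages = [34, 27, 41, 34, 27, 41, 117]
print(rare_values(ages))  # [117]
```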


Published 13 Apr 2018

Software Developer based in Germany, mainly working with Scala, JavaScript and Python.
Martin Seeler on Twitter