On May 25, 2018, the General Data Protection Regulation (GDPR) will take effect in the EU. Its goal is to give citizens and residents back control over their personal data. They can ask you to
If you don’t ensure those rights, the sanctions can be pretty high.
… a fine up to €20 million or up to 4% of the annual worldwide turnover of the preceding financial year in case of an enterprise …
You may be thinking that you will just delete all of their records from your good old MySQL instance whenever a user demands it. But things have changed, especially software architectures and their requirements for data.
By using an Eventstore, you now have to think about how to forget a user. Since all messages are immutable, you cannot just delete some of them. This would make your current state inconsistent.
Think about a bank, where each transaction is an entry inside your Eventstore.
By deleting all data of
A just became insolvent. So this is a no go.
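To make the bank example concrete, here is a minimal sketch. The event layout, the accounts A, B and C, and the helper names are all made up for illustration; a real Eventstore will look different. It only shows how replaying the log after a naive delete changes somebody else's balance.

```python
from collections import defaultdict

# Hypothetical event log: each transfer is one immutable event.
EVENTS = [
    {"from": None, "to": "A", "amount": 10},   # A deposits 10
    {"from": "B", "to": "A", "amount": 100},   # B pays A 100
    {"from": "A", "to": "C", "amount": 80},    # A pays C 80
]


def balances(events):
    """Replay all events to derive the current balance per account."""
    result = defaultdict(int)
    for event in events:
        if event["from"] is not None:
            result[event["from"]] -= event["amount"]
        result[event["to"]] += event["amount"]
    return dict(result)


def forget_naively(events, user):
    """Drop every event that mentions the user (the approach that breaks)."""
    return [e for e in events if user not in (e["from"], e["to"])]


before = balances(EVENTS)
after = balances(forget_naively(EVENTS, "B"))

print("A before forgetting B:", before["A"])
print("A after forgetting B: ", after["A"])
```

Once B's events are gone, the payment A received never happened as far as the replay is concerned, so A ends up with a negative balance.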
Let's say you're using machine learning and you train your models on your users' data. You can't risk losing a lot of data, because that data is pretty important for your models.
Your executives need reports on aggregated numbers. These can be the number of users over the last year or the number of purchases in the last 3 months.
When you delete all the data of one user, your numbers must not diverge from the last report. You didn't actually lose any purchases, and regenerating an older report should not suddenly show fewer users than before.
The first time I heard about this idea was at Berlin Buzzwords in 2017. There was this talk by Lars Albertsson about “Protecting Privacy in Practice”.
Slide 20 from his slides
Instead of just deleting entries or trying to encrypt every message in your system, he suggests encrypting the user's ID with a key.
By encrypting the personal link to your user, you have a lot of benefits.
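Here is a rough sketch of how I picture this, not the implementation from the talk. Python's standard library has no symmetric cipher, so I swap the encryption step for a keyed hash (HMAC-SHA256) with one random key per user; with a real cipher the idea stays the same. `KeyStore`, `events_of` and `report` are hypothetical names. Events only ever store the pseudonym, and forgetting a user means throwing away their key.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical per-user key store; the only link from a user to a pseudonym."""

    def __init__(self):
        self._keys = {}

    def has_key(self, user_id):
        return user_id in self._keys

    def pseudonym(self, user_id):
        # Create a random key on first use, then derive a stable pseudonym.
        if user_id not in self._keys:
            self._keys[user_id] = secrets.token_bytes(32)
        key = self._keys[user_id]
        return hmac.new(key, user_id.encode(), hashlib.sha256).hexdigest()

    def forget(self, user_id):
        # Dropping the key is the "deletion"; stored events stay untouched.
        self._keys.pop(user_id, None)


def events_of(user_id, events, keys):
    """Find a user's events; only possible while their key still exists."""
    if not keys.has_key(user_id):
        return []
    pseudonym = keys.pseudonym(user_id)
    return [e for e in events if e["user"] == pseudonym]


def report(events):
    """Aggregated numbers that must stay stable after a user is forgotten."""
    return {"users": len({e["user"] for e in events}), "purchases": len(events)}


keys = KeyStore()
events = [
    {"user": keys.pseudonym("alice"), "type": "purchase", "amount": 30},
    {"user": keys.pseudonym("bob"), "type": "purchase", "amount": 12},
    {"user": keys.pseudonym("alice"), "type": "purchase", "amount": 5},
]

report_before = report(events)
alice_before = events_of("alice", events, keys)

keys.forget("alice")

report_after = report(events)
alice_after = events_of("alice", events, keys)
```

After `forget`, the events are still there and the report is unchanged, but nothing in the system can tie the pseudonym back to the user anymore. The key store itself is now the sensitive part and has to be treated accordingly.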
There is more in this talk than just this, of course. It's about throwing away as much sensitive data about a user as possible per pipeline. To aggregate the average user age for your site, you don't have to carry around everyone's bank account number.
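As a small illustration of the average age example, a pipeline could start by projecting each record down to the fields it needs. The records, field names and the `project` helper below are invented for this sketch.

```python
from statistics import mean

# Hypothetical user records; the account values are placeholders.
USERS = [
    {"id": "u1", "age": 34, "bank_account": "ACCOUNT-1"},
    {"id": "u2", "age": 29, "bank_account": "ACCOUNT-2"},
    {"id": "u3", "age": 45, "bank_account": "ACCOUNT-3"},
]


def project(records, fields):
    """Keep only the fields a pipeline actually needs."""
    return [{field: record[field] for field in fields} for record in records]


# The age pipeline never sees IDs or bank account numbers.
age_input = project(USERS, ["age"])
average_age = mean(record["age"] for record in age_input)

print(average_age)
```

Whatever never enters a pipeline does not have to be found and forgotten in it later.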
Also keep in mind that anonymisation is sometimes not enough. Even when you reduce the user data to only her age, you can be pretty sure that the original user is Nabi Tajima when you see the age 117.
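One simple way to spot such cases, sketched here under my own assumptions, is to count how many users share each value and flag the ones that are too rare. The threshold `k` and the sample ages are made up, and this is only a rough check, not a complete privacy guarantee.

```python
from collections import Counter


def rare_values(values, k=2):
    """Return values shared by fewer than k users; these may identify someone."""
    counts = Counter(values)
    return sorted(value for value, n in counts.items() if n < k)


# Hypothetical ages left over after stripping everything else.
ages = [34, 34, 29, 29, 29, 117]

print(rare_values(ages))
```

Values that come back from a check like this are candidates for coarser buckets (for example "90 and older") before the data leaves the pipeline.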
For further reading, here are some additional resources: