BBC R&D Data Privacy Project using RDF & Solid

The BBC are experimenting with decentralised user data storage based on Solid pods.

We have been looking at alternatives to storing user data in large-scale central databases (‘data lakes’): approaches that respect people’s privacy but can still support rich data-driven services and deliver real value to audiences. The model we are exploring could offer many benefits, from reducing data storage costs to making severe data compromises less likely. It could also let audience members make full use of the provisions for data portability in current data protection law as they move between online services.
We began experimenting with this technology back in 2018 with the BBC Box project, and we have been working since then to address scalability issues and to find ways to make the user’s ‘data journey’ understandable. We are currently using an open-source tool called Solid, originally developed by Sir Tim Berners-Lee, the inventor of the World Wide Web. Solid stores user data in ‘pods’, and we have used these to build a user-centred data storage system and a prototype media discovery tool to test our ideas in practice.

Only the user and the applications they permit can access the data in the pod. Any user activity generated through the use of services (posts, photos, play history, etc.) is saved back to the user’s encrypted pod, rather than to the unified databases of the service providers.
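
As a rough illustration of that access rule, the sketch below models a pod as an in-memory object that only its owner and owner-approved applications can read or write. It is a toy model, not the Solid API: the `Pod` class, its method names and the `"media-app"` identifier are all hypothetical, and encryption is not modelled.

```python
class Pod:
    """Toy in-memory stand-in for a personal data pod (hypothetical, not the Solid API)."""

    def __init__(self, owner):
        self.owner = owner
        self._permitted_apps = set()
        self._records = []

    def grant(self, requester, app_id):
        # Only the owner may decide which applications can use the pod.
        if requester != self.owner:
            raise PermissionError("only the owner can grant access")
        self._permitted_apps.add(app_id)

    def revoke(self, requester, app_id):
        # Only the owner may withdraw an application's access.
        if requester != self.owner:
            raise PermissionError("only the owner can revoke access")
        self._permitted_apps.discard(app_id)

    def _check(self, agent):
        # Access is allowed for the owner and for explicitly permitted apps.
        if agent != self.owner and agent not in self._permitted_apps:
            raise PermissionError(f"{agent} is not permitted to access this pod")

    def append(self, agent, record):
        # Activity is written back to the pod, not to the service's own database.
        self._check(agent)
        self._records.append(record)

    def read(self, agent):
        self._check(agent)
        return list(self._records)


pod = Pod(owner="alice")
pod.grant("alice", "media-app")
pod.append("media-app", {"type": "play", "title": "Example programme"})
```

In this sketch an application that has not been granted access gets a `PermissionError` on both `read` and `append`, and revoking a grant takes effect on the next call.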

In our first demo, a user creates a new data pod on the BBC system and then links their BBC and Spotify user accounts to pull in some of their media play histories. This data is then processed on the user’s device to create a media profile, which is used to search BBC News, music, podcasts and the programme archive so that users can find content related to their favourite artists.
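
The on-device processing step could look something like the sketch below. It is a minimal illustration under stated assumptions, not the demo's actual code: the record shapes (`"artist"`, `"artists"`, `"title"`), the function names and the sample data are all hypothetical. It merges play histories from two services into a ranked list of artists, then filters a catalogue for items featuring those artists.

```python
from collections import Counter


def build_media_profile(play_histories, top_n=2):
    """Merge several play histories into a list of the most-played artists."""
    counts = Counter()
    for history in play_histories:
        for play in history:
            # Normalise names so the same artist matches across services.
            counts[play["artist"].strip().lower()] += 1
    return [artist for artist, _ in counts.most_common(top_n)]


def find_related_content(profile, catalogue):
    """Return catalogue items that feature at least one artist in the profile."""
    wanted = set(profile)
    return [
        item for item in catalogue
        if wanted & {name.strip().lower() for name in item["artists"]}
    ]


# Hypothetical sample data standing in for the linked accounts and the archive.
bbc_history = [{"artist": "Artist A"}, {"artist": "Artist A"}, {"artist": "Artist B"}]
spotify_history = [{"artist": "artist a"}, {"artist": "Artist B"}, {"artist": "Artist C"}]
catalogue = [
    {"title": "Interview with Artist A", "artists": ["Artist A"]},
    {"title": "Live session: Artist C", "artists": ["Artist C"]},
    {"title": "Festival highlights", "artists": ["Artist B", "Artist D"]},
]

profile = build_media_profile([bbc_history, spotify_history])
matches = find_related_content(profile, catalogue)
```

Because both functions run locally over data already in the user's possession, neither service needs to see the other's play history in this sketch; only the resulting catalogue lookups leave the combined profile.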

It’s a simple but effective demonstration of a secure cross-service application. At no time does the BBC get to see the user’s Spotify data, and Spotify does not receive a copy of the user’s BBC data, as all data processing is done in the app. As well as keeping data safe, this means that if you want to move to another service, your data is already under your control, and you don’t need to ask for it.

We are also adding some open data sources to the mix, like weather and environmental datasets, and linking with third-party applications through a data exchange layer. In time we hope to create new conventions and standards to make interoperability between service providers simpler.