10 years of Spotify, visualised
During the winter of 2020, I started working on an Airflow 2.0 implementation running on Kubernetes, with DAGs running on ECS. Apart from the obvious one, which was that I was running out of things to keep me occupied, there were a few reasons for doing this:
- Get a deeper understanding of Kubernetes
- Get experience rolling out a greenfield Airflow implementation (something at the time I was expecting to have to do in the near future)
- Have a go with Airflow 2.0
At the same time, friends were posting their Spotify ‘Wrapped’ insights. I really like the yearly digest of what I’ve been listening to, but I had three problems with the service. Firstly, it encourages vendor lock-in; secondly, it only goes back one year; and thirdly, the data belongs to me but isn’t accessible to me.
I thought it might be nice to get hold of all of my data, so that I could keep it indefinitely and do something interesting with it. That unlocked a fourth opportunity to learn something I’d wanted to look at for a while: Apache Superset.
The Spotify work was broken into four elements:
- An Airflow DAG (in my newly built Airflow platform) which would capture the raw data relating to my play history from the Spotify API in near real-time.
- Another Airflow DAG which runs analysis on my listen history and generates new playlists automatically on my Spotify account. For example, ‘Songs you liked but forgot about’ or ‘Your top 100 songs of the year’.
- An interactive dashboard over the full history of data, giving me a super-charged version of the insights provided by Spotify Wrapped.
- A manual step: requesting my full listen history from Spotify directly, using a developer account, then supplementing my live data with this manually requested data.
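To give a flavour of the playlist analysis described above, here is a minimal Python sketch of how a ‘top songs of the year’ list and a ‘liked but forgot about’ list could be derived from a play history. It is not the code that ran in the DAG: the record shape (a `track` name plus an ISO 8601 `played_at` timestamp), the function names and the thresholds are all assumptions made for illustration, and a real implementation would then need to push the results to Spotify as playlists.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical, simplified play records (assumed shape, not the real payload).
plays = [
    {"track": "Track A", "played_at": "2021-01-01T09:00:00Z"},
    {"track": "Track A", "played_at": "2021-02-01T09:00:00Z"},
    {"track": "Track A", "played_at": "2021-03-01T09:00:00Z"},
    {"track": "Track B", "played_at": "2021-01-05T18:30:00Z"},
    {"track": "Track C", "played_at": "2019-05-01T12:00:00Z"},
    {"track": "Track C", "played_at": "2019-06-01T12:00:00Z"},
]


def parse_ts(ts):
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def top_tracks(plays, year, n=100):
    """Return the n most-played tracks in the given calendar year."""
    counts = Counter(
        p["track"] for p in plays if parse_ts(p["played_at"]).year == year
    )
    return [track for track, _ in counts.most_common(n)]


def forgotten_favourites(plays, min_plays, not_played_since):
    """Tracks played at least min_plays times, but never since the cutoff."""
    counts = Counter()
    last_played = {}
    for p in plays:
        ts = parse_ts(p["played_at"])
        counts[p["track"]] += 1
        if p["track"] not in last_played or ts > last_played[p["track"]]:
            last_played[p["track"]] = ts
    return sorted(
        t for t, c in counts.items()
        if c >= min_plays and last_played[t] < not_played_since
    )


cutoff = datetime(2020, 1, 1, tzinfo=timezone.utc)
print(top_tracks(plays, 2021, n=2))
print(forgotten_favourites(plays, min_plays=2, not_played_since=cutoff))
```

Counting plays per track is the simplest possible ranking; weighting by listening time or skipping very short plays would be natural refinements if the data includes play duration.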
The solution is massively over-engineered by design, as I wanted to get the architecture as close to production-ready as possible, to make it worth doing. Having said that, the whole thing was running at a cost of about £20 a month, most of which was down to the DigitalOcean Kubernetes hosting.
The dashboard itself was embedded into my old personal website and would allow anyone to filter my play history by date or search by artist.
I scaled it down after the initial interest subsided, then wrapped the whole thing up after it had been running for about a year, because I was getting sick of people commenting on how much Genesis I’d been listening to.
Here is the embarrassing auto-generated playlist for 2021:
I also have one for the last decade, but it’s too shameful to post.