My Listening Tracker

How it's built

A personal listening history, built end to end.

Spotify gives you Wrapped once a year and forgets the rest. This tool keeps a running history of every song I play and gives me the tools to interrogate it. Top artists by year. Genre drift. Decade mix. When I first discovered an artist. The works.

Everything you're looking at runs on free-tier infrastructure, pulls from a bulk Spotify export plus two live APIs, and ships as a single static webpage. Below is the short version of how it works and what it took to build. Plain English first. Technical details after.

How it works, in plain English

If you've never built a web app before, here's the entire pipeline in six steps. No jargon required.

1. You press play on Spotify

Spotify logs the play in your account history. That's it for now. Nothing to do with this site yet.

2. A small program wakes up every hour

I wrote a program that lives on a free hosting service. Every hour, it wakes up and asks Spotify "what has Evan played recently?" Then it asks a second service (Last.fm) the same question, in case Spotify dropped any.

3. It writes each play to a database

The program stores every play in a small database. One row per play, with the song name, artist, album, and the exact timestamp. The years of history from before launch came from a one-time data export Spotify will hand over on request. Everything since launch is the hourly fetch. The dataset is roughly 30 MB today and growing slowly.

4. You open this page

Your browser asks the program questions like "top 25 tracks of 2024" or "which artists did I first discover in March?" The program looks up the answer in the database, and the page draws the tables and charts you see.

5. Every tab is a different question

Top. Genres. Decades. Trends. Discovery. Recent. Same dataset, asked different ways. Adding a new tab is mostly writing a new SQL query and a small chunk of HTML.

6. The whole thing is automatic

Once it's set up, I don't have to touch it. The hourly fetch runs in the background. The database grows on its own. If I built nothing else for a year, the dataset would still be 8,760 hours richer.

Architecture (the technical view)

Spotify bulk export ─┐ (one-time)
Spotify Web API ─────┼─► Cloudflare Worker ─► Cloudflare D1
Last.fm API ─────────┘   (cron, hourly)       (SQLite, ~30MB)
                                                    │
                                                    ▼
                                        Static HTML + JS frontend
                                        (Chart.js, no framework)

Ingestion: an hourly cron job (a scheduled task that fires automatically every hour) polls Spotify's /recently-played, snapshots every playlist I follow, and pulls Last.fm scrobbles to fill the 3-5% of plays Spotify drops. Each source writes to ingestion_log so failures are diagnosable.
Storage: normalized SQLite schema with streams, tracks, artists, albums, playlists, playlist_snapshots, plus reference tables. Foreign keys, indexes that actually get used, and a deliberate sparse-row pattern so partial data doesn't block ingest.
Frontend: one static HTML file. Nine tabs hit a handful of JSON endpoints. No build step, no framework, no node_modules shipped to the browser. Loads in under 100ms.
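The ingestion bullet describes several sources that each write to ingestion_log, so one failure stays diagnosable. Here is a minimal sketch of that pattern, assuming each source is wrapped as an async job that resolves to a row count. The source names and log fields are hypothetical, not the real schema:

```typescript
// Hypothetical source names and log fields; the real ingestion_log schema is not shown.
type SourceName = "spotify_recent" | "spotify_playlists" | "lastfm";

interface IngestionLogEntry {
  source: SourceName;
  startedAt: string; // ISO 8601 timestamp
  ok: boolean;
  rowsWritten: number;
  error: string | null;
}

// Each job resolves to the number of rows it wrote, or throws on failure.
type SourceJob = () => Promise<number>;

// Runs every source in turn. A failure in one source is caught and logged,
// so it never stops the others, and every source leaves exactly one log entry.
async function runIngestion(
  jobs: Record<SourceName, SourceJob>,
  now: () => Date = () => new Date(),
): Promise<IngestionLogEntry[]> {
  const log: IngestionLogEntry[] = [];
  for (const source of Object.keys(jobs) as SourceName[]) {
    const startedAt = now().toISOString();
    try {
      const rowsWritten = await jobs[source]();
      log.push({ source, startedAt, ok: true, rowsWritten, error: null });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      log.push({ source, startedAt, ok: false, rowsWritten: 0, error: message });
    }
  }
  return log;
}
```

In a Worker, the scheduled handler fired by an hourly cron (for example "0 * * * *") would call this and insert the returned entries into ingestion_log. Catching per source means a bad Last.fm key shows up as one failed row, rather than as a silent gap across every feed.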

Three data sources, one timeline

The dataset is three different sources stitched into a single normalized timeline. Each row in the database tags which source it came from, so I can audit coverage at any moment.

One-time · historical

1. Spotify data export

Spotify will hand over your full listening history if you request it through their privacy portal. The "extended" export arrives weeks later as a stack of JSON files. This is the back-catalog. Every play from when I started using Spotify, including podcasts.

Why it matters: the API only returns the last 50 plays. Without the export, the dataset would start at launch day and have nothing from the years before.
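Here is a sketch of turning one record from the extended export into a normalized row. The field names (ts, ms_played, master_metadata_track_name, and so on) are the ones I believe Spotify uses in its extended streaming history files, so verify them against your own export. The output row shape is hypothetical:

```typescript
// Export fields this sketch relies on (check against your own files).
interface ExportRecord {
  ts: string; // ISO 8601, UTC
  ms_played: number;
  master_metadata_track_name: string | null;
  master_metadata_album_artist_name: string | null;
  master_metadata_album_album_name: string | null;
  spotify_track_uri: string | null;
  episode_name: string | null;
  episode_show_name: string | null;
}

// Hypothetical normalized row; the real streams table columns are not shown.
interface StreamRow {
  playedAt: string;
  trackName: string;
  artistName: string;
  albumName: string | null;
  trackId: string | null;
  msPlayed: number;
  kind: "track" | "episode";
  source: "export";
}

function fromExport(record: ExportRecord): StreamRow | null {
  if (record.master_metadata_track_name && record.master_metadata_album_artist_name) {
    return {
      playedAt: record.ts,
      trackName: record.master_metadata_track_name,
      artistName: record.master_metadata_album_artist_name,
      albumName: record.master_metadata_album_album_name,
      // "spotify:track:<id>" -> "<id>"
      trackId: record.spotify_track_uri?.split(":")[2] ?? null,
      msPlayed: record.ms_played,
      kind: "track",
      source: "export",
    };
  }
  // Podcast plays carry episode and show names instead of track metadata.
  if (record.episode_name && record.episode_show_name) {
    return {
      playedAt: record.ts,
      trackName: record.episode_name,
      artistName: record.episode_show_name,
      albumName: null,
      trackId: null,
      msPlayed: record.ms_played,
      kind: "episode",
      source: "export",
    };
  }
  return null; // nothing identifiable to store
}
```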

Hourly · live

2. Spotify Web API

Every hour, the worker hits /recently-played for the latest plays and /me/playlists for current snapshots of every playlist I follow. Authentication uses OAuth 2.0, with the app registered in the Spotify Developer dashboard.

Why it matters: the ongoing record from launch forward, plus rich metadata (artist images, popularity, genres) the bulk export doesn't include.
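Here is a sketch of the hourly Spotify call. It assumes the documented endpoint GET https://api.spotify.com/v1/me/player/recently-played, which accepts a limit of up to 50 and an `after` cursor in Unix milliseconds. Only the URL building and response mapping are shown, and the row shape is hypothetical:

```typescript
// Subset of the response item shape this sketch relies on.
interface RecentlyPlayedItem {
  played_at: string; // ISO 8601 timestamp
  track: {
    id: string;
    name: string;
    artists: { id: string; name: string }[];
    album: { name: string };
  };
}

// Hypothetical normalized row for plays fetched from the Web API.
interface ApiStreamRow {
  playedAt: string;
  trackId: string;
  trackName: string;
  artistName: string;
  albumName: string;
  source: "api";
}

// Ask for the maximum page, optionally only plays after the newest one already stored.
function recentlyPlayedUrl(afterMs?: number): string {
  const url = new URL("https://api.spotify.com/v1/me/player/recently-played");
  url.searchParams.set("limit", "50");
  if (afterMs !== undefined) url.searchParams.set("after", String(afterMs));
  return url.toString();
}

function fromRecentlyPlayed(items: RecentlyPlayedItem[]): ApiStreamRow[] {
  return items.map((item): ApiStreamRow => ({
    playedAt: item.played_at,
    trackId: item.track.id,
    trackName: item.track.name,
    // Keep the first-listed artist as the primary artist (a simplifying assumption).
    artistName: item.track.artists[0]?.name ?? "Unknown artist",
    albumName: item.track.album.name,
    source: "api",
  }));
}
```

The request itself would carry an `Authorization: Bearer <access token>` header. Passing the timestamp of the newest stored play as `after` keeps each poll from re-fetching rows already in the database.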

Hourly · gap-fill

3. Last.fm API

Spotify's /recently-played caps at 50 plays per call. If I play more than 50 songs between polls, the surplus would vanish. Last.fm has been "scrobbling" (their term for logging a play) everything I listen to for years. The worker pulls those scrobbles and inserts only the ones Spotify missed.

Why it matters: nothing falls through the cracks, even if polls get missed during a 4-hour playlist binge.
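Here is a sketch of the gap-fill fetch. It assumes Last.fm's user.getRecentTracks method with JSON output, where each track carries artist and album names under a "#text" key and a Unix-seconds `date.uts`, and the track currently playing is flagged with `@attr.nowplaying` and has no date. The user name, API key, and row shape are placeholders:

```typescript
interface LastfmTrack {
  name: string;
  artist: { "#text": string };
  album: { "#text": string };
  date?: { uts: string }; // absent on the now-playing track
  "@attr"?: { nowplaying?: string };
}

interface LastfmRecentTracks {
  recenttracks: { track: LastfmTrack[] };
}

// Hypothetical normalized row for scrobbles.
interface ScrobbleRow {
  playedAt: string;
  trackName: string;
  artistName: string;
  albumName: string | null;
  source: "lastfm";
}

// `fromUts` is a Unix timestamp in seconds; 200 is the largest page size the method accepts.
function recentTracksUrl(user: string, apiKey: string, fromUts: number): string {
  const params = new URLSearchParams({
    method: "user.getrecenttracks",
    user,
    api_key: apiKey,
    from: String(fromUts),
    limit: "200",
    format: "json",
  });
  return `https://ws.audioscrobbler.com/2.0/?${params.toString()}`;
}

function fromScrobbles(body: LastfmRecentTracks): ScrobbleRow[] {
  const rows: ScrobbleRow[] = [];
  for (const t of body.recenttracks.track) {
    // Skip the track that is still playing; it has not been scrobbled yet.
    if (t["@attr"]?.nowplaying === "true" || !t.date) continue;
    rows.push({
      playedAt: new Date(Number(t.date.uts) * 1000).toISOString(),
      trackName: t.name,
      artistName: t.artist["#text"],
      albumName: t.album["#text"] || null,
      source: "lastfm",
    });
  }
  return rows;
}
```

Each returned row would then go through the ±5-minute dedup check before insertion, so only plays Spotify missed land in the table.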

Every stream row tags its source as export, api, or lastfm. The dedup logic (the ±5-minute window described below) keeps the three sources from double-counting the same play. Combining a one-time historical import with two live, hourly-polled feeds is a classic ETL pattern (ETL stands for Extract, Transform, Load: the standard recipe for getting data out of one system, cleaning it up, and putting it into another). It's the difference between a dataset that goes back years and one that starts when I turned the tap on.
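Because every row carries its source tag, auditing coverage is a simple group-and-count. Here is a minimal sketch, assuming rows with an ISO `playedAt` and a `source` field (hypothetical names):

```typescript
type StreamSource = "export" | "api" | "lastfm";

interface TaggedStream {
  playedAt: string; // ISO 8601, UTC
  source: StreamSource;
}

// Count plays per source for each calendar month (UTC).
function coverageByMonth(rows: TaggedStream[]): Map<string, Record<StreamSource, number>> {
  const out = new Map<string, Record<StreamSource, number>>();
  for (const row of rows) {
    const month = row.playedAt.slice(0, 7); // "YYYY-MM" prefix of the ISO timestamp
    const counts: Record<StreamSource, number> = out.get(month) ?? { export: 0, api: 0, lastfm: 0 };
    counts[row.source] += 1;
    out.set(month, counts);
  }
  return out;
}
```

In SQL the same audit would be a GROUP BY over the month and the source column (in SQLite, something like `strftime('%Y-%m', played_at)`). A month where the lastfm share jumps would suggest polls went missing.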

Stack at a glance (and what each tool actually is)

Quick teacher's tour through the stack. Each card has the technical name on top and the "what is that?" explanation underneath, with extra attention on the tools that have weird names.

Runtime

Cloudflare Workers (TypeScript, ES2022). Edge-deployed, free tier, native cron triggers, no servers to manage.

What this is: Cloudflare Workers is a "serverless" hosting service. I upload a small JavaScript program and Cloudflare runs it for me, with no servers to set up or maintain. "Edge-deployed" means it runs in 300+ data centers worldwide, so it's fast no matter where you load the page from. "Cron" (named after the classic Unix scheduler) is the standard way to say "run this program automatically on a schedule." Think of it as a recurring calendar event for code: mine fires at the top of every hour, forever, without me lifting a finger.

Database

Cloudflare D1 (SQLite). Same vendor, free tier, real SQL. No NoSQL gymnastics on top-N queries.

What this is: D1 is Cloudflare's database service, built on top of SQLite. SQLite is the database engine that quietly runs inside iPhones, Android phones, web browsers, and countless desktop apps. It's everywhere, and it's been battle-tested for 20+ years. "NoSQL" is a category of newer databases that don't use SQL. They're great for some use cases, but they turn simple analytical questions (like "top 25 tracks of 2024") into complicated programming exercises. For a project like this one, plain SQL is the right tool.
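The "top 25 tracks of 2024" question is a single GROUP BY in SQL. Here is a sketch with hypothetical table and column names (the real schema isn't shown here), plus a plain TypeScript function that computes the same answer over an in-memory list so the logic is easy to test:

```typescript
// Hypothetical schema: streams(track_id, played_at), tracks(id, name, artist_id), artists(id, name).
// ?1 and ?2 bound a half-open date range; ?3 is the row limit.
const TOP_TRACKS_SQL = `
  SELECT t.name AS track, a.name AS artist, COUNT(*) AS plays
  FROM streams s
  JOIN tracks t ON t.id = s.track_id
  JOIN artists a ON a.id = t.artist_id
  WHERE s.played_at >= ?1 AND s.played_at < ?2
  GROUP BY t.id, a.id
  ORDER BY plays DESC
  LIMIT ?3`;
// In a Worker with a D1 binding named DB (an assumed binding name):
//   await env.DB.prepare(TOP_TRACKS_SQL).bind("2024-01-01", "2025-01-01", 25).all();

interface Play {
  playedAt: string; // ISO 8601, UTC
  track: string;
  artist: string;
}

interface TopRow {
  track: string;
  artist: string;
  plays: number;
}

// Same question in memory: count plays per (track, artist) in [from, to), highest first.
function topTracks(plays: Play[], from: string, to: string, limit: number): TopRow[] {
  const counts = new Map<string, TopRow>();
  for (const p of plays) {
    // ISO 8601 UTC strings sort chronologically, so plain string comparison works.
    if (p.playedAt < from || p.playedAt >= to) continue;
    const key = `${p.track}\u0000${p.artist}`;
    const row: TopRow = counts.get(key) ?? { track: p.track, artist: p.artist, plays: 0 };
    row.plays += 1;
    counts.set(key, row);
  }
  return Array.from(counts.values())
    .sort((a, b) => b.plays - a.plays)
    .slice(0, limit);
}
```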

Frontend

Vanilla JS + Chart.js. No React, no build pipeline. Sortable tables, modals, and charts in around 1,600 lines of HTML.

What this is: "Vanilla JS" is developer slang for "plain JavaScript with no add-ons," kind of like vanilla ice cream versus a milkshake with toppings. Most modern web apps use frameworks like React, Vue, or Angular that add structure and reusable parts, but also significant overhead. This site skips all of that. The entire interactive page is one HTML file plus Chart.js (a small library that draws the charts). It loads in under 100 milliseconds because there's almost nothing to load. Choosing vanilla here was deliberate: a static HTML file is faster, simpler to debug, and any web developer can read it a year from now without needing to learn the framework first.
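The sortable tables come down to re-ordering an array of rows when a column header is clicked. Here is a minimal sketch of that comparator, written with TypeScript annotations for clarity (the page itself is plain JavaScript). The function name and row shape are hypothetical:

```typescript
type SortDirection = "asc" | "desc";

// Returns a sorted copy, so the original order is still available for a reset.
// Numbers compare numerically; everything else compares as text.
function sortRows<T>(rows: readonly T[], key: keyof T, dir: SortDirection): T[] {
  const sign = dir === "asc" ? 1 : -1;
  return [...rows].sort((a, b) => {
    const x = a[key];
    const y = b[key];
    if (typeof x === "number" && typeof y === "number") return (x - y) * sign;
    return String(x).localeCompare(String(y)) * sign;
  });
}
```

A click handler on each table header would call this with that column's key, flip the direction on a repeat click, and re-render the table body.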

Auth

Spotify OAuth 2.0 (Authorization Code flow). Refresh token persisted in D1; access tokens minted per-invocation, held in memory only.

What this is: OAuth 2.0 is the standard way one app gets permission to act on your behalf in another app, without ever seeing your password. It's what's happening when you click "Sign in with Google" or "Connect to Spotify" on a third-party site. I authorized my program once. Spotify handed it a long-lived "refresh token" (a long random string that proves the permission), which the program trades for a short-lived access token every time it runs. The refresh token lives in the database. My actual Spotify password is never involved, and I can revoke the permission from Spotify's settings at any time.
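The refresh step can be sketched as building one request to Spotify's token endpoint. This assumes the standard OAuth 2.0 refresh_token grant, with the app's client ID and secret sent as HTTP Basic credentials. The credential values are placeholders, and only the request construction is shown:

```typescript
interface TokenRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build the refresh_token grant request; nothing is sent here.
function buildRefreshRequest(clientId: string, clientSecret: string, refreshToken: string): TokenRequest {
  return {
    url: "https://accounts.spotify.com/api/token",
    init: {
      method: "POST",
      headers: {
        // HTTP Basic auth: base64 of "client_id:client_secret"
        Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
        "Content-Type": "application/x-www-form-urlencoded",
      },
      body: new URLSearchParams({ grant_type: "refresh_token", refresh_token: refreshToken }).toString(),
    },
  };
}
```

The Worker would send this with `fetch(req.url, req.init)` and read `access_token` and `expires_in` from the JSON response, keeping the access token in memory for that run only. Spotify's documentation notes that a new refresh token may come back too. If it does, it should replace the stored one.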

Data sources

Spotify bulk export (historical JSON, requested via Spotify's privacy portal), Spotify Web API (hourly poll, via the Spotify Developer dashboard), and Last.fm REST API (gap-fill).

What this is: a one-time bulk download for the years of history, plus two live services that keep the dataset growing. "JSON" is the standard text format computers use to exchange structured data. It looks like a tidy outline made of brackets and labels. The "Spotify Developer dashboard" is a website Spotify provides where any developer can register an app and get the credentials needed to call the Spotify API. The Spotify Web API is the official, documented interface Spotify offers to third-party apps like this one.

Tooling

Wrangler for deploys and D1 migrations. TypeScript strict mode. Observability via Cloudflare's built-in metrics and custom diagnostic endpoints.

What this is: Wrangler is Cloudflare's command-line tool. I type "wrangler deploy" in my terminal and seconds later the updated code is live in production. "TypeScript" is JavaScript with a type-checker added. Plain JavaScript happily lets you pass the string 'hello' to a function that expects a number, and you only find out at runtime, when a user has already hit the bug. TypeScript catches that kind of mistake before the code even runs. "Strict mode" turns the checker all the way up so it catches the maximum number of issues. "Observability" is the umbrella word for ways to see what your running code is actually doing in production. Cloudflare provides some of that automatically (request counts, error rates), and I built my own diagnostic pages for the rest (auth status, ingestion logs, last successful poll).
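One of those diagnostic pages, "last successful poll," can be computed straight from the ingestion log. Here is a sketch, assuming log entries with a source name, an ISO start time, a success flag, and an error message (hypothetical field names):

```typescript
interface LogEntry {
  source: string;
  startedAt: string; // ISO 8601, UTC
  ok: boolean;
  error: string | null;
}

interface SourceStatus {
  source: string;
  lastSuccess: string | null;
  lastError: string | null;
  hoursSinceSuccess: number | null;
}

// Summarize the log per source: when it last worked, the latest error, and how stale it is.
function sourceStatus(entries: LogEntry[], now: Date): SourceStatus[] {
  const bySource = new Map<string, SourceStatus>();
  // Oldest first, so later entries overwrite earlier ones.
  const sorted = [...entries].sort((a, b) =>
    a.startedAt < b.startedAt ? -1 : a.startedAt > b.startedAt ? 1 : 0,
  );
  for (const e of sorted) {
    const s: SourceStatus = bySource.get(e.source) ?? {
      source: e.source,
      lastSuccess: null,
      lastError: null,
      hoursSinceSuccess: null,
    };
    if (e.ok) s.lastSuccess = e.startedAt;
    else s.lastError = e.error;
    bySource.set(e.source, s);
  }
  const result = Array.from(bySource.values());
  for (const s of result) {
    if (s.lastSuccess !== null) {
      s.hoursSinceSuccess = (now.getTime() - Date.parse(s.lastSuccess)) / 3600000; // ms per hour
    }
  }
  return result;
}
```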

Five problems that were genuinely interesting to solve

1. Spotify's API lockdown

A February 2026 policy change killed Spotify's batch endpoints (URLs where you can ask for many items in one request, marked by ?ids=...) for dev-mode apps overnight. Enrichment had to be rewritten to fetch one item at a time, with a small cap on how many can run in parallel. I built a /diag/spotify probe (a small admin page that hits 5 endpoints with one fresh token and dumps each status code) so the next time something silently changes, I know within 30 seconds exactly which endpoint is affected, not just which subsystem.

The takeaway: when a platform changes the rules under you, you adapt fast. Then you instrument your code so you'll see the next change coming before it hurts.
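Fetching one item at a time with a small cap on parallelism is a bounded-concurrency map. Here is a generic sketch. The cap of 3 used in the test is illustrative, not the value the tracker uses:

```typescript
// Calls `worker` on every item, with at most `limit` calls in flight at once.
// Results come back in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array<R>(items.length);
  let next = 0;
  // Each lane pulls the next unclaimed index until none are left. JavaScript runs
  // one callback at a time, so `next++` never hands the same index to two lanes.
  async function lane(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }
  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

For enrichment this might look like `mapWithConcurrency(trackIds, 3, (id) => fetchTrack(id))`, where `fetchTrack` is a hypothetical helper that makes one single-item request. Every call still counts against the per-run subrequest budget, so the cap limits burst size, not the total number of fetches.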

2. The 50-subrequest budget

Cloudflare Workers free tier caps each program run at 50 outbound fetches (every time my code calls out to Spotify or Last.fm counts as one). Listing 114 playlists at 1-3 fetches each blows through that fast. Fix: cap fetches per cron tick, commit progress to the database, let the next hourly run pick up the rest. The initial backfill took 12 hours. Steady state has plenty of headroom.

The takeaway: when you can't do all the work in one shot, you do it in pieces and remember where you left off. Patience and bookkeeping beat brute force.
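The "do it in pieces and remember where you left off" approach can be sketched as a planner that spends a fetch budget and returns a cursor for the next run. This assumes each playlist's fetch cost is known up front, which is a simplification (the real code may count as it goes). All names are hypothetical:

```typescript
interface PlaylistJob {
  id: string;
  fetchCost: number; // outbound fetches this playlist needs (1-3 in the text)
}

interface RunPlan {
  toProcess: string[];
  nextCursor: number; // index to persist in the database for the next cron run
  done: boolean;
}

// Take jobs in order from `cursor` until the next one would exceed `budget`.
// Assumes no single job costs more than the whole budget; such a job would stall here.
function planRun(jobs: PlaylistJob[], cursor: number, budget: number): RunPlan {
  const toProcess: string[] = [];
  let spent = 0;
  let i = cursor;
  while (i < jobs.length && spent + jobs[i].fetchCost <= budget) {
    spent += jobs[i].fetchCost;
    toProcess.push(jobs[i].id);
    i += 1;
  }
  return { toProcess, nextCursor: i, done: i >= jobs.length };
}
```

Setting `budget` somewhat below 50 leaves headroom for the other calls in the same run, such as the token refresh and the /recently-played poll.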

3. The Last.fm dedup window

Both APIs log the same plays. Last.fm scrobbles at around 50% playback while Spotify timestamps near song-end, and the two services often pick different Spotify track IDs for the same song. My first attempt (±60s, match on track_id) left phantom duplicates everywhere. The fix: ±5-minute window, match on (track_name, artist_name) text. I found the right values by graphing the deltas on a sample. SQL didn't solve this. Eyeballing a chart did.

The takeaway: real data is messy. When two sources record the same event differently, the work is in defining what counts as "the same" and being honest about how loose that tolerance has to be.
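The matching rule described above, a ±5-minute window on (track name, artist name), can be sketched as follows. Two choices here go beyond the text and are assumptions. First, names are compared after trimming and lowercasing. Second, each Spotify play can absorb at most one scrobble, so genuine back-to-back repeats of a song are not collapsed into one:

```typescript
interface Play {
  playedAt: string; // ISO 8601
  trackName: string;
  artistName: string;
}

const WINDOW_MS = 5 * 60 * 1000; // ±5 minutes

// Assumed normalization: ignore case and surrounding whitespace.
const norm = (s: string): string => s.trim().toLowerCase();

function isSamePlay(a: Play, b: Play): boolean {
  return (
    norm(a.trackName) === norm(b.trackName) &&
    norm(a.artistName) === norm(b.artistName) &&
    Math.abs(Date.parse(a.playedAt) - Date.parse(b.playedAt)) <= WINDOW_MS
  );
}

// Returns the scrobbles that match no Spotify play, i.e. the ones to insert.
// Scrobbles should be passed in time order so the greedy matching pairs them sensibly.
function scrobblesToInsert(spotify: Play[], lastfm: Play[]): Play[] {
  const used = new Set<number>(); // indexes of Spotify plays already matched
  const missing: Play[] = [];
  for (const scrobble of lastfm) {
    const match = spotify.findIndex((p, i) => !used.has(i) && isSamePlay(p, scrobble));
    if (match === -1) missing.push(scrobble);
    else used.add(match);
  }
  return missing;
}
```

The pairwise scan is O(n × m), which is fine for an hourly batch of a few dozen plays.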

4. A 21-hour API ban

An every-minute cron racing a manual loop hit Spotify's burst limit and earned a Retry-After: 75160 response (a standard HTTP header that tells the caller "don't try again for this many seconds"). That's a 20.9-hour cooldown. Lesson learned the expensive way: read the error envelope before retrying, and serialize bursts on suspicious endpoints. I saved it as a durable memory note so future-me doesn't repeat it.

The takeaway: every API has hidden cliffs. You can read about them in the docs, or you can find them with your face. I found this one with my face, then wrote it down so I never do it again.
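Reading the error envelope before retrying can be sketched as a small decision function. Per RFC 9110, Retry-After holds either a number of seconds or an HTTP date. The 10-second threshold for waiting inside the current run and the 60-second fallback when the header is missing are illustrative assumptions:

```typescript
// Milliseconds to wait, from a Retry-After value in seconds or HTTP-date form.
function retryAfterMs(header: string | null, now: Date): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const at = Date.parse(value);
  return Number.isNaN(at) ? null : Math.max(0, at - now.getTime());
}

type RetryDecision =
  | { action: "proceed" }
  | { action: "retry_in_run"; waitMs: number }
  | { action: "defer"; resumeAt: string };

// Short waits are retried in this run; long ones are recorded and left to a later cron tick.
function onResponse(
  status: number,
  retryAfter: string | null,
  now: Date,
  maxInlineWaitMs = 10000,
): RetryDecision {
  if (status !== 429) return { action: "proceed" };
  const waitMs = retryAfterMs(retryAfter, now) ?? 60000; // assumed fallback
  if (waitMs <= maxInlineWaitMs) return { action: "retry_in_run", waitMs };
  return { action: "defer", resumeAt: new Date(now.getTime() + waitMs).toISOString() };
}
```

With the header from the incident, `onResponse(429, "75160", now)` defers for roughly 20.9 hours instead of hitting the endpoint again.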

5. A one-character BOM in a secret

Last.fm rejected my API key with a generic 403. Cause: PowerShell's UTF-8 encoding was silently adding a Byte Order Mark, or BOM (an invisible marker character that some Windows text tools insert at the start of a file), when I piped the API key to wrangler secret put. I built a diagnostic that returned the stored secret's (length, head, tail) without exposing the full value. The output showed length=33 instead of 32. Switching to Git Bash's echo -n command (Git Bash is a Unix-style terminal on Windows, and it doesn't add the BOM) fixed it. Encoding bugs are the ones that teach you the most.

The takeaway: the bug is rarely where you think it is. Build the tool that shows you the actual state of the world, not the state you assumed.
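A diagnostic in the spirit of the one described can be sketched as a function that reports a secret's length, a couple of characters from each end with invisible characters made visible, and a BOM check. The field names and the two-character edge are assumptions:

```typescript
interface SecretFingerprint {
  length: number;
  head: string;
  tail: string;
  hasBom: boolean;
  hasOuterWhitespace: boolean;
}

// Replace anything outside printable ASCII with a \uXXXX escape so it shows up in output.
const visible = (s: string): string =>
  s.replace(/[^\x20-\x7e]/g, (c) => "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0"));

// Describe a stored secret without returning its full value.
function fingerprint(secret: string, edge = 2): SecretFingerprint {
  return {
    length: secret.length,
    head: visible(secret.slice(0, edge)),
    tail: visible(secret.slice(-edge)),
    hasBom: secret.charCodeAt(0) === 0xfeff, // U+FEFF, the byte order mark
    // String.prototype.trim strips newlines and also U+FEFF.
    hasOuterWhitespace: secret !== secret.trim(),
  };
}
```

For the incident above, a 32-character key with a BOM in front would report length 33, hasBom true, and a head starting with \ufeff. Stripping a leading U+FEFF when the Worker reads the secret is a possible defensive measure, though fixing the input side, as described, is the cleaner repair.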

Transferable skills demonstrated

Data

Relational schema design + migrations
Deduplication strategy across noisy sources
SQL: CTEs, window functions, aggregations
Time-series modeling (decades, trends, discovery)
Data quality monitoring via ingestion logs

Backend

TypeScript (strict mode)
REST API integration + OAuth 2.0
Cron design + idempotent ingestion
Batching + rate-limit handling
Secret management on Cloudflare

Frontend

Vanilla JS, no framework
Chart.js for visualizations
Responsive CSS (mobile-first)
Tab routing + modal patterns
Client-side sort + filter

DevOps

Cloudflare Workers + D1
Wrangler-based migration discipline
Custom diagnostic endpoints
Structured incident notes
Free-tier cost engineering

Engineering judgment

Reading docs + changelogs carefully
Knowing when to wait vs. retry
Writing memory notes for future-me
Schemas that survive future questions
Avoiding frameworks when 200 lines of vanilla JS will do

What this says about how I work

Read the docs before writing code. Then prove behavior with diagnostic endpoints when reality doesn't match.
Track production incidents as durable notes. Commit messages forget. The burst-cap incident is a paragraph I re-read every time I touch a rate-limited API.
Stay on free tiers until there's a real reason to leave. Cost awareness is part of engineering.
Prefer schemas that survive future questions over schemas optimized for today's queries. Adding a Genres tab six months later was a one-day job because the data was already shaped right.
Skip the framework when vanilla will do. A static HTML file with sortable tables, modals, and Chart.js loads faster than a React app, and any engineer can read it a year later without context.

If you'd like to build something similar

The transferable moves: (1) ingest from APIs you don't own using a free serverless cron, (2) normalize into SQL where SELECT is your analysis language, and (3) write a tiny static frontend that queries simple JSON endpoints. These patterns apply to any API-driven dataset. Banking transactions. Fitness data. Calendar history. IoT readings. The interesting work is rarely the code itself. It's in the schema choices, the dedup logic, and the diagnostic instrumentation you build along the way. The code is easy. The judgment is the part that takes practice.

How 48 hours was possible

From idea to live site in a 48-hour window. Realistically, that was 8 to 12 hours of actual hands-on work. The rest was sleep, day-job hours, and waiting out a couple of API cooldowns I'd earned the hard way.

I worked alongside Claude (Anthropic's AI) the whole way. The early thinking happened in Claude chat (the standard web chat at claude.ai), sketching schemas and exploring how Spotify's API actually behaves. The tricky design conversations ran through Claude Cowork (a collaboration mode for longer working sessions with shared context across turns): the dedup window math, what to do during the 21-hour API ban, whether to ship Genres before or after Decades. The actual code, debugging, and deploys happened in Claude Code (Claude running in my terminal, able to read and edit files on my computer directly), including the page you're reading right now.

Every architecture call is mine. Every data-modeling decision is mine. Every judgment about what to ship and what to skip is mine. Claude did the typing and the look-it-up-in-the-docs work. Knowing how to direct AI tools well is fast becoming a real engineering skill. It's about knowing when to push back and when to verify, which is exactly what an engineer does with a teammate. I'd rather be honest about that than perform a story where I built it alone.

Built by Evan Burgei in a 48-hour window (8 to 12 hours of actual work), with Claude as a collaborator. A data analyst who likes when the data answers the question. If this looks like the kind of thinking you'd want on your team, let's talk.