Deconstructing Spotify's built-in HTTP server
Spotify has a useful feature in their desktop client that enables web pages to interact with the client. This includes starting and stopping playback, changing songs and reacting to the current status of the player. The most prominent showcases are the Play button and the Spotify HTTP links to arbitrary songs.
By integrating a local HTTP server inside the desktop client, you can enable two-way communication between a web page and the player without the use of browser plugins. That is a clear benefit, since a plugin would have to be installed and kept up to date for every browser.
Since we’ve ruled out browser plugins, the only remaining API is naturally HTTP. This also imposes the limitation that the browser must always be the one initiating communication with the player (client -> server).
But before explaining the request flow, let’s quickly discuss two key parts of the solution: SSL and the web helper.
SSL and wildcard domain
A peculiar thing is that they have chosen to send all traffic exclusively over HTTPS. I’m guessing the main reason for this is to fight pesky middleware (proxies, anti-virus software and personal firewalls), since the data being sent isn’t sensitive.
With the requirement to use HTTPS, a valid SSL certificate is needed to avoid browsers complaining. Spotify has worked around this problem by registering a domain (*.spotilocal.com) that merely points to 127.0.0.1. But rather than connecting to the top domain, they use a wildcard domain and connect to a random subdomain each time (for example abcrjdknsa.spotilocal.com). The reason for this is to avoid the browser’s max connection limit per domain, enabling more tabs in the browser to concurrently use their API at the cost of an extra DNS lookup.
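As a rough illustration of the random subdomain trick, here is a minimal Python sketch. The label length of ten lowercase letters is an assumption based on the example hostname above, and `random_local_hostname` is a made-up helper name, not something taken from Spotify's code.

```python
import random
import string

# Wildcard domain described above; every subdomain points to 127.0.0.1.
LOCAL_DOMAIN = "spotilocal.com"


def random_local_hostname(length=10):
    """Return a hostname such as 'abcrjdknsa.spotilocal.com'.

    Assumption: the label is `length` random lowercase letters.
    The real client may use a different length or alphabet.
    """
    label = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{label}.{LOCAL_DOMAIN}"


if __name__ == "__main__":
    # Each call yields a fresh subdomain, so each tab gets its own
    # per-domain connection budget in the browser.
    print(random_local_hostname())
```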
Spotify Web Helper
During the installation of the desktop client, Spotify installs a small headless executable that is set up to run at startup. This app is called the Spotify Web Helper and hosts the HTTP server. This is what the browser communicates with and I presume it acts like an HTTP -> IPC bridge for the player.
If no Web Helper is running when the player is launched, the player process itself hosts the HTTP server. At exit, it launches the Web Helper again if it is not already running. It perhaps goes without saying, but the web server only binds to localhost.
Request flow for playing a track
To illustrate the steps involved in getting this to work, I’ll use a normal Spotify HTTP link. The high-level picture of what happens is this:
- User opens a Spotify link
- Web page tries to open Spotify client and start the song
- If successful, the page continues to monitor the client and keeps its status in sync (playing/paused, which track is playing, etc.)
To follow the steps yourself, open a link to a song in Spotify. Then fire up your Web Inspector and see how the requests progress.
1. Initialization
I will assume a page has been opened and the appropriate JavaScript files have been loaded. Given the basics just described, the JavaScript client starts off by sending a request to /service/version.json?service=remote to find out three things: the server API version, the desktop client version and a flag indicating whether the player is running. Since in most cases we are communicating with the Web Helper rather than the player itself, this flag can be false. In that case, the JS client issues another request to /open.json, which fires up the Spotify player on the user’s desktop.
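The initialization decision can be sketched in a few lines of Python. The two paths come from the requests described above; the JSON key `running` is an assumed name for the "player is running" flag, and both function names are invented for this example.

```python
from urllib.parse import urlencode


def version_url(base_url):
    """Build the URL of the initial version request."""
    return f"{base_url}/service/version.json?" + urlencode({"service": "remote"})


def next_step(version_info):
    """Decide what to request after the version response has been parsed.

    Assumption: the response is a JSON object whose "running" key holds
    the player-is-running flag. The real key name may differ.
    Returns the path to request next, or None if the player is already up.
    """
    if version_info.get("running", False):
        return None
    return "/open.json"


if __name__ == "__main__":
    print(version_url("https://abcrjdknsa.spotilocal.com:4370"))
    print(next_step({"running": False}))
```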
Port detection
Another important thing that happens during initialization is detecting the correct port to use. Since the port requested by the Web Helper can be taken by another process, it needs to rely on a few fallback ports. By default, the JS client will try 4370 and work its way up to 4380 in the hope of finding the port the Web Helper actually listens on.
This is accomplished by simple trial and error: the port number is increased after each failed XHR request until the built-in HTTP server answers. The port range itself falls within the ephemeral port range of some operating systems but not within the range IANA recommends for dynamic ports (49152-65535), which confuses me a bit. Perhaps there is some other good reason for this?
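The trial-and-error scan is easy to sketch in Python. This is only my interpretation of the behavior, not Spotify's code: `find_port` and `http_probe` are hypothetical names, and treating any failed request as "wrong port" is an assumption. The probe is passed in as a parameter so the scan logic can be exercised without a running Web Helper.

```python
import http.client
import urllib.error
import urllib.request

# 4370 through 4380 inclusive, as described above.
PORT_RANGE = range(4370, 4381)


def http_probe(host, port, timeout=1.0):
    """Return True if the version request succeeds on this port."""
    url = f"https://{host}:{port}/service/version.json?service=remote"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, TLS failure, HTTP error status or a
        # non-HTTP reply: all are treated as "not the server we want".
        return False


def find_port(host, probe=http_probe, ports=PORT_RANGE):
    """Try each port in order and return the first one the probe accepts."""
    for port in ports:
        if probe(host, port):
            return port
    return None  # nothing answered in the whole range
```

Calling `find_port("abcrjdknsa.spotilocal.com")` would perform the real scan; in a test you can pass a fake `probe` instead.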
2. Security and tokens
To ensure only trusted sites can communicate with the client, a few tricks are employed. First of all, the page must be served from a trusted domain, in this case open.spotify.com. On top of that, two types of tokens are required to communicate.
The first one is an OAuth token encoded as a base 64 (base 62?) string fetched from http://open.spotify.com/token. Retrieving it requires no special user data arguments, but decoding it reveals what looks like a TTL of 8 hours(?). Cleverly enough, this URL offers neither JSONP nor CORS headers, which makes it accessible only to pages on the same domain (namely Spotify-controlled web pages). Try it with cURL and you’ll see that the headers are missing.
The second token is an ordinary CSRF token, which I assume is just there to prevent request forgery by other JavaScript (duh). However, to retrieve the CSRF token you must supply https://open.spotify.com as your Origin header (done behind the scenes by any modern browser with CORS), which yet again ensures only Spotify-owned web pages can control the player. Smart. Subsequent requests mentioned below all use these two tokens when communicating with the player.
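A non-browser client has to set the Origin header itself. The sketch below only builds the request and does not send it. Note that this post does not name the CSRF endpoint, so `csrf_path` is left as a parameter the caller must supply, and `build_csrf_request` is a made-up helper name.

```python
import urllib.request

# The origin the local server expects, as described above.
TRUSTED_ORIGIN = "https://open.spotify.com"


def build_csrf_request(base_url, csrf_path):
    """Build (but do not send) a CSRF token request with the Origin header set.

    Assumption: csrf_path is whatever path the local server uses for
    handing out CSRF tokens; it is not specified in this post.
    """
    return urllib.request.Request(
        base_url + csrf_path,
        headers={"Origin": TRUSTED_ORIGIN},
    )
```

A browser adds this header automatically and does not let page scripts override it, which is why the check is effective against other web pages but not against local programs.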
3. Controlling the player
With initialization and tokens out of the way, on to the real meat - actually playing a song. But before sending a request to play the song, the page issues a request to /remote/status.json to get the status of the player and ensure we aren’t already playing a song. Real shame to interrupt your music digging, right? A very nice feature indeed, and I presume it’s one of the driving use cases behind the built-in HTTP server. Their previous solution using URL handlers forcefully changed the song, as the web page had no way of knowing the current status of the player (one-way communication).
With no current song playing, the page asks the player to start our requested track. A request to /remote/play.json along with the track URI as a query parameter is all that is needed. The player obeys and you have your song playing, hooray!
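Here is a hedged Python sketch of the "check status, then play" step. The path /remote/play.json comes from the flow above, but the query parameter names `uri`, `oauth` and `csrf` and the status key `playing` are assumptions for illustration, as are the function names.

```python
from urllib.parse import urlencode


def should_start_playback(status):
    """Only start the requested track if nothing is playing already.

    Assumption: the status JSON has a boolean "playing" key.
    """
    return not status.get("playing", False)


def play_url(base_url, track_uri, oauth_token, csrf_token):
    """Build the play request URL with the track URI and both tokens.

    Assumption: the parameters are named "uri", "oauth" and "csrf".
    """
    params = {"uri": track_uri, "oauth": oauth_token, "csrf": csrf_token}
    return f"{base_url}/remote/play.json?" + urlencode(params)


if __name__ == "__main__":
    status = {"playing": False}
    if should_start_playback(status):
        print(play_url("https://abcrjdknsa.spotilocal.com:4370",
                       "spotify:track:EXAMPLE", "OAUTH", "CSRF"))
```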
After that, the page continuously long-polls the /remote/status.json URL to stay up to date with the status of the player. The information relayed includes which track is currently playing, the volume and whether shuffle/repeat is enabled. For example, this is what keeps all play buttons in sync (playing or paused) across open pages.
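The long-polling loop boils down to "ask, wait for the answer, react if something changed, ask again". A minimal Python sketch of that pattern follows. `watch_status` is a hypothetical helper, and `fetch_status` stands in for whatever function performs the blocking request to /remote/status.json and returns the parsed JSON.

```python
def watch_status(fetch_status, on_change, max_polls=None):
    """Repeatedly fetch the player status and report changes.

    fetch_status: callable that blocks until the server answers and
                  returns the parsed status (the long poll itself).
    on_change:    callable invoked with each status that differs from
                  the previous one.
    max_polls:    optional cap on iterations; None means poll forever.
    """
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        status = fetch_status()
        if status != last:
            on_change(status)
            last = status
        polls += 1


if __name__ == "__main__":
    # Simulated responses instead of real HTTP calls.
    responses = iter([{"playing": True}, {"playing": True}, {"playing": False}])
    watch_status(lambda: next(responses), print, max_polls=3)
```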
Example client in Python
To improve my own understanding of how all this works, I put together a small Python client that more or less replicates the behavior of the JavaScript client. It is definitely a more naïve implementation, but it should serve as a good example and starting point. You can find it on GitHub.
Documenting the details
While a local HTTP API isn’t unique to Spotify, I was curious about how they did it since it seemed like a quality implementation. They haven’t published any information about their solution, so I figured I might as well document my findings.
By looking at their implementation and speculating about the hows and whys, you quickly come to suspect that stuff like this is riddled with pitfalls, including browser compatibility, interfering firewalls and security. Hopefully others can learn something from this and avoid repeating mistakes when implementing a similar solution.