Deconstructing Spotify's built-in HTTP server

Spotify has a useful feature in their desktop client that enables web pages to interact with the client. This includes starting/stopping playback, changing songs and reacting to the current status of the player. The most prominent showcases are the Play button and the Spotify HTTP links to arbitrary songs.

By integrating a local HTTP server inside the desktop client, you can enable two-way communication between a web page and the player without the use of browser plugins. For obvious reasons, this is very beneficial.

Since we’ve ruled out browser plugins, the only remaining API is naturally HTTP. This also imposes the limitation that the browser must always be the one initiating the communication with the player (client -> server).

But before explaining the request flow, let’s quickly discuss two key parts of the solution: SSL and the web helper.

SSL and wildcard domain

A peculiar thing is that they have chosen to send all traffic exclusively over HTTPS. I’m guessing the main reason for this is to fight pesky middleware (proxies, anti-viruses and personal firewalls), as the data being sent isn’t sensitive.

With the requirement to use HTTPS, a valid SSL certificate is needed to avoid browsers complaining. Spotify has worked around this problem by registering a domain (*.spotilocal.com) that merely points to 127.0.0.1. But rather than connecting to the top domain, they use a wildcard domain and connect to a random subdomain each time (for example abcrjdknsa.spotilocal.com). The reason for this is to avoid the browser’s max connection limit per domain, enabling more tabs in the browser to concurrently use their API at the cost of an extra DNS lookup.
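As a rough sketch of the random subdomain trick, the snippet below builds a base URL for the local server. The ten lowercase letters are only a guess based on the example hostname above, and the function names are made up for illustration; the real client may generate its hostnames differently.

```python
import random
import string


def random_local_hostname(length=10, domain="spotilocal.com"):
    """Return a random subdomain of the wildcard domain.

    The label length and alphabet are assumptions based on the example
    hostname 'abcrjdknsa.spotilocal.com'.
    """
    label = "".join(random.choice(string.ascii_lowercase) for _ in range(length))
    return f"{label}.{domain}"


def local_base_url(port, hostname=None):
    """Build the HTTPS base URL for the local server on the given port."""
    host = hostname if hostname is not None else random_local_hostname()
    return f"https://{host}:{port}"
```

Since every subdomain resolves to 127.0.0.1, each tab can pick its own hostname and get its own connection pool in the browser.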

Spotify Web Helper

During the installation of the desktop client, Spotify installs a small headless executable that is set up to run at startup. This app is called the Spotify Web Helper and hosts the HTTP server. This is what the browser communicates with and I presume it acts like an HTTP -> IPC bridge for the player.

If no Web Helper is running when the player is launched, the player process itself hosts the HTTP server. At exit, it launches the Web Helper again if it is not already running. It perhaps goes without saying, but the web server only binds to localhost.

Request flow for playing a track

To better illustrate the steps involved in getting this to work, I’ll use a normal Spotify HTTP link. The high-level picture of what happens is this:

To follow the steps yourself, open a link to a song in Spotify. Then fire up your Web Inspector and see how the requests progress.

1. Initialization

I will assume a page has been opened and the appropriate JavaScript files have been loaded. Given the basics just described, the JavaScript client starts off by sending a request to /service/version.json?service=remote to find out three things: the server API version, the desktop client version and a flag indicating whether the player is running. Since we in most cases are communicating with the Web Helper rather than the player itself, this flag can be false. In that case, the JS client issues another request to /open.json that fires up the Spotify player on the user’s desktop.
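The initialization step could be sketched roughly like this. The two paths come from the requests described above, but the "running" field name is an assumption about the JSON response, and fetch_json and initialize are illustrative helpers rather than anything taken from the real client.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def initialize(base_url, fetch=fetch_json):
    """Ask the local server for its version and start the player if needed.

    The 'running' key is an assumed name for the flag that tells whether
    the player is running.
    """
    version = fetch(base_url + "/service/version.json?service=remote")
    if not version.get("running", False):
        # Most likely we are talking to the Web Helper: ask it to launch the player.
        fetch(base_url + "/open.json")
    return version
```

The fetch argument is injectable so the flow can be exercised without a running client.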

Port detection

Another important thing that happens during the initialization process is the detection of the correct port to use. Since the port requested by the Web Helper can be taken by another process, it needs to rely on a few fallback ports. By default, the client will try 4370 and work its way up to 4380 in the hope of finding the port the Web Helper actually listens on.

This is accomplished by simple trial and error: the port number is increased after each failed XHR request until the built-in HTTP server is found. The port range itself falls within the ephemeral port range of many operating systems but not within the range IANA suggests for dynamic ports (49152-65535), which confuses me a bit. Some other good reason for this perhaps?
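The trial-and-error scan can be sketched in a few lines. The port range 4370 to 4380 is the one described above; find_port and make_http_probe are illustrative names, and the probe simply reuses the version request from the initialization step.

```python
from urllib.request import urlopen


def find_port(probe, first=4370, last=4380):
    """Return the first port in [first, last] for which probe(port) is true.

    Connection errors (OSError and subclasses such as URLError) are treated
    as "nothing listening here" and the scan moves on to the next port.
    """
    for port in range(first, last + 1):
        try:
            if probe(port):
                return port
        except OSError:
            continue
    return None


def make_http_probe(host):
    """Build a probe that checks for the local server on a given port."""
    def probe(port):
        url = f"https://{host}:{port}/service/version.json?service=remote"
        with urlopen(url, timeout=2) as response:
            return response.status == 200
    return probe
```

A call such as find_port(make_http_probe("abcrjdknsa.spotilocal.com")) would then return the port of the local server, or None if nothing answered in the range.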

2. Security and tokens

To ensure only trusted sites can communicate with the client, a few tricks are employed. First of all, the page must be from a trusted domain. In this case, open.spotify.com. On top of that, two types of tokens are required to communicate.

The first one is an OAuth token encoded as a base 64 (base 62?) string fetched from http://open.spotify.com/token. Retrieving it requires no special user data arguments but decoding it reveals what looks like a TTL of 8 hours(?). Cleverly enough, this URL exposes neither JSONP nor CORS headers, which makes it accessible only to pages on the same domain (namely Spotify-controlled web pages). Try it with cURL and you’ll see.

The second token is an ordinary CSRF token which I assume is just there to prevent request forgery by other JavaScript (duh). However, to retrieve the CSRF token you must supply https://open.spotify.com as your Origin header (done behind the scenes by any modern browser with CORS), which yet again ensures that only Spotify-owned web pages can control the player. Smart. Subsequent requests mentioned below all use these two tokens when communicating with the player.
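Outside a browser, the Origin requirement has to be satisfied by hand. The sketch below only shows how such a request could be built; the path of the CSRF token endpoint is not named here, so token_path is a hypothetical parameter and csrf_token_request is an illustrative helper.

```python
from urllib.request import Request

ORIGIN = "https://open.spotify.com"


def csrf_token_request(base_url, token_path):
    """Build a request for the CSRF token with the required Origin header.

    token_path is a placeholder: the actual endpoint path is an assumption
    the caller has to supply.
    """
    return Request(base_url + token_path, headers={"Origin": ORIGIN})
```

Passing the resulting object to urllib.request.urlopen would send the request with the Origin header that a browser on open.spotify.com adds automatically.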

3. Controlling the player

With initialization and tokens out of the way, on to the real meat - actually playing a song. But before sending a request to play the song, the page issues a request to /remote/status.json to get the status of the player and ensure we aren’t already playing a song. It would be a real shame to interrupt your music digging, right? A very nice feature indeed, and I presume it’s one of the driving use-cases behind the built-in HTTP server. Their previous solution using URL handlers forcefully changed the song, as the web page had no way of knowing the current status of the player (one-way communication).

With no current song playing, the page asks the player to start our requested track. A request to /remote/play.json along with the track URI as a query parameter is all that is needed. The player obeys and you have your song playing, hooray!

After that, the page continuously long-polls the /remote/status.json URL to stay up-to-date with the status of the player. The information relayed includes which track is currently playing, the volume and whether shuffle/repeat is enabled. For example, this is what keeps all play buttons in sync on opened pages (playing or paused).
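The status check and play request can be sketched as plain URL building. The two paths are the ones mentioned above, but the query parameter names (oauth, csrf, uri) and the "playing" status field are assumptions made for illustration, as are the function names.

```python
from urllib.parse import urlencode


def remote_url(base_url, path, oauth, csrf, **params):
    """Build a URL for the local server carrying both tokens.

    The parameter names 'oauth' and 'csrf' are assumed, not confirmed.
    """
    query = {"oauth": oauth, "csrf": csrf}
    query.update(params)
    return f"{base_url}{path}?{urlencode(query)}"


def status_url(base_url, oauth, csrf):
    """URL used to fetch (or long-poll) the player status."""
    return remote_url(base_url, "/remote/status.json", oauth, csrf)


def play_url(base_url, oauth, csrf, track_uri):
    """URL asking the player to start the given track ('uri' is assumed)."""
    return remote_url(base_url, "/remote/play.json", oauth, csrf, uri=track_uri)


def should_start_playback(status):
    """Only start a new track when nothing is playing ('playing' is assumed)."""
    return not status.get("playing", False)
```

A client would fetch status_url first, and only request play_url when should_start_playback returns True, to avoid interrupting whatever is already playing.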

Example client in Python

To improve my own understanding of how all this works, I put together a small Python client that more or less replicates the behavior of the JavaScript client. Definitely a more naïve implementation but it should serve as a good example and starting point. You can find it on GitHub.

Documenting the details

While a local HTTP API isn’t unique to Spotify, I was curious about how they did it since it seemed like a quality implementation. They haven’t published any information about their solution so I figured I might as well document my findings.

By looking at their implementation and speculating about the hows and whys, you quickly come to suspect that stuff like this is riddled with pitfalls, including browser compatibility, interfering firewalls and security. Hopefully others can learn something from this and avoid repeating mistakes when implementing a similar solution.


Modern process management for web apps

Part of good systems administration is managing processes. Not that it is a fancy job but rather a job that is necessary to deliver a quality web app. The current state of affairs when it comes to this is unfortunately either outdated or very crude. At the same time, this also makes it easy to improve.

Process management can do much better by doing two things: making your app do less and using better administration tools. Perhaps that comes off as obvious but I want to share a few thoughts and ideas on how to do better. Applying this to a general purpose application is certainly possible, but this post was written foremost with web apps in mind.

Simplify Expectations

A good way to help you with the task of managing the process(es) of your web app is first and foremost to simplify the expectations of your web app. By that I mean the expectations you have on how your app should behave while running in production. As you’ll see below, it is a lot about doing less when it comes to the sysadmin side of your web app.

Daemonization

Don’t do daemonization from within your app, as it effectively strips your sysadmin of any control whatsoever over your process and any of its child processes, making process management near impossible. Running your app as a foreground process is an enabler for the points below.

PID files

They are more or less the de facto way of keeping track of running processes on Unix systems. Despite that, they have a tendency to become stale or outdated when processes crash unexpectedly and/or fail to clean up after themselves. Avoiding PID files also avoids extra configuration on where to store them. Proper process supervision is a better solution, more about that below.

Forking

The most common use-case of forking web applications is to take advantage of a multi-core system. However, when forking, it also becomes the job of your app to keep track of these child processes. That’s something your sysadmin is really better at and probably wants to do. Since almost every deployment of your web app sits behind a firewall or reverse proxy, getting it to load balance between your processes should pose no problem.

Logging

When logging, send all your log output to stdout. Use whatever logging library you need but don’t bother with a log file or even worse, multiple log files. Overall, log files are a common headache for sysadmins, as they require special care not to fill up the disk. Simply not doing file based logging frees you from doing in-app log rotation and offering configurable log file paths. But for sysadmins, the greatest benefit is that they can now redirect your logging output wherever they please: to a log rotation script, logger(1) or /dev/null :).
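In Python, for example, logging to stdout takes only a few lines with the standard logging module. This is a minimal sketch; the format string and the configure_logging helper are just one possible choice.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=None):
    """Send all log output to a single stream, stdout by default.

    No log files, no rotation: where the output ends up is left to
    whoever launches the process.
    """
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("webapp").info("listening on port 8000")
```

The sysadmin can then pipe the output to logger(1), a rotation script or anything else without touching the app.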

By adhering to these simple guidelines you will surely make life easier for the sysadmin looking at deploying your web app. And to be fair, that sysadmin is quite often you.

These guidelines are mentioned in the 12 factor app, a document describing characteristics of a modern web app. If these guidelines for processes make sense to you, explore their site for more inspiration on similar topics.

Tools

Reading that list of guidelines can make it seem that the sysadmin will have one pesky day setting up and running your application. But in reality, this is less the case with good tooling.

To begin with, there’s good old SysV init. Probably the most common (and oldest) process manager you find on Unix systems. But given its age, it is also really arcane. A lot of things have happened and improved since then. Managing processes the way your father did is not necessarily the best thing to do. And let’s face it, writing rc.d scripts is not fun, not even the slightest, as you need to take care of everything - daemonization, writing PID files, rotating log files and so on. On top of that, you can’t even be notified when a process dies! Truly arcane! Given these hurdles I know that at least I have been guilty of firing up screen/tmux and spawning the processes by hand. A very short-term solution to a nagging problem. Ugh.

Luckily, there are many popular alternatives to init for this task. Upstart, monit and runit, to name a few. Supervisor and god are two tools worth an extra mention since they are written in Python and Ruby respectively, something that web hackers appreciate as it makes things easier to comprehend and extend. Combining one of these tools with the guidelines above yields a great combo, as they complement each other.

Be Modern

So to avoid screen/tmux sessions with running applications and arcane SysV init scripts, do take a look and investigate these tools. Dustin Sallings has a good write-up on modern tools for process management. I suggest you give that a look and then promise yourself to do modern process management.


The Scala Paradox Is Permanent

Paul Graham wrote an essay years ago about the Python Paradox, explaining how choosing a more exotic language will make it easier for you to find highly skilled and competent developers. This caused a bit of a stir in the community, but the message here isn’t that other developers are dumb. It’s that Python developers (back then) were attractive to hire because they knew Python not because they had to, but because they wanted to. Developers pursuing such knowledge clearly distinguish themselves from other “run-of-the-mill developers”.

Now let’s fast forward to 2011. Scala is in a position similar to the one Python used to be in. Not quite mainstream but highly productive under the right use-cases and yes, exotic! With these similarities in mind, perhaps it’s a no-brainer to call Scala subject to the Python Paradox.

Sharing the demographics of early Python, Scala could be considered up-and-coming and on its way to becoming mainstream. It is also tempting to consider it a moving target that in due time will become so widespread that it can no longer fall under the mentioned paradox. But herein lies the essence; I don’t think this will happen. Scala’s paradox isn’t temporary!

Permanent

Reading David Pollak’s recent article on why Scala is hard led me to think that perhaps Scala’s path to becoming mainstream, as a Java replacement, isn’t as straightforward as the community used to think. The point David is making is that Scala is hard - for the average developer. You can of course get into the discussion whether this is a good thing or not. This time however, I will refrain from that.

But one thing can be considered positive and it’s the chance of the Scala Paradox becoming permanent. A chance that it always will be a language with a high entry threshold and almost exclusively attracting very skilled developers. Because as David pointed out:

If you know Scala, you’re most likely a very competent developer.

Exclusively better?

Paradoxically enough, maybe it is actually in the best interest of the Scala community to keep things this way. A paradox that’s not based on a moving target could be of use for both employers and employees looking for a job.

The current adoption and use of Scala is very much where it can (and perhaps will) stay. Scala is not a language desperately looking for more followers. It has become widespread and stable enough to be accepted and trusted by many organizations. In the spirit of the name, trying to uphold the Scala Paradox is perhaps the right thing for the community to do.


Six Reasons Why node.js Is So Popular

Node.js is an interesting new phenomenon to hit the web development scene lately. Many things can be said about why it has become so popular. Below is a small list of different aspects of node.js that, at least I believe, have contributed to its success.

1. It’s fast

Built upon the V8 VM found in Google Chrome, node.js offers extreme performance compared to the VMs used by Ruby or Python. In benchmarks, V8 outperforms CPython by a factor of ten. Such microbenchmarks might not always tell the whole story but it’s a clear indicator of how fast V8 is. With differences like this, performance becomes a feature!

2. JavaScript all the way, baby

JavaScript has truly become the ubiquitous language of the web. Everyone doing web development (with or without node.js) knows JavaScript. This not only helps drive adoption but also makes it easier for code re-use between front-end and back-end.

But a more important point is that this helps reduce the cognitive load that a web developer has today. Reducing the number of languages and unifying parts of your toolkit does a lot for your productivity. Even databases such as CouchDB and MongoDB are using JavaScript for controlling behavior. Being so widespread, it’s possible to use JavaScript through all three commonly found layers: browser, server and database.

3. Web centric

Being based on JavaScript and V8, node.js has naturally come to attract mainly web developers. In the spirit of the Python Paradox, it has also come to attract very talented developers, many of them responsible for interesting innovations within the web development community lately. Two examples are socket.io and SocketStream.

A common opinion is that Ruby (with Rails) used to be the go-to language and platform for web dev hipsters but this seems to be shifting towards node.js. This is also reflected in the number of node.js libraries available, as they tend to solve very web specific problems.

4. Active development

node.js is a young project. It hasn’t even reached 1.0 yet and APIs are continuously being stabilized. This has of course both benefits and drawbacks. But with the fast moving web development community, this has the potential of being favorable to those people as features and decisions can be made quickly.

Node.js is currently the third most watched repository on GitHub, making it a very popular project. The only two more popular projects are jQuery and Rails. Another interesting aspect of node.js is that it uses an existing VM and an existing language (all node.js really does is provide an async I/O layer on top of existing software).

5. Always asynchronous

All I/O done in Node is by design asynchronous. This is a very deliberate decision made by node.js’ author as it’s a very opinionated platform. JavaScript has a long history of being used in event-driven environments (read: browsers) because its syntax, closures in particular, fits this development paradigm well.

Being 100% asynchronous with only ONE true way of doing I/O also brings other benefits, as seen in the next section.

6. Non-fragmented community

Perhaps the biggest reason why node.js and its event-driven style of programming has become so popular is the fact that there’s only ONE way of doing I/O. This has tremendous implications for the community as it can be centered around one I/O API.

This very fact is also the reason why evented programming has never taken off in other similar languages such as Python and Ruby. While they both offer many event-driven and asynchronous I/O frameworks, they all suffer from one thing: community fragmentation.

This is devastating to adoption since every framework found in these languages forms its own community with libraries not compatible with the vanilla I/O shipped with the platform. Popular examples, while great implementations, are EventMachine, Twisted and gevent, although gevent tries to alleviate the problem slightly by monkey patching existing I/O functions.

With node.js’ approach of being ONLY event-driven right from the start, it avoids this problem completely.

