Spotify recently released some cool information about the most streamed songs and artists of 2018. This included a playlist of the top 100 songs on Spotify during 2018, which was the basis of this visualization! In this blog post, I’ll point out some things I thought were interesting and then give a high level overview of some of the technical details of building the vis.

To fetch the data for this visualization, I used spotipy, a Python client for the Spotify API, to get all the songs within Spotify’s Top Tracks of 2018 playlist and to get the audio features for all the songs. Audio features are various features about the song, such as the tempo, loudness, energy, and so on, that you can get through the Spotify API – it’s definitely one of the coolest things I’ve found while working on this visualization. I’ve included a bunch of these features in the vis, and their descriptions (from the Spotify API docs) are listed below.

  1. Energy - “Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.”
  2. Danceability - “Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.”
  3. Valence - “A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).”
  4. Speechiness - “Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.”
  5. Acousticness - “A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.”

Some Observations

Most artists that made the top 100 streamed songs only have one song in the playlist. There are three artists that stand out for having an unusually high number of songs within the top 100: Drake, XXXTENTACION, and Post Malone. This makes a ton of sense because according to the Spotify Year In Review post, these three artists were also Spotify’s top streamed male artists of 2018. They also all came out with new albums this year, and their songs in the top 100 mostly were from these new albums. It’s kind of interesting here that the three most streamed male artists all stand out with 4+ songs in the top 100, while the most streamed female artist (Ariana Grande) has just 2 songs in the top 100 (which to be fair, is still very impressive). The Spotify news release about the top streams actually cites that Drake (the most streamed male artist) had more than 8.2 billion streams in 2018 compared to over 3 billion for Ariana Grande, so it might be due to the difference in number of total streams, which is pretty huge.

In the Spotify docs on audio features, there are some histograms for distibutions of audio features. For the most part, the distributions of energy, danceability, valence, speechiness, and acousticness appear to be fairly similar to the distributions pictured in the docs. However, the danceability distribution is a little more skewed towards high danceability values. Also, the energy one has almost no values under 0.3, whereas the energy distribution of all songs has a substanial number of low energy (under 0.3) songs. This makes sense, since I’d expect for top hits of the year to skew towards catchy, high energy tunes, which seems to be the case here. Actually, I was expecting this to be more skewed than it is.

Building the Visualization with D3 Force Directed Graph

To build this visualization, I used d3-force, which is the D3 implementation of a force directed graph layout. This simulator works based on input nodes and forces.

D3 is all about binding data to DOM elements. In this case, our data are the 100 songs, which are represented as an array of objects, each with attributes about each song, such as the name, the artist, danceability, energy, etc. The circle SVG elements displayed are bound to this array of data objects. We also have our force simulator, which treats this array of data as the nodes for the simulation.

Now that we have our nodes, we have to specify the forces, which dictate how the nodes are positioned. There are two different ways we’d want our nodes positioned:

  1. For categorical attributes, like “artist”, we want every node with the same attribute to be grouped together in clusters.
  2. For quantitative attributes, like “danceability”, we want the nodes to be positioned based on a linear scale but not to overlap the other nodes. This is called a beeswarm plot (I learned that while making this visualization).

So, for the first case, what we basically want to do is define centers to cluster points around. That is, we define some coordinates for the “Drake” cluster of songs and some other coordiantes for the “Ariana Grande” cluster of songs and so on. So how do we define these coordinates? D3 conveniently has some other layouts that will kind of do this for you (it’s slightly complicated but I’ll spare the details here). Here, I use the treemap and radial packing layouts to get appropriate cluster centers based on the selected attribute (artist, album, year, etc). There are also other ways of figuring out cluster centers. For example, if you know for sure you’ll have three clusters, you could just divide the screen width into thirds. However, using the treemap or radial packing layouts is more robust because they take into account the number of clusters and the size of each cluster to come up with the best positioning. This means the person actually writing the code doesn’t have to figure out any of that beforehand.

Then, we’ll want to add forces to encourage all of the Ariana Grande nodes to gravitate toward the Ariana Grande center and the Drake nodes to go towards the Drake center and so on. To do this, we assign X and Y direction forces to attract particles towards the corresponding coordinates of their cluster center.

However, we also don’t want the nodes or clusters of nodes to overlap with each other. To get around this, we define another force, d3.forceManyBody() which applies a charge to all nodes. In this case, we input a negative value to simulate a negative, repulsive force.

These are the lines of code for adding the forces to the simulation.

simulation.force('charge', d3.forceManyBody().strength(charge))
          .force('x', d3.forceX().strength(FORCE_STRENGTH).x((d) => nodePos(d, centers, key).x))
          .force('y', d3.forceY().strength(FORCE_STRENGTH).y((d) => nodePos(d, centers, key).y));

Handling the second case (plotting quantitative data with the beeswarm plot) is fairly similar – we just have to tweak our forces. We still have an X force because we want each node’s x position to follow the attribute’s value. That is, a node with a danceability of 0.7 should fall at 0.7 on the scale. We also have a Y force, which now attracts all nodes to the same Y coordinate since we want all nodes to be roughly positioned in a horizontal line. Finally, we still don’t want nodes to overlap with each other. However, instead of using d3.forceManyBody(), which is generally used for more “natural effects” like simulating gravity, I’m opting to use the collision force instead. This works by explicilty passing in the radius of the circles, and the simulator will prevent tge nodes from overlapping. This is summed up in these couple of lines:

simulation.force("charge", null)
          .force('x', d3.forceX().strength(FORCE_STRENGTH).x((d) => scale(d[key])))
          .force('y', d3.forceY().strength(FORCE_STRENGTH).y( HEIGHT / 2 ))
          .force("collide", d3.forceCollide(RADIUS));

So the visualization is initialized by creating all of the nodes/circles. Then, any time a button for artist, album, energy, etc. is clicked on, we calculate cluster centers if it’s a categorical attribute, reset the forces, and restart the simulation, which repositions everything! The important thing to note here is that we have to explicitly restart the simulation after updating forces, which is done with this line:


Want to know more?

Spotify data:

This post covered a pretty high level overview of D3’s force simulation. Here are some other resources/code snippets! I read a lot of these while building this visualization.

I actually used D3 v5 for this, even though the examples I’ve listed are in v4. All of the v4 examples should pretty much work in v5, with the exception of reading files, since D3 v5 uses Promises for loading data!