05 March 2018 – 23 March 2018
Data is everything and everywhere. How do we make sense of such ‘big data’? This project aims to equip students with the ability to visually encode data using procedural methods, exploring different graphing ‘types’, spatial and temporal mapping, and the use of scale, colour, and position to create meaningful, interpretative realisations of multi-dimensional information. Students may create a range of responses to this brief, from experimental visualisations that interpret and visualise data sets to sculptural, audio, or even performative responses. In summary, this project focuses closely on exploring methods of understanding narratives through the multimedia representation of data.
Research began somewhat similarly to research for Design Domain and Audio Visual, though this time I was able to fully indulge my adoration for data visualisation (pardon any switching between British and American spellings). Below is a list of people, projects, studios, etc. that I’ve looked at and read from.
I went to Kaggle in search of a good, manageable dataset. I say manageable because the focus of this brief is interpretation and representation, not data wrangling. While the two are inextricably connected, the brief seemed to be more visually inclined than statistically driven. Having studied data visualization at UW Informatics, I came into this project with a different background from many of my peers: while our tutorials began with reading in CSVs and visualizing in Processing with a focus on tweaking aesthetics, I was hanging out in RStudio running some visual tests to refresh my memory. My focus will be visual, but the most important thing is the data and integrity in its representation. My visual tests were simple plots rendered with R’s built-in state dataset. Once I was back in the world of R, I resumed my dataset search.
Topic-wise, I was not here to be picky. While I have experience dealing with data that carries larger socio-political implications (see my Mapping Shootings 2016), I wanted a dataset that would balance quantity and quality. Ultimately, I’ve chosen to work with the Top Spotify Tracks of 2017.
I began my initial data exploration of the Spotify Top Tracks (STT) in R, and did a number of cursory visualizations using Plotly. Spotify reports metadata and metrics such as danceability, acousticness, valence, tempo, duration, and so forth, so I had plenty to work with. I was curious to know who the top artists were, which factors were the most varied and which were consistent, and whether there were any correlations between the metrics, so I used histograms, bar charts, pie charts, tables, and heat maps. My main curiosity was something along the lines of, “What does the year of top streamed songs look like?” In terms of visualizing the 100 songs, I was thinking of things emblematic of music, as well as something that easily mapped to the values in a way that was simple, clear, and beautiful.
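My actual exploration happened in R with Plotly, but the gist of those first questions (who shows up most, and do any metrics move together?) can be sketched in a few lines of Python. The column names here are stand-ins, not Spotify’s real field names:

```python
import csv
import io
import math
from collections import Counter

def load(csv_text):
    """Parse the chart CSV into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def top_artists(rows, n=3):
    """Count how often each artist appears in the chart."""
    return Counter(r["artist"] for r in rows).most_common(n)

def correlation(rows, a, b):
    """Pearson correlation between two numeric metric columns."""
    xs = [float(r[a]) for r in rows]
    ys = [float(r[b]) for r in rows]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In R this is a one-liner with `table()` and `cor()`; the sketch just makes the bookkeeping explicit.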
A number of the metrics are numbers between 0.0 and 1.0, and I thought of creating some sort of “fingerprint” for each song based on the numbers, much like I created fingerprints for each song in the Audio Visual project based on the amplitude and frequency. A literal fingerprint turned into concentric circles that would be completed based on the value of the metric. Very similar to the Apple Watch’s rings for calories burned, exercise minutes, and standing hours, the rings would be a simple representation of the value of each quality, with four rings per song. To do this, I opened Processing and translated the following pseudo-code into real code:
- Read in CSV and parse correct values into variables
- For-loop for each row/each song
- Map each of the four values from 0.0–1.0 to 0–2π
- Draw an arc the length of the mapped value for each metric
- Repeat with each song, incrementing x- and y-coordinates
- Output as PDF
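The core of that pseudo-code, minus the actual drawing and PDF output, can be sketched outside Processing as well. Here is a minimal Python version of the mapping and layout logic; the metric column names are hypothetical placeholders for Spotify’s real field names:

```python
import csv
import io
import math

# Hypothetical metric columns; the real CSV uses Spotify's field names.
METRICS = ["danceability", "acousticness", "valence", "energy"]

def ring_angles(row):
    """Map each 0.0-1.0 metric to an arc sweep between 0 and 2*pi."""
    return [float(row[m]) * 2 * math.pi for m in METRICS]

def grid_position(index, cols=10, cell=80):
    """Increment x/y coordinates, wrapping to a new row every `cols` songs."""
    return ((index % cols) * cell, (index // cols) * cell)

def layout(csv_text):
    """Read the CSV and return (x, y, ring sweeps) for each song."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = []
    for i, row in enumerate(reader):
        x, y = grid_position(i)
        out.append((x, y, ring_angles(row)))
    return out
```

In the Processing sketch, each returned sweep becomes an `arc()` call at the song’s grid position, and the whole frame is recorded to PDF.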
My Processing sketch outputs this result as a PDF, which I opened in Illustrator and added a number of elements such as a title, a key, and a picture of the average song. The average song was computed by using R to find the mean value of each column and writing that out to a new CSV to be read into Processing. As I worked on that, I began thinking about what this picture of the “average song” and all of the top tracks of 2017 meant. I was so curious about how this would have looked ten, twenty, and thirty years ago that I went looking for more data.
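The averaging itself was a couple of lines of R, but as a sketch, the equivalent logic looks something like this in Python (again with placeholder metric names):

```python
import csv
import io

# Hypothetical metric columns; the real data uses Spotify's field names.
METRICS = ["danceability", "acousticness", "valence", "energy"]

def average_song(csv_text):
    """Return the mean of each metric column across all songs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {m: sum(float(r[m]) for r in rows) / len(rows) for m in METRICS}

def write_average(csv_text):
    """Write the averages out as a one-row CSV for Processing to read."""
    avg = average_song(csv_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=METRICS, lineterminator="\n")
    writer.writeheader()
    writer.writerow(avg)
    return buf.getvalue()
```

The one-row CSV then goes through the same ring-drawing sketch as the individual songs.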
I happened upon the Billboard Hot 100s charts and an .rda file containing every chart since 1960, annotated with the same Spotify metrics. Using R, I found the average for each year (each year has 100 songs, remember?) and wrote that out to a CSV to read into Processing and run through essentially the same sketch as the STT.
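The per-year grouping was also done in R, but the idea is simple enough to sketch in Python: bucket the rows by chart year, then average each metric within the bucket. The `year` column and the metric names here are assumptions about the data’s shape:

```python
import csv
import io
from collections import defaultdict

# Hypothetical subset of the Spotify metrics.
METRICS = ["danceability", "valence"]

def yearly_averages(csv_text):
    """Group songs by chart year and average each metric per year."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_year[row["year"]].append(row)
    return {
        year: {m: sum(float(r[m]) for r in rows) / len(rows) for m in METRICS}
        for year, rows in sorted(by_year.items())
    }
```

Each year’s averages then become one “average song” ring set, so the change from 1960 onward can be read across the poster.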
This week, I did a lot of fine-tuning. I selected exactly which metrics I would be using, cleaned up my code, finished the RMarkdown, and finished the static graphics. I also did a bit of color study, as I wanted to follow best practices for color contrast in data visualisations. I found a plethora of resources about selecting colors for different types of color perception, and eventually chose a red-to-purple palette for the top Spotify tracks and a blue-to-green palette for the Billboard Hot 100s. The palettes are swatched at different contrast levels depending on how many levels are needed; in this case, four. In the end, the choice of red vs. blue was somewhat arbitrary, i.e. there is no deeper meaning to the colors themselves. I simply wanted contrasting colors within each scheme, and contrast between the two schemes, so it would be easy to tell that the two visualisations were separate but similar.
When resolving my final graphic outputs, I tried a number of different layouts. I settled on two full-sized poster outputs meant for print, one for Spotify’s Top Tracks of 2017 and one for the Billboard Hot 100s from 1960. In addition to the appropriate concentric circles, I included the average song of 2017 on the Spotify poster, and illustrated the change in songs from 1960 to 2015 on the Billboard one. I thought that in addition to presenting the information, it would be appropriate to include some insight visually as well. For the 2017 songs, with all 100 from the same year, it made sense to first show the average song. For information over time such as the Billboard Hot 100s, expressing the change in the average song over time was one of the best fits. As shown in my deliverables, I also included the concentric circle outputs themselves in an aside. I chose not to label the songs because this visual is less focused on the hard data belonging to each individual song and more focused on the information as an entity.
I was waiting for this project all year, and I had so much fun with it. Not only did I get to pore over all kinds of data visualisation examples and dive deeper into the field than I had thought I’d get to, I created something that I’m actually quite proud of. I think my affinity for data visualisation stems from my desire to create order and understanding in the world around me. I also love the feeling of being able to create what I want and have things align with my sense of taste (it’s a very recent thing for this to happen occasionally). That being said, I still have a long way to go to get where I want to be with data visualisation in general. After bingeing the content on The Pudding, I realized that creating data journalism pieces is more tangible than I thought. While I was able to create the RMarkdown in a way that summarised what I was looking at in the RStudio console, and the static graphics altered from a Processing sketch output, I want to be able to combine those and make aesthetically similar work that is dynamic and interactive. I know where I am with my work now and I know where I want to be, and I actually have a pretty good idea of how to get there. I’ll be looking into learning D3 and maybe some Python, too. Numbers are so fun to me, and data visualisation has definitely been my favorite project so far.