Last year, the data visualization website The Pudding mapped out Miles Davis’ legacy by sorting through every Wikipedia page that mentioned the jazz legend. It was just one of dozens of visualizations that have been built on Wikipedia data over the past decade.
This week, The Pudding released a new visualization that examines how Wikipedia pageviews increase after prominent celebrity deaths, and what those spikes might say about the prominence of celebrities in popular culture. (We make all of our pageview data public, so if you’re curious about other topics, you can search for them using the Pageviews tool.)
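For readers who would rather pull the numbers programmatically than through the Pageviews tool, the sketch below shows one way to request daily per-article counts from the public Wikimedia REST API. It is only an illustration, not the code The Pudding used: the function names, the default project, and the `User-Agent` string are placeholders chosen for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Per-article endpoint of the Wikimedia pageviews REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build the URL for daily views of one article.

    start and end are YYYYMMDD strings. Spaces in the title become
    underscores, and any other special characters are percent-encoded.
    """
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{title}/daily/{start}/{end}"


def fetch_daily_views(url):
    """Return a list of (timestamp, views) pairs for the given URL.

    Performs a network request. The User-Agent below is a placeholder;
    replace it with something that identifies your own script.
    """
    request = Request(url, headers={"User-Agent": "pageview-example/0.1 (you@example.org)"})
    with urlopen(request) as response:
        payload = json.load(response)
    return [(item["timestamp"], item["views"]) for item in payload["items"]]
```

Calling `fetch_daily_views(pageviews_url("Miles Davis", "20160101", "20160131"))` would request one month of daily counts for that article.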
We reached out to The Pudding’s Russell Goldenberg to learn more about his latest visualization and how he used Wikipedia’s API to get the data.
How do you come up with new ideas for data visualizations?
Ideas for new stories and visualizations come in a variety of flavors. We have a loose process that we constantly iterate on. Our team holds weekly meetings dubbed “Story Time” where we pitch anything from fully-baked to half-baked ideas. It includes a roundtable discussion of questions and reactions to gauge the group’s interest and see if there is something worth pursuing. We’ve had the most success by starting with a simple question, then attempting to answer it through data analysis and visualization. Occasionally we will start with a data set and see if we can find a story, but we’ve found that approach less often yields interesting results that align with our rubric.
How have you used Wikipedia datasets for your visualizations?
Wikipedia is awesome because it is so versatile. For example, our story on the legacy of Miles Davis used Wikipedia as a primary source by looking at all Wikipedia articles that mentioned him. It is also terrific for secondary info: things like short bios of people, or in the case of our story about the craft beer capital of the US, the populations of the 1,000 biggest cities to help with data processing and filtering. We also have another story in the works right now, since we got a bunch of ideas after playing around with the Pageviews tool.
It’s incredibly fascinating that Wikipedia could be used as a proxy for cultural relevance of events or people. I’m curious what you learned as you put together your most recent feature about celebrity deaths.
It is really neat to see a quantitative measure that shows our natural instinct as humans in the age of the internet. That is, when something big happens, tragedy or otherwise, we Google people. Whether it is to learn the details about what transpired, or maybe to learn more about the person in general, we are constantly hungry for those details. I was most surprised by the magnitude of traffic after a death and how quickly it dissipates, something I thought would have a longer tail. It was a challenge even putting that surge in traffic in perspective, since it was off the charts in most cases.
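One rough way to put such a surge in perspective is to compare the peak after an event with a typical day before it, and to count how long traffic takes to settle back down. The sketch below is an assumption-laden illustration, not The Pudding’s method: the median baseline, the 30-day window, and the “within twice the baseline” settling threshold are arbitrary choices, and the numbers in the usage note are made up.

```python
from statistics import median


def spike_summary(daily_views, event_index, baseline_days=30, settle_factor=2.0):
    """Summarize a pageview spike around an event.

    daily_views   -- list of daily view counts, oldest first
    event_index   -- position in the list of the day the event happened
    baseline_days -- how many days before the event form the baseline
    settle_factor -- traffic counts as settled once it is at or below
                     settle_factor * baseline
    """
    window = daily_views[max(0, event_index - baseline_days):event_index]
    if not window:
        raise ValueError("need at least one day of data before the event")
    baseline = median(window)
    if baseline == 0:
        raise ValueError("baseline is zero, so the ratio is undefined")

    after = daily_views[event_index:]
    peak = max(after)
    # Number of days after the event until traffic first drops back to
    # the settling threshold; None if it never does within the data.
    days_to_settle = next(
        (i for i, views in enumerate(after) if views <= settle_factor * baseline),
        None,
    )
    return {
        "baseline": baseline,
        "peak": peak,
        "ratio": peak / baseline,
        "days_to_settle": days_to_settle,
    }
```

With the invented series `[100, 110, 90, 100, 5000, 1200, 400, 180, 120]` and `event_index=4`, the baseline is 100 views a day, the peak is 50 times that, and traffic settles three days after the event.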
What data sets do you wish were available?
It would be great to see where traffic comes from, both externally and internally. How people bounce between articles and the paths they take could lead to lots of interesting stories. (Editor’s note: We release clickstream data on a monthly basis.)
Anything else you’d like to add?
Shout out to the node.js wrapper wikijs, which helps query the Wikipedia API. If anyone ever has interesting ideas for stories they’d like to tell, or ideas for us to tackle with or without Wikipedia data, we would love to hear them. Reach out to sup@pudding.cool.
Melody Kramer, Senior Audience Development Manager, Communications
Wikimedia Foundation