As a data journalist, I want to highlight a few graphics I created for different outlets.
For an archive of every static graphic I have made for a newsletter or article, see my are.na channel, Static Data Visualization, where each graphic is presented along with a blurb.
Graphics are organized by date of creation.
Bell Chart of Distance Between (Log) Median Impressions and Total Impressions
Snapchat Political Ad Spending Shows How Groups in the United States and Beyond are Getting Creative (MediaFile)
For MediaFile, I recently wrote an article on Snapchat’s political ad spending dataset. Looking at median ad impressions, I discovered that ads in Middle Eastern countries (Turkey, the UAE) were viewed far more often than ads in other countries. However, total impressions show that the United States is still one of the top countries in both spending and impressions.
Diving into the dataset, I also discovered what these countries were doing on Snapchat. In Turkey, the political advertising came from the AK Parti, Erdoğan’s party. For the United Arab Emirates, I found the image associated with the campaign.
I had to take the log to look at the variance in the dataset, because otherwise the outlier (the United States) made it difficult to see that, by median, some countries viewed Snapchat ads more than the United States did. Overall, however, the United States had more impressions and spent more in total.
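The median-versus-total comparison can be sketched in a few lines. This is a minimal Python illustration rather than the R code behind the graphic; the impression counts are invented for the example, and the function name `summarize` is hypothetical.

```python
import math
from statistics import median

# Invented impression counts per ad, for illustration only
# (these are NOT values from the Snapchat dataset).
ads = {
    "United States": [100, 200, 300, 1_000_000, 5_000_000],
    "Turkey": [40_000, 50_000, 60_000],
    "United Arab Emirates": [30_000, 45_000],
}


def summarize(impressions_by_country):
    """Return {country: (log10 of median impressions, log10 of total impressions)}."""
    summary = {}
    for country, impressions in impressions_by_country.items():
        summary[country] = (
            math.log10(median(impressions)),
            math.log10(sum(impressions)),
        )
    return summary


summary = summarize(ads)
for country, (log_median, log_total) in summary.items():
    print(f"{country}: log10 median = {log_median:.2f}, log10 total = {log_total:.2f}")
```

On the log scale, a country with many small ads and a few enormous ones can rank low by median yet high by total, which is the pattern described above.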
Hex Bin Map of Labour Strikes in China
The Aggregate September 9th, 2019
For The Aggregate, a newsletter on the analysis of unusual datasets, I found a dataset on labour strikes in China after following some protesters from Hong Kong.
I had seen a tutorial that used hex bins to visualize geographic data, so I used the same approach to show the density of strikes. Along the way, I learned more about customizing where the legend sits on the graphic and how it looks.
TensorFlow Commit and Contributor Distribution
The Aggregate September 16th, 2019 | Senior Economic Thesis
For my senior thesis, I am writing about how open source software packages become popular. To gather the data, I used R and functional programming tools, rather than for loops, to query the API.
To test the API, I grabbed data from TensorFlow and discovered that, as in many other repositories on GitHub, most people don’t contribute often. In practice, a few people account for the majority of the commits.
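One way to put a number on that concentration is the share of commits made by the most active contributors. The sketch below is in Python rather than the R used for the thesis, the commit log is invented, and `top_share` is a hypothetical helper.

```python
from collections import Counter

# Invented commit log: one author name per commit (not real TensorFlow data).
commit_authors = (
    ["ana"] * 60 + ["ben"] * 25 + ["cy"] * 5 + ["dee"] * 4 + ["eli"] * 3 + ["fay"] * 3
)


def top_share(authors, top_n):
    """Fraction of all commits made by the top_n most active authors."""
    counts = Counter(authors)
    top = counts.most_common(top_n)
    return sum(n for _, n in top) / len(authors)


share = top_share(commit_authors, 2)
print(f"Top 2 of {len(set(commit_authors))} contributors made {share:.0%} of commits")
```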
BuzzFeed Textual Analysis
Link
For a pitch to The Pudding, I was inspired by their idea on headline complexity. I tackled it by using the BuzzFeed News API to extract nearly 50,000 BuzzFeed News articles and dump them into a SQLite database.
I visualized a sample of the dataset in D3.js with various methods, ranging from bar charts to the streamgraph above. Unfortunately, because the streamgraph is just an SVG on this page, you will have to visit the website linked above for the interactive version.
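Storing the articles in SQLite might look roughly like the following. This is a Python sketch under stated assumptions: the records, the column names, and the words-per-headline measure are all illustrative stand-ins, not the actual API fields or the complexity metric used in the pitch.

```python
import sqlite3

# Invented records standing in for API results; the field names are assumptions.
articles = [
    {"id": 1, "title": "An Example Headline", "published": "2019-01-05"},
    {"id": 2, "title": "Another Example Headline With More Words", "published": "2019-02-11"},
]

conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the data
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, published TEXT)"
)
# INSERT OR IGNORE skips rows whose id is already stored, so re-running is safe.
conn.executemany(
    "INSERT OR IGNORE INTO articles (id, title, published) "
    "VALUES (:id, :title, :published)",
    articles,
)
conn.commit()

# A crude stand-in for headline complexity: number of words per headline.
rows = conn.execute("SELECT title FROM articles ORDER BY id").fetchall()
word_counts = [len(title.split()) for (title,) in rows]
print(word_counts)
```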
Journalism Job Losses from January to May 2019
Industry Reminded Yet Again of Geographic Disparity (MediaFile)
For MediaFile back in May, I wrote an article on the layoffs that had been occurring over the previous few months. To collect the data, I scraped a Business Insider article with rvest; the page was formatted in a way that made it easy to extract the company, the date, and how many people were laid off.
I then aggregated the statistics, grouping by month to see how many people were laid off per month.
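The monthly aggregation step can be sketched as follows. The original work was done in R; this is a Python illustration with invented rows, and `layoffs_per_month` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# Invented (company, date, people laid off) rows, standing in for the scraped table.
layoffs = [
    ("Company A", date(2019, 1, 23), 200),
    ("Company B", date(2019, 1, 30), 50),
    ("Company C", date(2019, 3, 4), 30),
]


def layoffs_per_month(rows):
    """Sum the number of people laid off within each calendar month."""
    totals = defaultdict(int)
    for _company, when, count in rows:
        totals[when.strftime("%Y-%m")] += count
    # Sort by the "YYYY-MM" key so months come out in chronological order.
    return dict(sorted(totals.items()))


monthly = layoffs_per_month(layoffs)
print(monthly)
```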