As a data journalist who creates graphics, I want to highlight a few of them from different sources. Many were made for news outlets, for school, or for the newsletter I run, The Aggregate.
Graphics are organized by date of creation.
Bell Chart of Distance Between (Log) Median Impressions and Total Impressions
Snapchat Political Ad Spending Shows How Groups in the United States and Beyond are Getting Creative (MediaFile)
For MediaFile, I recently wrote an article on Snapchat’s political ad spending dataset. Looking at median ad impressions, I discovered that ads in Middle Eastern countries (Turkey, the UAE) were viewed far more often than ads in other countries. Total impressions tell a different story: the United States is back to being one of the top countries in both spending and impressions.
Diving into the dataset, I also discovered what these countries were doing on Snapchat. For Turkey, it was the AK Parti, Erdoğan’s party, doing political advertising. For the United Arab Emirates, I found the image associated with the campaign.
I had to take the log to examine the variance in the dataset; otherwise the outliers (the United States) made it difficult to see that, by median, some countries viewed Snapchat ads more than the United States. Overall, though, the United States had more impressions and spent more in total.
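The effect of the log transform can be sketched in a few lines. This is only an illustration: it is written in Python rather than the tools used for the article, the impression counts are made up, and the base-10 log is an assumption, since the base used in the original is not stated.

```python
import math
from statistics import median

# Made-up impression counts per ad, for illustration only (not the Snapchat data).
impressions = {
    "United States": [1_000, 2_000, 5_000, 2_000_000],
    "Turkey": [40_000, 50_000, 60_000],
}


def summarize(values):
    """Return (log10 of the median, log10 of the total) for one country."""
    return math.log10(median(values)), math.log10(sum(values))


summary = {country: summarize(values) for country, values in impressions.items()}

for country, (log_median, log_total) in summary.items():
    print(f"{country}: log10 median = {log_median:.2f}, log10 total = {log_total:.2f}")
```

In this toy data one very large ad pushes the United States total far ahead, while its median stays below Turkey's. On a log scale both values fit on the same axis.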
Hex Bin Map of Labour Strikes in China
The Aggregate September 9th, 2019
For The Aggregate, a newsletter on the analysis of unusual datasets, I found a dataset on labour strikes in China while following some protesters from Hong Kong.
I had seen a tutorial that used a hex bin map to visualize geographic data, so I used the same approach to show the density of strikes. Along the way, I learned more about customizing where the legend sits on the graphic and how it looks.
TensorFlow Commit and Contributor Distribution
The Aggregate September 16th, 2019 | Senior Economic Thesis
For my senior thesis, I am writing about how open source software packages become popular. To collect the data, I used R and some functional programming tools, rather than for loops, to query the API.
To test the API, I grabbed data from TensorFlow and discovered that, as with many other repositories on GitHub, most people don’t contribute often. In practice, a few people account for a majority of the commits.
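The concentration of commits among a few contributors can be measured with a short calculation. The sketch below is in Python rather than the R used for the thesis, and the author names and commit counts are invented for illustration; they are not the TensorFlow figures.

```python
from collections import Counter

# Made-up commit log: one author name per commit (not real TensorFlow data).
commit_authors = (
    ["ana"] * 60 + ["ben"] * 25 + ["chen"] * 5
    + ["dee"] * 4 + ["eli"] * 3 + ["fay"] * 3
)


def top_share(authors, top_n):
    """Fraction of all commits made by the top_n most active authors."""
    counts = Counter(authors)
    top_commits = sum(n for _, n in counts.most_common(top_n))
    return top_commits / len(authors)


share = top_share(commit_authors, 2)
print(f"Top 2 of {len(set(commit_authors))} contributors made {share:.0%} of commits")
```

A high share for a small `top_n` is the pattern described above: most contributors commit rarely, and a handful do most of the work.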
Venkatesh Rao's Threadapalooza Analysis
The Aggregate December 23rd, 2019
Venkatesh Rao, who runs the well-known blog Ribbonfarm, featured my analysis of the Threadapalooza, a massive tweetstorm by hundreds of people that he instigated.
Artem Litvinovich provided the data as JSON files of the tweets. Using the map function and some JSON cleaning in R, I was able to get the data into a workable format. The graphic compares the most popular threads by total tweets against their tweet-to-like ratio, where the latter highlights short threads that were liked by many people.
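A rough sketch of that cleaning step follows, in Python rather than R. The field names `thread_id` and `likes` are hypothetical, since the structure of the real JSON files is not described here, and the ratio is assumed to be tweets divided by likes, so that a low value means a short thread with many likes.

```python
import json
from collections import defaultdict

# Made-up sample in a hypothetical format; the real files may look different.
raw = """[
  {"thread_id": "a", "likes": 10},
  {"thread_id": "a", "likes": 30},
  {"thread_id": "b", "likes": 100}
]"""


def clean(tweet):
    """Keep only the fields needed for the comparison."""
    return tweet["thread_id"], int(tweet["likes"])


totals = defaultdict(lambda: {"tweets": 0, "likes": 0})
for thread_id, likes in map(clean, json.loads(raw)):
    totals[thread_id]["tweets"] += 1
    totals[thread_id]["likes"] += likes

# Assumed definition: tweets per like. Threads with no likes get None.
ratios = {
    thread_id: (t["tweets"] / t["likes"] if t["likes"] else None)
    for thread_id, t in totals.items()
}
print(ratios)
```

Here thread `b` is a single tweet with many likes, so it has the lowest ratio of the two.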
Journalism Job Losses from January to May 2019
Industry Reminded Yet Again of Geographic Disparity (MediaFile)
For MediaFile back in May, I wrote an article on the layoffs that had been occurring over the previous few months. To collect the data, I used rvest to scrape a Business Insider article, which was formatted in a way that made it easy to extract the company, the date, and the number of people laid off.
I then aggregated the statistics, grouping by month to see how many people were laid off per month.
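The monthly aggregation looks roughly like the sketch below. It is in Python rather than R, and the outlet names, dates, and counts are placeholders, not the figures scraped from Business Insider.

```python
from collections import defaultdict
from datetime import date

# Placeholder rows in the scraped shape: (company, date, people laid off).
layoffs = [
    ("Outlet A", date(2019, 1, 23), 200),
    ("Outlet B", date(2019, 1, 24), 15),
    ("Outlet C", date(2019, 3, 5), 40),
]


def total_by_month(rows):
    """Sum layoffs per calendar month, keyed as 'YYYY-MM' and sorted by month."""
    totals = defaultdict(int)
    for _company, day, count in rows:
        totals[day.strftime("%Y-%m")] += count
    return dict(sorted(totals.items()))


monthly = total_by_month(layoffs)
print(monthly)
```

Keying on year and month together keeps the months in calendar order and avoids mixing the same month from different years.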