This post is by Clare Carr from MediaShift
A version of this article was originally published on the Parse.ly blog. What do people do before they go see a movie? The movie industry tries to answer this question through proxies employed by marketers: surveys, data on past successes, search data, and more recently social media listening or interaction tools. Given Parse.ly’s dataset of billions of internet visitors per month to the largest media properties in the world, we thought we’d try to visualize actual reader attention, as measured by page views, for movies. We removed the need for online audiences to take an action in order to measure their behavior and instead focused on information they’re taking in.
What happens when you remove the need for proxies and focus on actual attention?

To start, we examined the amount of attention a movie receives in the media and its correlation with box office success. In the scatterplot, each dot represents a single film. Dots located further to the right received more internet attention in the three days prior to their release, and those located towards the top received more total U.S. box office revenue.

The likely explanation for the high correlation between page views and box office revenue is simple: people tend to read articles about a movie before buying a ticket. So the more readers a movie’s articles receive, the more money it makes. The exception is interesting to note, and it also makes perfect sense: PG movies (represented by the hollow blue dots in the scatterplot) were less correlated, since kids are less likely to read an article about a movie before attending it.

What else can we see by analyzing reader interest and attention in movies? One thing that became clear when digging through the data: geography affects audience attention for movies and entertainment online.
Captain America vs. Deepwater Horizon in the USA

Movies, at least as measured by their box office earnings, still require an understanding of localized viewing habits. For movies released from 2016 through August 2017, our team analyzed how many total views each movie’s articles received in each of the U.S.’s media market areas. For each movie, we found every article in our database where the movie was mentioned in the text or headline. Then, using IP addresses, we matched each visit to these articles with the visitor’s geographic location.

Here are two examples of the results. (You can explore all movies in the visualization on our website.)

We can see that the regions around the Gulf of Mexico, which were most heavily affected by the real-life Deepwater Horizon disaster, paid the most attention to the film, which makes sense and is a good sanity check for our data. The film also received high levels of attention in western North Dakota and eastern Montana. The two deep-green media markets in this area are at the epicenter of the shale oil boom, and are presumably home-away-from-home to many oil workers.

Compare that map to the one for people reading about Captain America: Civil War in the US. Captain America: Civil War grossed the third-highest total box office revenue in 2016, which presumably means it had very broad appeal. The map of audience attention for this movie shows a broad spread across the United States, though with slightly higher concentration in the Midwest.
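For readers curious about the mechanics, the matching step described in this section (find the articles that mention a movie, then tally visits to those articles by media market) might look roughly like the sketch below. This is an illustration under stated assumptions, not our production pipeline: the sample articles and visits are made up, the field names (`headline`, `text`, `article_id`, `market`) and function names are hypothetical, and each visit is assumed to already carry a media market resolved from its IP address.

```python
from collections import Counter

# Hypothetical sample data; field names and values are illustrative only.
ARTICLES = {
    "a1": {"headline": "Deepwater Horizon review",
           "text": "The disaster film opens Friday."},
    "a2": {"headline": "Weekend preview",
           "text": "Captain America: Civil War leads the box office."},
    "a3": {"headline": "Local news roundup",
           "text": "Nothing about films here."},
}

# One record per page view. The media market is assumed to have been
# resolved from the visitor's IP address before this step.
VISITS = [
    {"article_id": "a1", "market": "New Orleans"},
    {"article_id": "a1", "market": "New Orleans"},
    {"article_id": "a1", "market": "Chicago"},
    {"article_id": "a2", "market": "Chicago"},
    {"article_id": "a3", "market": "Chicago"},
]


def articles_mentioning(title, articles):
    """Return IDs of articles whose headline or text mentions the title."""
    needle = title.lower()
    return {
        article_id
        for article_id, article in articles.items()
        if needle in article["headline"].lower()
        or needle in article["text"].lower()
    }


def views_by_market(title, articles, visits):
    """Count page views per media market for articles mentioning the title."""
    matched = articles_mentioning(title, articles)
    return Counter(
        visit["market"] for visit in visits if visit["article_id"] in matched
    )


if __name__ == "__main__":
    print(views_by_market("Deepwater Horizon", ARTICLES, VISITS))
```

A plain case-insensitive substring match is the simplest possible stand-in for "mentioned in the text or headline"; real titles that double as common words would need something stricter.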
What movies were you most likely to read about?

Using data science, we can quickly identify patterns that we wouldn’t be able to see just by looking at each movie individually. We used a technique called latent Dirichlet allocation (LDA) to find these hidden geographic trends in how online readers pay attention to movies in the United States. We uncovered five distinct groups. Each group has three distinct components:
- Patterns based on geography: visually, where the most views for the group took place. The maps below show the geographic clusters. We compared these to census data for area density, race and other factors to help describe each group.
- Most-read-about films in the cluster: this simply shows what percentage of page views each audience gives to each movie. Because this ranking is based on absolute volume of page views, large-budget, popular films show up in this list.
- Each audience’s most characteristic films: this compares that audience’s most-popular list above to the average across all audiences. Because this ranking is relative (based on comparing interest to the average), both small- and large-budget movies have a chance of appearing here, so this list highlights what makes each cluster unique.
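
To make the difference between the last two lists concrete, here is a minimal sketch of an absolute ranking (share of page views) next to a relative one. The exact comparison used in our analysis is not spelled out above, so the ratio of a group's share to the average share across all groups is an assumption, and the group names, movie names, and counts are invented for illustration.

```python
# Hypothetical page-view counts per audience group; not real data.
VIEWS = {
    "group_a": {"Blockbuster": 800, "Indie Drama": 200},
    "group_b": {"Blockbuster": 900, "Indie Drama": 100},
}


def shares(views):
    """Convert raw page-view counts into each movie's share of the total."""
    total = sum(views.values())
    return {movie: count / total for movie, count in views.items()}


def most_read(views, n=5):
    """Absolute ranking: movies ordered by share of the group's page views."""
    group_share = shares(views)
    return sorted(group_share, key=group_share.get, reverse=True)[:n]


def most_characteristic(group, all_views, n=5):
    """Relative ranking: the group's share divided by the average share
    across all groups (an assumed way to 'compare to the average')."""
    group_share = shares(all_views[group])
    all_shares = [shares(views) for views in all_views.values()]
    lift = {}
    for movie, share in group_share.items():
        average = sum(s.get(movie, 0.0) for s in all_shares) / len(all_shares)
        if average > 0:  # skip movies nobody read about
            lift[movie] = share / average
    return sorted(lift, key=lift.get, reverse=True)[:n]


if __name__ == "__main__":
    print(most_read(VIEWS["group_a"]))
    print(most_characteristic("group_a", VIEWS))
```

In this toy data, the blockbuster tops the absolute list for both groups, while the smaller film tops the relative list for the group that reads about it more than average, which is the effect the third component is meant to capture.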