This post has been covered on the Wall Street Journal!
Last post, I described my mildly obsessive strategy for making predictions in my Oscar pool this year. I’ve been driven to such measures by the repeated and humiliating losses to my brother Jon for the past quarter-score.
To recap that post: the common wisdom among Oscar pundits is that the “precursor” awards which happen before the Academy Awards (e.g. the Golden Globes, Screen Actors / Directors / Producers Guild, BAFTA, and the Critics Choice awards) tend to correlate with who wins Oscars. From what I understand, Jon looks over these when choosing a winner. I tried the same thing last year, and was on track to win until Meryl Streep won Best Actress in an upset (jerk). That night i made a vow – never again.
So, I decided to take the same approach this year, but make it more systematic. I grabbed 20 years worth of award data from the Internet Movie Database, and built a model that takes into account the degree to which each precursor Ceremony predicts the Oscars (different ceremonies do better in different categories). My theory is that this might provide an edge for close calls. Our Oscar pool also incorporates an interesting twist, in that we have some freedom to down-weight predictions that we aren’t sure of. My model makes probabilistic estimates, and thus also gives a strategy for how to weight each prediction.
Now that all of the precursor awards have taken place, I’m able to apply the model for the 2013 awards. Here are the main results (who really cares about Best Live Action short? Sorry, guy nominated for Best Live Action Short), with some punditry for good measure:
Les Miserables (12%)
Argo has swept the dramatic awards, and Les Mis won for best Golden Globe (Comedy/Musical). The last time a movie swept the dramatic awards and lost Best Picture was when Brokeback Mountain lost to Crash.
Daniel Day Lewis (65%)
Hugh Jackman / Denzel Washington (10%)
Another straightforward call, as Daniel Day Lewis has swept the dramatic acting categories this year. Plus, people love that guy. A DDL loss would be a repeat of 2002, when Denzel Washington (Training Day) unexpectedly beat Russel Crowe (A Beautiful Mind), who also swept the dramatic awards. The SAG and Critics Choice awards best predict this category.
Jennifer Lawrence (70%)
Jessica Chastain (20%)
Sorry, Quvenzhané Wallis. You may be who the Earth is for, but you aren’t who this award is for. Choosing between Lawrence and Chastain is tricky. The former won the Golden Globe comedy award and the SAG award. Jessica Chastain won the Golden Globe drama award and the critics choice award. The SAG is the best predictor, and thus the model prefers Jennifer Lawrence. If I hadn’t used a model, I would have guessed Jessica Chastain, thinking that dramatic movies have a better shot at winning Oscars than rom-coms. C’mon, math…
Anne Hathaway (87%)
Everyone else (2-5%)
The easiest prediction of the bunch. She’s been unbeatable in other ceremonies, so there’s no reason not to pick her based on the data.
Tommy Lee Jones (47%)
Christoph Waltz (30%)
This seems to be the most controversial category. The New York Times is predicting that Robert DeNiro will win, based on his aggressive oscar campaigning and his icon status (two factors not present in my model, which thinks he has a 5% shot). Tommy Lee Jones won the SAG, which best predicts the acting categories. Christoph Waltz won both the Golden Globe and BAFTA. Historically, this is a difficult category to predict based on precursor ceremonies.
Who knows? Ben Affleck has swept the other ceremonies, but was notoriously not nominated for an Oscar this year. Thus, there’s very little award information to go off of.
My model gives a slight preference to Ang Lee/Life of Pi (45%) over Steven Spielberg/Lincoln (40%), based on which ceremonies each was nominated for. My model isn’t really precise to within 5%, so it isn’t a statistically significant edge. Personally, I’m inclined to think that Steven Spielberg will win (since, you know, he’s Steven Spielberg, and it’s been a while since he won. Poor little guy.)
Another tossup between Wreck-it Ralph (47%) and Brave (40%). Both BAFTA and the critics choice awards tend to predict the correct winner about 90% of the time but, this year, they awarded different films (BAFTA->Brave, and Critics Choice -> Wreck-It Ralph). I love Pixar, but their recent movies aren’t as good as they were 5 years ago, and I loved Wreck It Ralph. I’m rooting for that movie, and its sweet 80s Nintendo soundtrack.
A Royal Affair (15%)
Historically, this is a hard award to predict from precursor ceremonies. This year Amour won BAFTA, the Critics Choice Award, and the Golden Globe, so I think its odds are pretty good. It is also nominated for Best Picture, for which it doesn’t stand a chance. I think voters will feel bad and give it the Foreign Oscar instead.
(73%) (58%) Flight (15%) Zero Dark 30 (36%)
Update (Feb 24, 7PM): I noticed an error in my Writers Guild Award data (1 hour before the ceremony!). I had incorrectly stored the Original Screenplay winner as Flight instead of Zero Dark 30, and the Adapted category as The Silver Linings Playbook instead of Argo. This changes the predictions in the two writing categories
The Silver Linings Playbook (65%) Argo (65%) Argo, Lincoln (12% each) Lincoln, Silver Linings Playbook (11% each)
Update: See above.
Thats about it for the major-ish categories. Most of the minor categories don’t have many equivalent awards in other ceremonies, so the model predictions aren’t very compelling (sorry, lady nominated for Best Makeup and Hairstyling)
This was an interesting exercise — one of the key lessons (if you see this stuff as teaching moment material) is that there is a fairly high amount of unpredictability in predicting Oscar winners based on the other awards — typical forecast accuracies are around 60% (clearly better than random guessing from a field of 5-7 nominees, but nowhere near a lock).
Perhaps models with more information could do better (genre information seems particularly relevant). However, even the people who do this stuff for a living usually only get ~75% of the categories right. So maybe it’s just hard to predict.
Or maybe we just haven’t seen the Nate Silver of Oscar forecasting yet.
Apparently, Nate Silver is the Nate Silver of Oscar forecasting. His method and conclusions are largely the same as what’s posted here. That’s encouraging.
For the past eight-ish years, I’ve participated in an Oscar prediction pool with my brothers and some of my friends. We’ve fooled around with a number of different scoring schemes. The version we’ve settled on the past few years has been to sort each prediction by how confident we are, and assign point values to each category accordingly. This seems to keep the contest more interesting longer through the awards broadcast.
These contests have typically resulted in my shame. It turns out that I am terrible at guessing which movies are likely to win Oscars (nevermind the fact that I have NO FREAKING IDEA what the difference between sound editing and sound mixing is, but apparently they each need their own categories. Whatever.). My brother Jon has won handily the past 3 years, and that has to stop.
This year, I resolved to systematize my predictions, by building a model to forecast Oscar results. Why do this, you ask?
1) The lucrative $50 prize, and all of the opportunities that that opens up.
2) I’ve been looking for a non-astrophysics data analysis project, and this seemed fun. Also, astrophysicists throw garbage at you when you try to do inference without a physically-justified model, and I wanted to slum it up a bit with some purely-empirical forecasting.
The Academy Awards is the last ceremony in a long awards season. The most obvious data to use to try to forecast the Oscars are the results from these previous ceremonies. This is especially true since the same people vote for multiple ceremonies, so these awards can be seen as polls for the oscars. There are other potentially interesting variables to consider (Rotten Tomatoes rating, Box office performance, genre or cast information, etc), but I decided to start with the ceremony results.
The IMDB archives the nominees and winners for the major ceremonies. They don’t provide a nice API for grabbing their data, but it’s easy enough to parse from the HTML (enter: BeautifulSoup). As is so often the case, data cleaning is among the most time-consuming tasks of the project. For example, many award categories change names slightly over time (Best Picture -> Best Motion Picture of the Year). After standardizing all of this information, I ended up with a JSON database of all the nominees and winners for 7 awards ceremonies since 1990 (about 8800 nominations in total).
Before doing anything fancy, I wanted to get a grasp on what the data looked like. T0 be concrete, I’ll focus on the Best Picture category for the moment. Here’s a plot of what fraction of movies go on to win the best picture Oscar, as a function of whether they were nominated for or won the award in a different ceremony.
Interestingly (though not all that surprising in retrospect), winning the Independent Spirit Award (which focuses on indie cinema) is anti-correlated with winning the Oscar. The only movie since 1990 to win both the Independent Spirit and Oscar Best Picture awards was The Artist last year. Also interesting is the fact that the Golden Globes correlates so weakly with the Oscars — this ceremony is often touted as being a good predictor for the Academy Awards, but other ceremonies clearly do better.
How do I combine all of this information into an estimate of who is most likely to win the 2013 Best Picture award? Equally as important, how do I assess how confident this prediction is, so that I can wager more points on categories which are most certain?
One of the standard strategies for estimating success probabilities based on a 0/1 outcome (a movie either wins, or it doesn’t) is logistic regression. It’s pretty much the simplest thing to try, so it’s worth checking how well it does.
There are two wrinkles to this model:
- It doesn’t account for the fact that exactly one nominee in each category wins. I address this by re-normalizing the probabilities in the model within each category, and adjusting the likelihood calculation of the data given the model accordingly. This makes finding the optimal model slightly harder, but the dataset is small enough that computing power isn’t much of a concern.
- With only 23 years of historical data, over-fitting is a danger. To address this, I ran a regularized regression. That is, instead of fitting the model by maximizing the likelihood, I maximize a modified likelihood that penalizes Logistic Regression models with large coefficients. The size of coefficients in a Logistic Regression model directly relates to how confident predictions are, so the penalty acts to make the model more conservative, and less likely to draw too-strong a conclusion from a small training dataset. The strength of the penalty is chosen by cross-validation.
The scikit-learn library is really wonderful for this kind of work. First, they provide a lot of functionality out-of-the-box (optimization, cross validation, and implementations of dozens of models). Furthermore, the API is extremely consistent, so that you can build your own custom classifiers, using scikit-learn objects as building blocks. I definitely plan on using it more, even for more vanilla model fitting and optimization tasks (its API is way better than most of SciPy in my opinion).
There are a number of criteria to evaluate whether this model is a good fit to the data.
How accurate is it?
This model correctly classifies about 75% of the best picture winners since 1990. Furthermore, the years it fails usually correspond to notable upsets; for example, Crash unexpectedly won Best Picture in 2006, despite Brokeback Mountain being a strong favorite. Brokeback Mountain won best picture in every other ceremony in the database, and Crash is the only movie that won the Oscar without even being nominated for the Golden Globe. Other notable upsets include 1999 (when Shakespeare in Love beat Saving Private Ryan) and 1996 (Braveheart won, the model predicted Apollo 13). I see these upsets as indicative of the inherent uncertainty in trying to predict Oscar winners based on other ceremonies.
How representative is the data?
One nice property about the model is that it is generative — you can use it to simulate hypothetical outcomes for each year, based on the information provided from the other ceremonies. If the model is a fair representation of the data, then the actual outcome should look like these simulated outcomes (if it doesn’t, this suggests an over-fit or mis-specified model).
The left plot shows, for each of 1000 hypothetical results of 23 years of best picture awards, how many years were correctly predicted by the model. The black line is the actual data, and falls near the typical value of this distribution. That’s encouraging.
The right plot is slightly more discerning. It plots, as a function of the confidence at which each prediction is made, how many mistakes were made which had confidences less than this threshold. If these confidences are right, then the model should make most of it’s mistakes at lower confidences — that is, the curve should rise steeply on the left and then level off (I used a similar plot when talking about Nate Silver’s election forecast results). The black lines show 500 simulations, and the red line shows the actual data. Again, it’s encouraging that it runs through the collection of black curves.
This fairly simple model seems to do a pretty good job at characterizing the outcome of the Best Picture category for the past 2 decades. It’s possible that a better model (or additional information about each nominee) could make the predictions more precise (i.e. skew the blue histogram to the right, and make it narrower). I may look into this in the coming weeks. In any event, I’ll be able to make predictions about the 2013 Oscars once the other award ceremonies happen. Look for more blog posts.
I’m coming for you, Jon.