Nate Silver made a lot of testable predictions about the election on his 538 blog. In particular, he predicted the winner of each state (and DC), and placed a confidence percentage on each prediction. He did the same for senate races. In total, that’s 84 predictions with confidence estimates.
As in 2008, his predictions were phenomenal. Some of the races are not yet decided, but it looks like all of his presidential predictions were correct (Florida is not yet called as I write this), as well as all but perhaps 2 of his senate predictions (Democrat candidates are unexpectedly leading the Montana and North Dakota races). There are plenty of pundits who were predicting very different results.
Granted, while 82-84 correct predictions sounds (and is) amazing, many of those were no brainers. Romney was always going to win Texas, just as Obama was a sure bet in New York. A slightly harder test is whether his uncertainty model is consistent with the election outcome.
Let’s simulate 1000 elections. For each race, we assume (for the moment) Silver’s uncertainties are correct. That is, if he called a race at a confidence of x%, then we assume the prediction should be wrong 100-x% of the time.
This plot shows, for each simulated election (in red), the total number of mis-predicted races with prediction confidences greater than the threshold on the x axis. The left edge of the plot gives the total number of mis-predictions (at any confidence). The half-way point shows the number of errors for predictions with confidences greater than 75%.
The lines all go down with increasing confidence — that is, there are fewer expected errors for high-confidence predictions (like Texas or New York). I’ve added some random jitter to each line, so they don’t overlap so heavily. The grey bands trace the central 40% and 80% of the simulations. The thick black line is the average outcome.
This plot summarizes the number of mis-classifications you would expect from Nate Silver’s 538 blog, given his uncertainty estimates. A result that falls substantially above the gray bands would indicate too many mistakes, and too-optimistic a confidence model. Lines below the bands indicate not enough mistakes, or too pessimistic a model.
If we assume that the North Dakota and Montana senate races end up as upsets, here is Nate Silver’s performance:
He did, in fact, do slightly better than expected (I doubt he’ll lose sleep over that). This result is broadly consistent with what we should expect if Silver’s model is correct. On the other hand, consider what happens if he ends up correctly predicting these two senate races. It’s unlikely that Nate Silver should have predicted every race correctly, given his uncertainty estimates (this happens in about 2% of simulated elections). It’s possible that Silver will actually tighten up his uncertainty estimates next election.
In any event, I think he knows what he’s talking about. I’m reminded of this clip (Nate Silver is Jeff Goldblum. The rest of the world is Will Smith)