Saturday, September 20, 2008

Simulating the 2008 Presidential Election Using Intrade Data


Hello!

This is a blog I set up so that other people could see the results of my procrastinatory activities, and have their time likewise wasted.

The basic concept is this: I download the current probabilities that each state in the U.S. will go for Obama or McCain from Intrade. (For those not familiar with Intrade, Slate has an introduction.) Some scripts I wrote run many many thousands of virtual elections based on those probabilities, counting how many electoral votes each candidate gets in each scenario. Then I publish the results here, telling you how likely it is that Obama will get 269+ electoral votes vs. 270+ electoral votes, etc.
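To make the procedure concrete, here is a minimal Python sketch of that simulation loop. It is not the actual scripts behind this blog: the function names, the fixed seed, and the toy list of states are assumptions made up for illustration, and each state is treated as an independent coin flip weighted by its market probability.

```python
import random

def simulate_elections(states, n_trials=100_000, seed=0):
    """Run n_trials virtual elections.

    states: list of (electoral_votes, p_win) pairs, one per state,
            where p_win is the candidate's probability of carrying it.
    Returns counts, where counts[v] is the number of trials in which
    the candidate ended up with exactly v electoral votes.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = sum(ev for ev, _ in states)
    counts = [0] * (total + 1)
    for _ in range(n_trials):
        # Each state is an independent weighted coin flip.
        votes = sum(ev for ev, p in states if rng.random() < p)
        counts[votes] += 1
    return counts

def prob_at_least(counts, threshold):
    """Estimated probability of winning at least `threshold` votes."""
    return sum(counts[threshold:]) / sum(counts)

# Hypothetical toy input, NOT real Intrade prices.
toy_states = [(55, 0.95), (34, 0.10), (27, 0.50), (20, 0.45)]
toy_counts = simulate_elections(toy_states, n_trials=10_000)
print(prob_at_least(toy_counts, 69))  # chance of a majority of the 136 toy votes
```

With the real map, the same call with thresholds of 269 and 270 would give the kind of numbers reported here.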

The results tend to differ significantly from the overall "McCain next president" or "Democrat next president" contracts. Feel free to speculate on why that is.

They also differ from what Intrade has recently started posting, which is the electoral outcome of the most likely scenario only. That is, they're only publishing the mode of the distribution over electoral votes, which to me is less interesting than some other statistics. I give you the whole CDF.

The more statistically savvy among you have probably noticed two major problems with this approach:
1. I'm assuming that the outcome in each state is independent. Under this assumption, McCain could (with very small probability) lose Pennsylvania and win New York. In reality, McCain winning New York would more likely be the result of some hyper-catastrophic event for the Democratic party that would result in a universal landslide for McCain. If anyone has some good ideas about how to model the correlations between states, I'd be interested to hear them, but I'm also likely to not do anything with them. There's other stuff I should probably be doing.
2. I'm assuming that the Intrade state-by-state data accurately integrate all available information about the likely outcomes of each state's contest. They might. They might not. The purpose of this blog is not to argue the accuracy of prediction markets. It's to make it easier to waste time obsessing over politics.

7 comments:

tmeyer said...

Sure I'll waste time.

The national Intrade numbers for McCain and Obama closely mirror the numbers for Ohio combined with the "Kerry states + IA, CO, NM" strategy.
Add in a little NH and VA and you've got something that makes sense.

I'll waste more time tomorrow!

Thanks to DKos for sending me here.

stephen said...

The problem of modeling state correlations is similar to the problem Wall Street had pricing credit-default swaps: any unexpected failure is indicative of a systemic failure, so the failures cannot be rated independently.

Euglossine Bee said...

This comment has been removed by the author.

thouis said...

I don't understand the need for simulations. Can't you compute the CDF exactly, since you're treating states as independent?

Start with all probabilities for different EVs (0-538) at zero, except the probability of 0 EVs, which starts at 1. Call these probabilities EV. Take a state with N electoral votes, and probability P for whichever candidate you're calculating for.

Update EV as:

EV[i] = EV[i-N] * P + EV[i] * (P - 1)

(for all i simultaneously). Loop through states until done.

Am I missing something?

Matt Hoffman said...

@ thouis:
Good observation, you're quite right (except for a typo: (P - 1) should be (1 - P) I think). I was basically looking for the simplest thing I could do/explain, which was to run simulations as described above.

The results seem to match up to within about 0.02%, but seeing as I've already implemented it and it turns out to be faster to run, there's no reason not to use the dynamic programmingish approach you described. Future posts will be of the exact CDF.
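As an illustration of the update thouis describes (with the (1 - P) correction), here is a minimal Python sketch. It is not the script behind these posts: the function names and the toy numbers are assumptions made up for the example, and it still treats states as independent.

```python
def exact_distribution(states):
    """Exact distribution over electoral-vote totals.

    states: list of (electoral_votes, p_win) pairs, assumed independent.
    Returns dist, where dist[v] is the probability of exactly v votes.
    """
    total = sum(n for n, _ in states)
    dist = [0.0] * (total + 1)
    dist[0] = 1.0  # before any state is counted, 0 votes is certain
    for n, p in states:
        new = [0.0] * (total + 1)
        for i, q in enumerate(dist):
            if q == 0.0:
                continue  # unreachable totals contribute nothing
            new[i] += q * (1 - p)  # candidate loses this state
            new[i + n] += q * p    # candidate wins this state
        dist = new  # building a fresh list is the "simultaneous" update
    return dist

def tail_probability(dist, threshold):
    """Probability of at least `threshold` electoral votes."""
    return sum(dist[threshold:])

# Hypothetical two-state toy example, NOT real Intrade prices.
toy_dist = exact_distribution([(3, 0.5), (4, 0.25)])
print(tail_probability(toy_dist, 4))
```

Writing into a fresh list on each pass is one way to honor the "for all i simultaneously" condition; updating in place from the highest total down to the lowest would work too.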

thouis said...

Great. Current post has all zeros for McCain, though.

Also, any chance you could post either your Intrade data-fetching scripts, or the raw numbers for each day?

Matt Hoffman said...

Oops. Fixed the zeros.

I'm a little leery of reposting Intrade's data, which is a little more legally dubious than posting analyses of it.

But I can put the script I use to parse their HTML and run the calculations online at http://www.cs.princeton.edu/~mdhoffma/drop/intradeparser.pl . And now I have.