The 10,000 Octopus Problem

Noah Haber
Meet Paul the Octopus. Paul is famous. When Paul lived in an aquarium in Germany in 2008, his handlers decided to play a game. Before each of Germany’s matches in the 2008 European Championship, they would give him two boxes, each with some food in it: one for Germany and one for the team Germany was playing next. Paul predicted the outcome by eating from one box first, and he just happened to choose Germany each time, which made him (mostly) right. But some doubted Paul. They said he was lucky. So Paul stepped up his game, buckled down, studied up, and waited for his chance in the 2010 World Cup. He correctly predicted every single match the German team played, and then went on to predict the final between the Netherlands and Spain. Don’t believe me? It’s all on Wikipedia.

Unfortunately, Paul has shuffled off this mortal tentacoil, so we can’t do a “real” test of his skills. But we CAN review a few theories on just how Paul, like his namesake, was so prescient.

Theory #1: Paul is really good at predicting football matches

This probably isn’t the blog for you.

Theory #2: Paul (or his handler) loves Germany

In the 2008 matches, Paul chose the box representing his home, Germany, each time, and was mostly right, choosing correctly 4 times and wrongly 2. In 2010, Paul changed things up. He chose Germany in only 5 of Germany’s 7 matches, and was correct in each instance. So Paul chose Germany 11 times out of 13 (let’s ignore the 2010 final, which Germany didn’t play in, for a moment). Maybe Paul’s handlers (who were German, remember) put tastier food in the Germany box. Or maybe Paul prefers the black, red, and gold stripes of the German flag. Who knows? More importantly, who cares, when we have a far simpler explanation?

Theory #3: Paul got lucky

Paul got it right 12/14 times (counting the 2010 final). Let’s assume for a moment that Paul’s prediction is basically a coin flip, and that he just got lucky. How lucky does Paul have to be for this to work? We can compute the probability of getting 12/14 coin flips right using a simple binomial distribution. Assuming that these 14 trials are all independent (we’ll get to that), the probability that Paul would have gotten exactly 12/14 matches right is roughly 0.6%. That’s not great, but that’s not bad either. An easier way to think about that probability is by its inverse. If you wanted one octopus to get 12/14 boxes right by random chance, you would need about 180 octopodes. So it’s plausible that Paul got lucky.
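For anyone who wants to check the arithmetic, here is a minimal sketch of the coin-flip calculation in Python. The function names are my own, and it assumes a fair coin (p = 0.5) and independent matches. It also prints the chance of getting 12 or more right, which is arguably the fairer question, since a 13/14 or 14/14 octopus would have been just as famous.

```python
from math import comb


def prob_exactly(k: int, n: int, p: float = 0.5) -> float:
    """Binomial probability of exactly k correct picks in n independent matches."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct picks in n independent matches."""
    return sum(prob_exactly(i, n, p) for i in range(k, n + 1))


# Paul's record: 12 correct out of 14, treated as fair coin flips.
exact = prob_exactly(12, 14)
at_least = prob_at_least(12, 14)

print(f"exactly 12/14:    {exact:.2%}, about 1 in {1 / exact:.0f}")
print(f"12 or more of 14: {at_least:.2%}, about 1 in {1 / at_least:.0f}")
```

The first line reproduces the roughly 0.6% (about 1 in 180) figure above. The second is a little more generous to luck, at about 1 in 155.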

Theory #4: Paul got lucky, and we’re bad at understanding uncertainty

Theory #3 works if we had singled out Paul before he predicted any matches. The problem is, we (mostly) didn’t. We know about Paul because he got a little lucky, retroactively. But we don’t know about all the other octopi whose handlers did the same thing, but failed. Most importantly, you don’t know about them because they failed. If you have enough octopuses that know nothing about football, one of them is going to just happen to get it right by chance. The coin that happened to flip the right combination isn’t special just because it happened to flip the right combination. Enough octopuses with typewriters will eventually write an exact copy of 20,000 Leagues under the Sea.
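To see the hidden-octopus effect in action, here is a rough simulation sketch. Everything in it is hypothetical: 10,000 made-up octopuses, each guessing 14 matches by fair coin flip, with a fixed random seed so the run is repeatable. We then count how many look “prescient” (12 or more right), which are the only ones that would ever make the news.

```python
import random


def count_lucky_octopuses(n_octopuses, n_matches=14, min_correct=12, seed=0):
    """Count coin-flipping octopuses that get at least min_correct picks right."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    lucky = 0
    for _ in range(n_octopuses):
        # Each pick is an independent fair coin flip: no football knowledge at all.
        correct = sum(rng.random() < 0.5 for _ in range(n_matches))
        if correct >= min_correct:
            lucky += 1
    return lucky


n = 10_000
lucky = count_lucky_octopuses(n)
print(f"{lucky} of {n} clueless octopuses got at least 12/14 right")
print(f"{n - lucky} octopuses you will never hear about")
```

With these assumptions you would expect somewhere around 65 lucky octopuses on average, none of whom know anything about football.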

Now, of course, things aren’t really quite that simple, and we’re glossing over some important details. Paul didn’t really have 14 independent trials; he had two sets of trials (or three, depending on whether you count the 2010 final separately). He started becoming famous after the 2008 trials. But he didn’t get REALLY famous until 2010.

We can give Paul the benefit of the doubt and say he was learning the rules of the game in 2008 and look only at 2010, in which case the chance of correctly predicting all 8 matches in 2010 is 1/256, slightly more remarkable than our 1/180 above. On the other hand, maybe he truly did have a German flag preference, or the handlers helped a little by offering better food in the boxes of the likelier winners, which would make Paul more likely to be correct through non-prescient influence.

To get all 13 of the Germany matches right, you would need 8,192 octopuses (2^13), or 16,384 (2^14) to have one get all 14 matches right if we include the 2010 final. If you had a few other advantages (like color / food preference) that number is lower. Let’s call it 10,000.
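One caveat worth spelling out: 2^n octopuses gives you one perfect octopus on average, not a guarantee. Here is a small sketch (my own helper function, again assuming fair and independent coin flips) of the chance that at least one octopus in the tank gets every match right.

```python
def prob_at_least_one_perfect(n_octopuses: int, n_matches: int) -> float:
    """Chance that at least one of n_octopuses gets all n_matches coin flips right."""
    p_perfect = 0.5**n_matches  # one octopus getting every match right
    return 1 - (1 - p_perfect) ** n_octopuses


# 2^13 octopuses, 13 matches: one perfect octopus expected on average.
p_13 = prob_at_least_one_perfect(2**13, 13)
# The rounded "call it 10,000" figure against all 14 matches.
p_14 = prob_at_least_one_perfect(10_000, 14)

print(f"8,192 octopuses, 13 matches:  {p_13:.1%}")
print(f"10,000 octopuses, 14 matches: {p_14:.1%}")
```

With 8,192 octopuses and 13 matches, the chance of at least one perfect record comes out at about 63%, so “you would need” really means “you would expect one”.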

Of course, there weren’t 10,000 octopodes picking matches. After all, there are only a few hundred aquariums in the world. But the comparison isn’t just other octopuses. There are thousands of other “low probability” events happening all the time, from other animals and other sports to anything else. We only know about things that happen, and not those that don’t, and tend to think those things that happen are more remarkable than they are.

Even if you repeat the experiment a bunch of times, streaks are random too. If you’ve ever heard of the Sports Illustrated cover jinx, that’s an extension of this problem. You get on the cover because you (randomly) had an anomalously good streak. You have much better chances of getting that streak if you are a better player, but it’s unlikely that you’ll repeat it a second time. You tend to need a LOT of experiments to tease out what is luck (noise) and what is skill (signal), and even then there is a chance you randomly get misleading results.
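Here is a toy version of the cover jinx, as a sketch under made-up assumptions: 1,000 hypothetical players whose true per-game win probability sits somewhere between 0.4 and 0.6, playing two 20-game seasons. We put the top 10 from season one “on the cover” and see how the same players do in season two.

```python
import random


def simulate_cover_jinx(n_players=1000, n_games=20, top_n=10, seed=0):
    """Return the cover stars' average wins in season one and in season two."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    # Hypothetical skill levels: each player's true chance of winning a game.
    skills = [rng.uniform(0.4, 0.6) for _ in range(n_players)]

    def season(skill):
        # Wins in one season: skill sets the odds, luck does the rest.
        return sum(rng.random() < skill for _ in range(n_games))

    first = [season(s) for s in skills]
    second = [season(s) for s in skills]

    # The "cover stars" are the players with the most wins in season one.
    stars = sorted(range(n_players), key=lambda i: first[i], reverse=True)[:top_n]
    avg_first = sum(first[i] for i in stars) / top_n
    avg_second = sum(second[i] for i in stars) / top_n
    return avg_first, avg_second


avg_first, avg_second = simulate_cover_jinx()
print(f"cover stars, season one: {avg_first:.1f} wins out of 20")
print(f"same players, season two: {avg_second:.1f} wins out of 20")
```

The cover stars’ second season should come back down toward ordinary, even though nothing about the players changed between seasons. The streak that got them noticed was mostly noise on top of a modest amount of signal.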

Most of the time, this problem is relatively harmless, like our probably not-so-prescient friend Paul here. Sometimes it is deeply harmful.

It matters

You’ve seen this before: Person has a deadly metastatic cancer, and is told they have 6 months to live. They take some supplement, and boom, cancer magically cured. Or someone tells you to punch sharks in the nose to avoid being eaten during an attack. Let’s just assume for a moment that all of that is literally, actually true. The problem is simple: you never hear about all the people who took supplements and punched sharks but died anyway. Some people just get lucky.

Most importantly, this happens EVERYWHERE. It’s the main reason why most of those studies that find near-miraculous-sounding cures for diseases don’t pan out, why anecdotes make bad evidence, and why you shouldn’t pick your stocks based on who made the most money last year. Statisticians aren’t people who make decisions with certainty; we’re people who spend a lot of time understanding and dealing with UNcertainty.

Updates: Corrected 2008 being the European Championship, not the World Cup. Credit to Matthew Rogers for finding this error. Corrected English because I am bad at copy-editing, credit to Dan Larremore.

Thoughts and comments welcome