Tuesday, November 1, 2016

Math

Let's say you send out letters to random people. In each letter you pick the winner of the NHL's Stanley Cup, and you do this every 4 years.

In 1980, you send 100,000 letters saying that New York will win, and 100,000 saying Philly will win.

NY wins. So in 1984 you take those 100,000 people you sent the "right" letter to, and tell 50,000 of them that NY will win again, and 50,000 of them that Edmonton will win.

Edmonton wins in 1984, so in 1988 you send letters to 25,000 people saying Edmonton will win again, and to 25,000 people saying Boston will win. Edmonton wins again.

1992 sees 12,500 letters saying Pittsburgh will win, 1996 sees 6,250 saying Colorado will win, and 2000 has you sending 3,125 letters saying New Jersey will win. And so on.

By the end of this, around 2012, you will be sending roughly 390 letters to people for whom, every single time you've written, you've been right. Despite this, you've just been randomly guessing.
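If you want to check the arithmetic, here's a quick sketch in Python (my choice of language, not anything from the scheme itself). It just halves the pool every 4 years, since only the half holding the correct pick ever hears from you again. The team names don't matter for the count.

```python
# Sketch of the letter scheme: 200,000 recipients in 1980, split every batch
# in half with two opposite picks, keep only the half you were "right" about.

recipients = 200_000   # 100,000 "New York" + 100,000 "Philly" letters in 1980

for year in range(1980, 2013, 4):
    recipients //= 2   # only the half holding the correct pick stays in the scheme
    print(year, recipients)

# 1980 100000
# 1984  50000
# ...
# 2012    390   <- people for whom you've been "right" every single time
```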


Let's also look at this backwards.

Let's say there is a series of models that predict the winner of a Presidential election. Only one of the 'long term' models, the ones that started in 1980, has been right in every election between 1980 and 2012.

Two of them, however, were right in every election up through 2008; four through 2004; eight through 2000; sixteen through 1996.
In fact, going all the way back, you only need 256 models that picked 1980 correctly, which means starting with 512, half of which lose their very first guess.
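Same halving, sketched the same way: if the models just flipped a coin each election, each cycle cuts the still-perfect ones roughly in half. (Again, this is a sketch of the counting, not data about any real models.)

```python
# Coin-flip election models: start with 512, halve the perfect pool each election.

models = 512   # enough that about one survives all nine elections, 1980-2012

for year in range(1980, 2013, 4):
    models //= 2   # half guess wrong and lose their perfect record
    print(year, models)

# 1980 256, 1984 128, ..., 1996 16, 2000 8, 2004 4, 2008 2, 2012 1
```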


And ALL of this is based on random chance. Just randomly picking a winner.



I'm not saying that this or that model is bad or good at predicting a winner, but I am saying that just because something has been right X times in a row, it doesn't mean it's a quality model. Quality models are quality models, and sometimes quality models are wrong.
