Fun with coin flipping
December 2, 2013
In the November 2013 issue of Communications of the ACM Peter Winkler proposed three interesting puzzles about coin flipping. The puzzles initially seem rather simple, but I suspect that my intuition about the answers is incorrect. To test my intuition, I’ve written a utility in Python to perform some Monte Carlo simulations for each problem. If I’ve written the utility correctly, these simulations should provide the correct answers. I’m looking forward to the arrival of the December issue.
Problem 1
The first problem is:
You have just joined the Coin Flippers of America, and, naturally, the amount of your annual dues will be determined by chance. First you must select a head-tail sequence of length five. A coin is then flipped (by a certified CFA official) until that sequence is achieved with five consecutive flips. Your dues is the total number of flips, in U.S. dollars; for example, if you choose HHHHH as your sequence, and it happens to take 36 flips before you get a run of five heads, your annual CFA dues will be $36. What sequence should you pick? HHHHH? HTHTH? HHHTT? Does it even matter?
My intuition for this problem is that each flip of the coin has an equal chance (one out of two) of being a head or a tail. So the probability of five flips occurring in the head-tail sequence I have chosen is
\[1/2 * 1/2 * 1/2 * 1/2 * 1/2 = 1/2^5 = 1/32\]This indicates that I should expect to pay about $32 for my dues, regardless of which sequence I choose. However, my utility shows that the story is a bit more complicated. I ran the Monte Carlo simulation using 100, 1,000, 10,000, and 100,000 iterations. The results are shown in this chart (the full data are available here):
The head-tail sequences are sorted alphabetically, so the spikes on either end (around $60) are HHHHH and TTTTT. I also find it useful to view the data sorted according to the least cost of dues, as in this table from the Monte Carlo simulation with 100,000 iterations:
Sequence | Dues ($) | Sequence | Dues ($) | |
---|---|---|---|---|
HHHHT | 31.89441 | HHTTH | 33.95109 | |
TTTHH | 31.92716 | THHHT | 33.96559 | |
HTTTT | 31.93697 | TTTHT | 33.97174 | |
HHHTT | 31.9726 | THTTT | 34.00494 | |
HTHTT | 31.97717 | HTHHH | 34.0429 | |
HHTHT | 31.98652 | THHTT | 34.11864 | |
TTTTH | 32.00329 | HTTHT | 35.93113 | |
TTHHH | 32.02012 | THTTH | 36.01681 | |
TTHTH | 32.02634 | HTHHT | 36.07967 | |
THHHH | 32.02798 | THHTH | 36.16804 | |
HHTTT | 32.04128 | TTHTT | 37.68822 | |
THTHH | 32.07216 | HHTHH | 38.20029 | |
HTTTH | 33.83344 | THTHT | 42.09793 | |
TTHHT | 33.86829 | HTHTH | 42.23329 | |
HHHTH | 33.87797 | TTTTT | 61.72025 | |
HTTHH | 33.9436 | HHHHH | 61.97057 |
So for many head-tail combinations, the yearly dues are about $32. However, it seems that getting five heads in a row or five tails in a row is significantly more difficult than other combinations. So it does, in fact, matter which sequence I select. I want to avoid HHHHH and TTTTT.
Problem 2
The second problem is:
Now you have entered your first CFA tournament, facing your first opponent. Each of you will pick a different head-tail sequence of length five, and a coin will be flipped repeatedly. Whoever’s sequence arises first is the winner. You have the option of choosing first. What sequence should you pick?
Again, my intuition for this problem indicates that my choice should not matter. However, the results of the Monte Carlo simulation for problem suggest that both HHHHH and TTTTT are poor choices. Since they require more flips to occur, they are probably less likely to occur in a sequence of flips first.
For this simulation, I compared each of the 32 possible head-tail sequences against all of the other head-tail sequences in a simulated tournament for each iteration. I ran the simulation using 10, 100, 1,000, and 10,000 iterations, and determine how often each sequence won the tournament. The results are shown in this chart (the full data are available here):
As expected, both HHHHH and TTTTT are poor choices, winning least often. Again, sorting the data (for 10,000 iterations) according to which sequence wins most often is interesting:
Sequence | Chance of winning | Sequence | Chance of winning | |
---|---|---|---|---|
HTTTT | 0.9699 | HTTHH | 0.6451 | |
HHTTT | 0.774 | HHTTH | 0.6448 | |
THTTT | 0.7429 | TTHTH | 0.6444 | |
HTHTT | 0.7068 | TTHTT | 0.6443 | |
HHHTT | 0.6997 | TTHHH | 0.6292 | |
HHTHT | 0.6791 | TTHHT | 0.6291 | |
HHHHT | 0.6676 | THHTH | 0.621 | |
THHTT | 0.6675 | THTTH | 0.6203 | |
THHHH | 0.6525 | HHTHH | 0.6112 | |
THHHT | 0.6488 | TTTHH | 0.6014 | |
HTHHT | 0.6488 | HTHTH | 0.6014 | |
HTTHT | 0.6487 | TTTHT | 0.5957 | |
THTHH | 0.6487 | THTHT | 0.5934 | |
HTHHH | 0.6475 | TTTTH | 0.4971 | |
HHHTH | 0.6472 | HHHHH | 0.4944 | |
HTTTH | 0.6467 | TTTTT | 0.4941 |
I’m surprised to see that the HTTTT sequence wins so often, almost 97% of the time! Without a clear analytic proof of this result, I have to suspect that my utility is flawed somehow so that this sequence seems to win more often. However, I cannot detect the problem with the utility.
Problem 3
The third problem is:
Following the tournament (which you win), you are offered a side bet. You pay $1 and flip a coin 100 times; if you get exactly 50 heads, you win $20 (minus your dollar). If you lose, you are out only the $1. Even so, should you take the bet?
My intuition is that although each sequence of 100 flips should contain 50 heads (since each flip has a one out of two chance of being a head), I doubt we can really count on that holding true. The problem indicates this as well, providing me with twenty-to-one odds. So if just one of out twenty sequences of 100 flips contains exactly 50 heads, I still break even. This seems like a good bet, but I decided to test it.
For each iteration of the Monte Carlo simulation for this problem, my utility will perform 100 flips and count the number of heads 1000 times. After each sequence of 100 flips, the total amount of money both paid into and out of the bet is accumulated. So if the total winnings for any iteration are more than $1000 dollars, then that iteration is considered a good bet. I ran the simulation using 1, 10, and 100 iterations. Here are the results:
Iterations | Good Bets |
---|---|
1 | 0 |
10 | 0 |
100 | 2 |
After winning only 2% of the time in the best case, I can be sure that I won’t take this bet.
Are these results correct?
These results are certainly surprising, and they don’t line up with my intuition in most cases. My inability to solve these problems analytically lead me to perform Monte Carlo simulations of them. I’m interested to see the correct analytic solutions in the December 2013 CACM issue, so that I can determine if these simulations are accurate.