Fun with coin flipping

December 2, 2013

In the November 2013 issue of Communications of the ACM Peter Winkler proposed three interesting puzzles about coin flipping. The puzzles initially seem rather simple, but I suspect that my intuition about the answers is incorrect. To test my intuition, I’ve written a utility in Python to perform some Monte Carlo simulations for each problem. If I’ve written the utility correctly, these simulations should provide the correct answers. I’m looking forward to the arrival of the December issue.

Problem 1

The first problem is:

You have just joined the Coin Flippers of America, and, naturally, the amount of your annual dues will be determined by chance. First you must select a head-tail sequence of length five. A coin is then flipped (by a certified CFA official) until that sequence is achieved with five consecutive flips. Your dues is the total number of flips, in U.S. dollars; for example, if you choose HHHHH as your sequence, and it happens to take 36 flips before you get a run of five heads, your annual CFA dues will be $36. What sequence should you pick? HHHHH? HTHTH? HHHTT? Does it even matter?

My intuition for this problem is that each flip of the coin has an equal chance (one out of two) of being a head or a tail. So the probability of five flips occurring in the head-tail sequence I have chosen is

\[1/2 * 1/2 * 1/2 * 1/2 * 1/2 = 1/2^5 = 1/32\]

This indicates that I should expect to pay about $32 for my dues, regardless of which sequence I choose. However, my utility shows that the story is a bit more complicated. I ran the Monte Carlo simulation using 100, 1,000, 10,000, and 100,000 iterations. The results are shown in this chart (the full data are available here):

The head-tail sequences are sorted alphabetically, so the spikes on either end (around $60) are HHHHH and TTTTT. I also find it useful to view the data sorted according to the least cost of dues, as in this table from the Monte Carlo simulation with 100,000 iterations:

Sequence Dues ($) Sequence Dues ($)
HHHHT 31.89441 HHTTH 33.95109
TTTHH 31.92716 THHHT 33.96559
HTTTT 31.93697 TTTHT 33.97174
HHHTT 31.9726 THTTT 34.00494
HTHTT 31.97717 HTHHH 34.0429
HHTHT 31.98652 THHTT 34.11864
TTTTH 32.00329 HTTHT 35.93113
TTHHH 32.02012 THTTH 36.01681
TTHTH 32.02634 HTHHT 36.07967
THHHH 32.02798 THHTH 36.16804
HHTTT 32.04128 TTHTT 37.68822
THTHH 32.07216 HHTHH 38.20029
HTTTH 33.83344 THTHT 42.09793
TTHHT 33.86829 HTHTH 42.23329
HHHTH 33.87797 TTTTT 61.72025
HTTHH 33.9436 HHHHH 61.97057

So for many head-tail combinations, the yearly dues are about $32. However, it seems that getting five heads in a row or five tails in a row is significantly more difficult than other combinations. So it does, in fact, matter which sequence I select. I want to avoid HHHHH and TTTTT.

Problem 2

The second problem is:

Now you have entered your first CFA tournament, facing your first opponent. Each of you will pick a different head-tail sequence of length five, and a coin will be flipped repeatedly. Whoever’s sequence arises first is the winner. You have the option of choosing first. What sequence should you pick?

Again, my intuition for this problem indicates that my choice should not matter. However, the results of the Monte Carlo simulation for problem suggest that both HHHHH and TTTTT are poor choices. Since they require more flips to occur, they are probably less likely to occur in a sequence of flips first.

For this simulation, I compared each of the 32 possible head-tail sequences against all of the other head-tail sequences in a simulated tournament for each iteration. I ran the simulation using 10, 100, 1,000, and 10,000 iterations, and determine how often each sequence won the tournament. The results are shown in this chart (the full data are available here):

As expected, both HHHHH and TTTTT are poor choices, winning least often. Again, sorting the data (for 10,000 iterations) according to which sequence wins most often is interesting:

Sequence Chance of winning Sequence Chance of winning
HTTTT 0.9699 HTTHH 0.6451
HHTTT 0.774 HHTTH 0.6448
THTTT 0.7429 TTHTH 0.6444
HTHTT 0.7068 TTHTT 0.6443
HHHTT 0.6997 TTHHH 0.6292
HHTHT 0.6791 TTHHT 0.6291
HHHHT 0.6676 THHTH 0.621
THHTT 0.6675 THTTH 0.6203
THHHH 0.6525 HHTHH 0.6112
THHHT 0.6488 TTTHH 0.6014
HTHHT 0.6488 HTHTH 0.6014
HTTHT 0.6487 TTTHT 0.5957
THTHH 0.6487 THTHT 0.5934
HTHHH 0.6475 TTTTH 0.4971
HHHTH 0.6472 HHHHH 0.4944
HTTTH 0.6467 TTTTT 0.4941

I’m surprised to see that the HTTTT sequence wins so often, almost 97% of the time! Without a clear analytic proof of this result, I have to suspect that my utility is flawed somehow so that this sequence seems to win more often. However, I cannot detect the problem with the utility.

Problem 3

The third problem is:

Following the tournament (which you win), you are offered a side bet. You pay $1 and flip a coin 100 times; if you get exactly 50 heads, you win $20 (minus your dollar). If you lose, you are out only the $1. Even so, should you take the bet?

My intuition is that although each sequence of 100 flips should contain 50 heads (since each flip has a one out of two chance of being a head), I doubt we can really count on that holding true. The problem indicates this as well, providing me with twenty-to-one odds. So if just one of out twenty sequences of 100 flips contains exactly 50 heads, I still break even. This seems like a good bet, but I decided to test it.

For each iteration of the Monte Carlo simulation for this problem, my utility will perform 100 flips and count the number of heads 1000 times. After each sequence of 100 flips, the total amount of money both paid into and out of the bet is accumulated. So if the total winnings for any iteration are more than $1000 dollars, then that iteration is considered a good bet. I ran the simulation using 1, 10, and 100 iterations. Here are the results:

Iterations Good Bets
1 0
10 0
100 2

After winning only 2% of the time in the best case, I can be sure that I won’t take this bet.

Are these results correct?

These results are certainly surprising, and they don’t line up with my intuition in most cases. My inability to solve these problems analytically lead me to perform Monte Carlo simulations of them. I’m interested to see the correct analytic solutions in the December 2013 CACM issue, so that I can determine if these simulations are accurate.


Content © Josh Peterson

Site design by Sirupsen