Fun with coin flipping

December 2, 2013

In the November 2013 issue of Communications of the ACM, Peter Winkler proposed three interesting puzzles about coin flipping. The puzzles initially seem rather simple, but I suspect that my intuition about the answers is incorrect. To test my intuition, I’ve written a utility in Python to perform some Monte Carlo simulations for each problem. If I’ve written the utility correctly, these simulations should provide the correct answers. I’m looking forward to the arrival of the December issue.

Problem 1

The first problem is:

You have just joined the Coin Flippers of America, and, naturally, the amount of your annual dues will be determined by chance. First you must select a head-tail sequence of length five. A coin is then flipped (by a certified CFA official) until that sequence is achieved with five consecutive flips. Your dues is the total number of flips, in U.S. dollars; for example, if you choose HHHHH as your sequence, and it happens to take 36 flips before you get a run of five heads, your annual CFA dues will be $36. What sequence should you pick? HHHHH? HTHTH? HHHTT? Does it even matter?

My intuition for this problem is that each flip of the coin has an equal chance (one out of two) of being a head or a tail. So the probability of five flips occurring in the head-tail sequence I have chosen is

\[\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{2^5} = \frac{1}{32}\]

This indicates that I should expect to pay about $32 for my dues, regardless of which sequence I choose. However, my utility shows that the story is a bit more complicated. I ran the Monte Carlo simulation using 100, 1,000, 10,000, and 100,000 iterations. The results are shown in this chart (the full data are available here):

The head-tail sequences are sorted alphabetically, so the spikes on either end (around $60) are HHHHH and TTTTT. I also find it useful to view the data sorted according to the least cost of dues, as in this table from the Monte Carlo simulation with 100,000 iterations:

Sequence	Dues ($)	Sequence	Dues ($)
HHHHT	31.89441	HHTTH	33.95109
TTTHH	31.92716	THHHT	33.96559
HTTTT	31.93697	TTTHT	33.97174
HHHTT	31.9726	THTTT	34.00494
HTHTT	31.97717	HTHHH	34.0429
HHTHT	31.98652	THHTT	34.11864
TTTTH	32.00329	HTTHT	35.93113
TTHHH	32.02012	THTTH	36.01681
TTHTH	32.02634	HTHHT	36.07967
THHHH	32.02798	THHTH	36.16804
HHTTT	32.04128	TTHTT	37.68822
THTHH	32.07216	HHTHH	38.20029
HTTTH	33.83344	THTHT	42.09793
TTHHT	33.86829	HTHTH	42.23329
HHHTH	33.87797	TTTTT	61.72025
HTTHH	33.9436	HHHHH	61.97057

So for many head-tail combinations, the yearly dues are about $32. However, it seems that getting five heads in a row or five tails in a row is significantly more difficult than other combinations. So it does, in fact, matter which sequence I select. I want to avoid HHHHH and TTTTT.
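The pattern in the table can be cross-checked with a short sketch. This is not the utility used for the results above; the function names (`flips_until`, `simulated_dues`, `overlap_dues`) are made up for illustration. The first two functions are a minimal Monte Carlo estimate of the dues for one sequence. The third applies a standard result for fair coins, assumed here rather than derived: the expected number of flips is the sum of 2^k over every length k at which the first k characters of the sequence equal its last k characters.

```python
import random


def flips_until(sequence, rng):
    """Flip a fair coin until `sequence` appears; return the number of flips."""
    window = ""
    flips = 0
    while window != sequence:
        # Keep only the most recent len(sequence) flips.
        window = (window + rng.choice("HT"))[-len(sequence):]
        flips += 1
    return flips


def simulated_dues(sequence, iterations, seed=0):
    """Average number of flips over `iterations` independent trials."""
    rng = random.Random(seed)
    total = sum(flips_until(sequence, rng) for _ in range(iterations))
    return total / iterations


def overlap_dues(sequence):
    """Expected flips from the overlap rule (assumes a fair coin):
    add 2**k for every k where the first k characters equal the last k."""
    n = len(sequence)
    return sum(2 ** k for k in range(1, n + 1) if sequence[:k] == sequence[-k:])


if __name__ == "__main__":
    for seq in ("HHHHT", "HTTTH", "HTHTH", "HHHHH"):
        print(seq, round(simulated_dues(seq, 10_000), 2), overlap_dues(seq))
```

Under this rule, a sequence with no self-overlap such as HHHHT costs 32, HTTTH costs 34, HTHTH costs 42, and HHHHH costs 62, which agrees with the groups visible in the table.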

Problem 2

The second problem is:

Now you have entered your first CFA tournament, facing your first opponent. Each of you will pick a different head-tail sequence of length five, and a coin will be flipped repeatedly. Whoever’s sequence arises first is the winner. You have the option of choosing first. What sequence should you pick?

Again, my intuition for this problem indicates that my choice should not matter. However, the results of the Monte Carlo simulation for Problem 1 suggest that both HHHHH and TTTTT are poor choices. Since they require more flips to occur, they are probably less likely to occur first in a sequence of flips.

For this simulation, I compared each of the 32 possible head-tail sequences against all of the other head-tail sequences in a simulated tournament for each iteration. I ran the simulation using 10, 100, 1,000, and 10,000 iterations, and determined how often each sequence won the tournament. The results are shown in this chart (the full data are available here):

As expected, both HHHHH and TTTTT are poor choices, winning least often. Again, sorting the data (for 10,000 iterations) according to which sequence wins most often is interesting:

Sequence	Chance of winning	Sequence	Chance of winning
HTTTT	0.9699	HTTHH	0.6451
HHTTT	0.774	HHTTH	0.6448
THTTT	0.7429	TTHTH	0.6444
HTHTT	0.7068	TTHTT	0.6443
HHHTT	0.6997	TTHHH	0.6292
HHTHT	0.6791	TTHHT	0.6291
HHHHT	0.6676	THHTH	0.621
THHTT	0.6675	THTTH	0.6203
THHHH	0.6525	HHTHH	0.6112
THHHT	0.6488	TTTHH	0.6014
HTHHT	0.6488	HTHTH	0.6014
HTTHT	0.6487	TTTHT	0.5957
THTHH	0.6487	THTHT	0.5934
HTHHH	0.6475	TTTTH	0.4971
HHHTH	0.6472	HHHHH	0.4944
HTTTH	0.6467	TTTTT	0.4941

I’m surprised to see that the HTTTT sequence wins so often, almost 97% of the time! Without a clear analytic proof of this result, I have to suspect that my utility is flawed somehow so that this sequence seems to win more often. However, I cannot detect the problem with the utility.
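One way to hunt for a flaw is to test a single pairing in isolation. The sketch below is an assumption about how such a check could look, not the tournament code behind the table; `first_to_appear` and `win_rate` are hypothetical names. It flips one shared coin until either of two sequences appears and reports how often the first sequence wins.

```python
import random


def first_to_appear(seq_a, seq_b, rng):
    """Flip a fair coin until seq_a or seq_b appears; return the one that came first."""
    n = len(seq_a)  # assumes both sequences have the same length
    window = ""
    while True:
        # Keep only the most recent n flips.
        window = (window + rng.choice("HT"))[-n:]
        if window == seq_a:
            return seq_a
        if window == seq_b:
            return seq_b


def win_rate(seq_a, seq_b, trials, seed=0):
    """Fraction of `trials` head-to-head games in which seq_a appears first."""
    rng = random.Random(seed)
    wins = sum(first_to_appear(seq_a, seq_b, rng) == seq_a for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print(win_rate("HTTTT", "TTTTT", 20_000))
    print(win_rate("HHHHH", "TTTTT", 20_000))
```

A pairing with a known answer makes a useful sanity check: against TTTTT, the sequence HTTTT loses only when the first five flips are all tails, so it should win 31/32 (about 0.969) of the time. That figure is close to the 0.9699 reported above, although the table measures each sequence against all others, so the resemblance may be a coincidence.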

Problem 3

The third problem is:

Following the tournament (which you win), you are offered a side bet. You pay $1 and flip a coin 100 times; if you get exactly 50 heads, you win $20 (minus your dollar). If you lose, you are out only the $1. Even so, should you take the bet?

My intuition is that although each sequence of 100 flips should contain 50 heads (since each flip has a one out of two chance of being a head), I doubt we can really count on that holding true. The problem indicates this as well, providing me with twenty-to-one odds. So if just one out of twenty sequences of 100 flips contains exactly 50 heads, I still break even. This seems like a good bet, but I decided to test it.

For each iteration of the Monte Carlo simulation for this problem, my utility performs 1,000 rounds of 100 flips, counting the number of heads in each round. After each round of 100 flips, the total amount of money both paid into and out of the bet is accumulated. So if the total winnings for any iteration are more than $1,000, then that iteration is considered a good bet. I ran the simulation using 1, 10, and 100 iterations. Here are the results:

Iterations	Good Bets
1	0
10	0
100	2

After winning only 2% of the time in the best case, I can be sure that I won’t take this bet.
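This bet can also be checked without simulation, because the chance of exactly 50 heads in 100 fair flips is a single binomial term. The sketch below is illustrative only, with hypothetical function names, and it assumes that "win $20 (minus your dollar)" means a net gain of $19 on a win and a net loss of $1 otherwise. It computes the exact probability and expected net per bet, plus a small simulation of the running total.

```python
import math
import random


def prob_exactly(heads=50, flips=100):
    """Exact probability of `heads` heads in `flips` fair coin flips."""
    return math.comb(flips, heads) / 2 ** flips


def expected_net(prize=20, stake=1, heads=50, flips=100):
    """Expected net result of one bet; `prize` is the gross payout (assumption)."""
    p = prob_exactly(heads, flips)
    return p * (prize - stake) - (1 - p) * stake


def simulated_net(bets, prize=20, stake=1, heads=50, flips=100, seed=0):
    """Net winnings after `bets` independent bets."""
    rng = random.Random(seed)
    net = 0
    for _ in range(bets):
        # Each of the `flips` random bits is one coin flip; 1 counts as a head.
        count = bin(rng.getrandbits(flips)).count("1")
        net += (prize - stake) if count == heads else -stake
    return net


if __name__ == "__main__":
    print(prob_exactly())    # about 0.0796
    print(expected_net())    # about 0.59 dollars per bet
    print(simulated_net(1000))
```

The exact probability is about 0.0796, which is above the break-even rate of 1/20 = 0.05, so under these assumptions the bet has a positive expected value of roughly $0.59 per play. That points the opposite way from the table above, which suggests the bookkeeping in my utility deserves a second look.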

Are these results correct?

These results are certainly surprising, and they don’t line up with my intuition in most cases. My inability to solve these problems analytically led me to perform Monte Carlo simulations of them. I’m interested to see the correct analytic solutions in the December 2013 CACM issue, so that I can determine if these simulations are accurate.