# Playing Hangman

Remember hangman? Here are the rules: Player A chooses a secret word (say “hollow”) and draws underscores for each letter in that word (“_ _ _ _ _ _”). Player B then guesses a letter. If that letter is in the word (“L”), player A reveals the locations of those letters (“_ _ L L _ _”). If that letter is not in the word (“R”), then player B gets a strike. If player B gets a certain number of strikes (perhaps 6 or 8) before he guesses the word, he loses. Otherwise, he wins.

If you are the guesser, what is your strategy for guessing? Or, more nerdily, how would you program a computer to guess at hangman?

## Minimizing Risk

John McLoone wrote an interesting post on the Wolfram Blog about this topic. His basic strategy is, at every turn, to calculate all possible candidate words (based on the length of the word and the positions of the letters that have been guessed), and choose the letter which appears in the largest number of these words.

If you assume that player A was equally likely to choose any given word (that is, he/she picked the word out of the dictionary at random), then McLoone’s strategy is a greedy algorithm that minimizes the risk of mis-guessing at each step (the “greedy” part comes from the fact that the algorithm only considers the current turn, and doesn’t plan ahead).

## Minimizing Uncertainty

There are other things you might want to optimize, however. For a contrived example, imagine that the puzzle is “_ _ _ _”, and you know that the word is either “boat”, “mart”, or “bunt”. The algorithm above would notice that “T” appears in 100% of the possible words, and hence choose that letter. But what good does that do us? After all, before the computer even guesses, it knows that the board will look like “_ _ _ T”! It didn’t take any risk, but it also didn’t learn anything. On the other hand, if it guesses “B”, it will narrow down the list of possibilities to 2 (“B _ _ _”, for boat or bunt) or 1 (“_ _ _ _”, consistent only with mart). Even better, if it guesses “A”, it is guaranteed to know the answer: the possible boards are “_ _ A _” (boat), “_ A _ _” (mart), or “_ _ _ _” (bunt). Guessing B or A runs a higher risk of mis-guessing, but both guesses bring us closer to the answer.

This suggests a second greedy algorithm for playing hangman: at each turn, determine all the possible outcomes of each guess. Then, choose the letter that minimizes the expected number of candidate words after the guess. Let’s work that out for the example above:

• Guessing T: There is only one possible outcome “_ _ _ T”, with 3 candidate words. The expected number of candidates after the guess is 3.
• Guessing B: 2 possible outcomes. The first outcome (“B _ _ _”) has 2 candidate words (boat and bunt); the second outcome (“_ _ _ _”) has one candidate (mart). The probability of reaching the first outcome is 2/3, and of the second, 1/3. The expected number of candidates is thus 2/3 * 2 + 1/3 * 1 = 5/3.
• Guessing A: 3 possible outcomes, each with one candidate. The expected number of candidates is 1.
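This calculation is easy to check in a few lines of Python. The snippet below is a standalone sketch, not part of the hangman program itself; the mask format is simplified to a single guessed letter, and the word list is just the toy example above:

```python
from collections import Counter

def expected_candidates(words, letter):
    # group the words by the board (mask) that guessing `letter` reveals
    outcomes = Counter(
        ''.join(c if c == letter else '_' for c in w) for w in words
    )
    total = len(words)
    # each outcome occurs with probability count/total and
    # leaves `count` candidate words
    return sum(count * count / total for count in outcomes.values())

words = ['boat', 'mart', 'bunt']
print(expected_candidates(words, 't'))  # 3.0
print(expected_candidates(words, 'b'))  # 1.666...
print(expected_candidates(words, 'a'))  # 1.0
```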

## Implementing Strategies

Both of the above strategies seem reasonable, but how do they compare head to head? We can use these algorithms to play many hangman games, and compare their performance. How many games should we play? Lots. All of them. Every single word. Why not?

Here’s a simple program to play hangman in Python:

```
import string

def play_hangman(word, dictionary, guess_func):

    # dictionary to hold guesses. keys will be the letters guessed,
    # values will be the turn at which each key was guessed
    guessed = {}

    # pre-filter the dictionary -- only keep words whose
    # length matches that of the answer
    dictionary = [w for w in dictionary if len(w) == len(word)]

    turn = 0
    while len(dictionary) > 1:

        turn += 1

        # guess a new letter
        letter = guess_func(guessed, dictionary)
        guessed[letter] = len(guessed)

        # update the masks and filter the dictionary -- only keep
        # words whose mask matches the mask of the answer
        masks = apply_masks(dictionary, guessed)
        answer = apply_masks([word], guessed)[0]
        dictionary = [d for d, m in zip(dictionary, masks) if m == answer]

    # there is only one remaining candidate, so we've figured it out.
    # guess any remaining letters
    for l in word:
        if l not in guessed:
            guessed[l] = len(guessed)

    # convert the guesses to an ordered list
    g = sorted(guessed, key=lambda x: guessed[x])

    # return the guess list and the number of turns needed
    # to figure out the answer
    return (g, turn)
```

The inputs to the program are the answer to the puzzle, the dictionary of possible words, and a reference to a function that makes the guesses. Inside the program, the code repeatedly asks the guess function for a new guess, and eliminates from the dictionary all the words that have been ruled out. It does this by converting each word into a mask based on the previous guesses (e.g. the word “boat” becomes “...t” after T is guessed), and only keeps the words whose mask equals the mask of the correct answer. Once the answer is known, the program returns the sequence of guesses, as well as the number of guesses it took to narrow the list of candidate words down to one.

The code to convert guesses and words to masks is simple, and uses the string method translate:

```
def apply_masks(words, guesses):
    # make the translation string: guessed letters map to
    # themselves, all other letters map to '.'
    to = ''.join([l if l in guesses else '.' for l in string.ascii_lowercase])
    trans = str.maketrans(string.ascii_lowercase, to)
    return [w.translate(trans) for w in words]
```
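To see the translation trick in action, here is a quick standalone demonstration, with ‘t’ and ‘a’ already guessed and the toy word list from before:

```python
import string

guesses = {'t': 0, 'a': 1}

# guessed letters map to themselves; everything else maps to '.'
to = ''.join(l if l in guesses else '.' for l in string.ascii_lowercase)
trans = str.maketrans(string.ascii_lowercase, to)

masks = [w.translate(trans) for w in ['boat', 'mart', 'bunt']]
print(masks)  # ['..at', '.a.t', '...t']
```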

Finally, we need a function to guess letters. The inputs to this function are the letters that have been guessed (as a Python dictionary, where the keys are guessed letters and the values are the turn at which each guess occurred) and the list of potential words. McLoone’s risk-minimizing strategy can be implemented like this:

```
from numpy import argmax

def guess_maxfreq(guessed, dictionary):
    result = [0] * 26
    for i, l in enumerate(string.ascii_lowercase):
        if l in guessed:
            continue
        # count how many candidate words contain this letter
        result[i] = sum([l in d for d in dictionary])
    return string.ascii_lowercase[argmax(result)]
```

Argmax is a NumPy function that returns the location of the maximum element in a sequence. The code to minimize the uncertainty is:
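On the toy boat/mart/bunt puzzle, this strategy picks ‘T’, the letter appearing in the most candidates. A standalone sketch (substituting a plain list.index for NumPy’s argmax) confirms this:

```python
import string

def guess_maxfreq(guessed, dictionary):
    result = [0] * 26
    for i, l in enumerate(string.ascii_lowercase):
        if l not in guessed:
            # count the candidate words containing this letter
            result[i] = sum(l in d for d in dictionary)
    # list.index(max(...)) is a pure-Python stand-in for argmax
    return string.ascii_lowercase[result.index(max(result))]

print(guess_maxfreq({}, ['boat', 'mart', 'bunt']))  # t
```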

```
from numpy import argmin
from collections import defaultdict

def guess_min_uncertainty(guessed, dictionary):
    result = [len(dictionary) + 1] * 26
    for i, l in enumerate(string.ascii_lowercase):
        if l in guessed:
            continue
        # tentatively guess the letter, and measure the expected
        # number of candidates that would remain
        guessed[l] = 1
        result[i] = expected_uncertainty(apply_masks(dictionary, guessed))
        guessed.pop(l)
    return string.ascii_lowercase[argmin(result)]

def expected_uncertainty(collection):
    # expected number of items that share the mask of an item
    # drawn at random from the collection
    d = defaultdict(int)
    for item in collection:
        d[item] += 1
    result = 0.
    total = 1. * len(collection)
    for value in d.values():
        result += value ** 2 / total
    return result
```

This code calculates the new masks (outcomes) associated with each guess, and then calls the function expected_uncertainty to compute the expected number of masks that would match one selected at random.
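As a check, the masks produced by guessing ‘B’ in the boat/mart/bunt example are ‘b...’, ‘....’, and ‘b...’, so the expected uncertainty should be 2²/3 + 1²/3 = 5/3, matching the hand calculation from earlier. A self-contained version of the calculation:

```python
from collections import defaultdict

def expected_uncertainty(collection):
    # expected number of items sharing the value of one drawn at random
    counts = defaultdict(int)
    for item in collection:
        counts[item] += 1
    total = float(len(collection))
    return sum(v ** 2 / total for v in counts.values())

print(expected_uncertainty(['b...', '....', 'b...']))  # 1.666...
```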

## Comparing Performance

I ran each of these algorithms against the 233,615 unique words in my Mac’s dictionary (located in /usr/share/dict/words). For each word, I tabulated the number of incorrect guesses, as well as the number of guesses needed to figure out the answer (note that this is not necessarily the number of guesses needed to guess every letter; for example, only one word matches the pattern ‘FISHIN_’, even though the G hasn’t been guessed yet). Here are the distributions of those two quantities for each algorithm: the red curve is the risk-minimization algorithm, while the black curve is the uncertainty-minimization algorithm.

We can see that, since each algorithm optimizes a different function, they perform differently according to these two metrics. The uncertainty-minimizing approach is willing to risk wrong guesses to find out the correct word more quickly. Thus, on average, it solves the puzzle in fewer guesses, but makes more incorrect guesses along the way.

Which algorithm flat-out loses more often? This depends on how many wrong answers are allowed before the game ends, but typical values might be 6 or 9. The risk-minimizing algorithm lost 11,398 out of 233,615 games when playing up to 6 wrong guesses, and 2,661 when playing up to 9. The uncertainty-minimizing strategy lost 12,528 and 2,817 games, respectively. In its rush to find the answer as quickly as possible, the second algorithm racks up too many incorrect guesses slightly more often.

## The Hardest Words

McLoone’s post had an interesting list of the best hangman words with which to play against a computer. We can repeat this experiment here, which produces the following “Top 10” list of words:

Risk Minimizing

1. yell (17 incorrect guesses)
2. yill (17)
3. waff (16)
4. wapp (16)
5. well (16)
6. will (16)
7. zain (16)
8. haff (15)
9. high (15)
10. jazz (15)

Uncertainty Minimizing

1. yell (17 wrong guesses)
2. yill (17)
3. well (16)
4. will (16)
5. wugg (16)
6. yuck (16)
7. hagi (15)
8. hoju (15)
9. jazzy (15)
10. judy (15)

These results are similar to, but different from, McLoone’s list. This is probably because we are using different dictionaries.

These lists always surprise me, because so many of the entries are easily-recognized words (jazz, high, yell). The trick is that there are lots of 4-letter words: after an early guess of ‘a’ and a puzzle of ‘.a..’, there are lots of words to eliminate before converging on ‘jazz’. Since neither j nor z occurs in very many of these words, neither algorithm wants to guess them (they are unlikely to reduce the uncertainty, and very likely to be incorrect guesses). Many of these words also repeat a letter, so there are fewer correct guesses to be made.

It’s somewhat counter-intuitive, but the long, obscure words that a human might have a very hard time guessing are a breeze for computers with perfect knowledge of the dictionary: “zygosporangium” can be guessed without any mistakes by either algorithm.