I could be wrong and not thinking this all the way through, but I don't think this is optimal. Your trees handle bigrams and co-occurrence, but isn't it the case that a particular guess could seem to be suboptimal at one step, but turn out to have been optimal at a later step?
Imagine (for some word list) that guessing "s" first narrows the range as much as possible in step one (say, it narrows the range of possible words 50%). The three branches from that guess (green, yellow, or black) lead to three new optimal guesses for your second guess. Let's say that green "s" means the next optimal guess narrows the field by a further 10%, yellow "s" means the next guess narrows the field by 5%, and black "s" means the next guess narrows the field by 3%. Taken together, the two guesses narrow the field by 60% ("s" was green), 55% ("s" was yellow), and 53% ("s" was black), which is an average of 56%.
Isn't it the case that, for that same word list, guessing "p" first might narrow the range by less (say only 49%), yet the three branches for the next guess might be better than they were for "s"? So "p" might seem like a worse guess at the moment, but it might be that the optimal guesses off each branch of that guess are better. Maybe the subsequent optimal guess after green "p" narrows by 9%, after yellow "p" by 8%, and after black "p" by 7%. So the sequence of guesses taken together gives you 58%, 57%, and 56%, for an average of 57% (higher than 56%).
I don't think you can actually optimize this with a single tree at each step. I think you would need to actually build the full tree of trees (not just the trees you have here, but the tree of these trees for each subsequent guess, all the way to solutions). Your approach, looking at only one of those subtrees at a time, assumes a constraint of that larger decision tree that I don't think it has.
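To make the distinction concrete, here's a toy Python sketch (all the names -- `feedback`, `expected_remaining`, the lookahead functions -- are mine, not the post's, and this is an illustration of the argument rather than the post's actual scorer):

```python
from collections import Counter, defaultdict

def feedback(guess, answer):
    # Standard two-pass Wordle coloring: greens first, then yellows
    # consume whatever unmatched letters remain in the answer.
    fb = ['b'] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            fb[i] = 'g'
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if fb[i] == 'b' and unmatched[g] > 0:
            fb[i] = 'y'
            unmatched[g] -= 1
    return ''.join(fb)

def expected_remaining(guess, candidates):
    # A pattern that leaves n candidates occurs with probability n/N,
    # so the expected number of words left is sum(n^2) / N.
    buckets = Counter(feedback(guess, c) for c in candidates)
    return sum(n * n for n in buckets.values()) / len(candidates)

def greedy_best(candidates, guesses):
    # One-step lookahead: best immediate narrowing. This is "s" above.
    return min(guesses, key=lambda g: expected_remaining(g, candidates))

def two_ply_best(candidates, guesses):
    # Two-step lookahead: score each first guess by the best *follow-up*
    # inside every feedback bucket. This is where "p" can beat "s".
    def score(g1):
        buckets = defaultdict(list)
        for c in candidates:
            buckets[feedback(g1, c)].append(c)
        return sum(
            (len(b) / len(candidates)) *
            min(expected_remaining(g2, b) for g2 in guesses)
            for b in buckets.values()
        )
    return min(guesses, key=score)
```

On some word lists `greedy_best` and `two_ply_best` agree; the argument above is that nothing guarantees they do, and the full tree extends this lookahead all the way to the solutions.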
Yeah, you're right. I had a go last night at building out all the subtrees, but with my hacky, probably not very performant code, it stops being feasible to do all that calculation in the browser.
I also think this issue is particularly important if you wanted to write a solver for the "hard mode" on Wordle, because in that case you have many fewer options if your first guess gives you a lot of clues.
I might have another go over the weekend, because I think you could easily come up with a heuristic to narrow down the guesses that are worth calculating subtrees for.
For example, there's no point calculating the subtrees for "FUZZY" because it's such a terrible first guess. If you only calculated the subtrees for the top 5-10% of first guesses, it would cut the amount of calculation needed by even more than 10x or 20x, because the best guesses also leave the fewest remaining words afterwards.
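Something like this, assuming a cheap one-step scoring function `score_guess(guess, candidates)` where lower is better (e.g. expected words remaining), which is my own placeholder name:

```python
def prune_first_guesses(guesses, candidates, score_guess, keep_frac=0.05):
    # Rank every first guess by the cheap one-step score, then keep only
    # the top slice; only these survivors get expensive subtree expansion.
    ranked = sorted(guesses, key=lambda g: score_guess(g, candidates))
    keep = max(1, int(len(ranked) * keep_frac))
    return ranked[:keep]
```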
That would probably improve it, but I bet there are still at least a few really crazy sequences where there's a much better guess than the heuristic finds.
It's interesting how hard a constraint-satisfaction problem this is, given how easily humans manage to do relatively well at it.
It would be very interesting to see a neural net try to optimize Wordle.
There is certainly an ML approach, but I got tangled up in general- vs. specific-purpose. There are only 2,315 words in the manually culled list Wordle uses, so the usual purpose of training a model -- supervised training so it generalizes to inputs it has never seen -- isn't even necessary to solve Wordle, because you can see all possible inputs. Now if you wanted to solve all five-letter words, or all n-letter words, that's probably a big enough set. I built a super-naive algorithm that simply guesses the first valid word alphabetically, and even that succeeds 80% of the time with a median of 5 guesses. Then I thought about running it on the entire set of five-letter combinations (26^5, roughly 11.9 million), but an algorithm optimized to guess QXQXZ isn't going to be optimal for the actual word set. I thought, to your point, it would be more interesting to seek out all the crazy sequences to see just how far off-axis the long tail goes. That seems to crop up when you have many equally good guesses and are left probing for the one missing letter, e.g. CA?ES can be CAGES, CARES, CANES, CAPES, etc. A general solution might in fact optimize to avoid subtrees that put you in that corner, using the fact that not all five-letter inputs are valid anyway.
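The super-naive version is only a few lines, for what it's worth. A sketch (not my original code; `feedback` here is the standard two-pass Wordle coloring):

```python
from collections import Counter

def feedback(guess, answer):
    # Two-pass Wordle coloring: greens first, then yellows drawn from
    # whatever unmatched answer letters remain.
    fb = ['b'] * len(guess)
    left = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            fb[i] = 'g'
    for i, g in enumerate(guess):
        if fb[i] == 'b' and left[g] > 0:
            fb[i] = 'y'
            left[g] -= 1
    return ''.join(fb)

def naive_solve(answer, word_list, max_turns=6):
    # Always guess the first remaining candidate alphabetically.
    candidates = sorted(word_list)
    for turn in range(1, max_turns + 1):
        guess = candidates[0]
        if guess == answer:
            return turn
        pattern = feedback(guess, answer)
        candidates = [w for w in candidates if feedback(guess, w) == pattern]
    return None  # failed to solve within six guesses
```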
The specific problem is still reasonably hard. The full decision tree for only 2,315 words is still absolutely enormous. Look at how big the tree gets for each guess here. Now recursively build trees for every guess. The subtrees do get smaller, but you have to branch, recursively, five times. It's pretty huge. Probably doable to fully solve with a decently written program (and not trying to run it in a web browser), but it's definitely not tiny.
And the neural net comment wasn't about training the model to optimize inputs it's never seen. It was about finding an approximation to the solution for a highly non-linear problem, which is something neural networks are relatively efficient at.
Something I've realised that would make this slightly more feasible in normal mode:
Even if you're worried about subtrees that put you in a corner, for any given answer and potential guess sequence, e.g. 1) RAISE, 2) COULD, etc., you always want the better first-stage guess to come first - i.e. you would never start with COULD and then guess RAISE. This means the hypothetical trees you build for the words after the first guess can ignore any word that scores better than the initial guess at the first stage. This cuts the calculation required by more than half at each step.
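A minimal sketch of that pruning rule, assuming `first_stage_score` is a precomputed dict mapping each word to its step-one score (lower = better) -- and with the caveat that the rule itself is a heuristic, not a proven dominance argument:

```python
def second_guess_pool(all_guesses, first_guess, first_stage_score):
    # Per the ordering argument: a word that scores better at stage one
    # than the guess actually played would have been played first instead,
    # so it never needs to be explored as a second guess.
    cutoff = first_stage_score[first_guess]
    return [g for g in all_guesses
            if first_stage_score[g] >= cutoff and g != first_guess]
```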
That's fair -- not suggesting that the known problem size is trivial or anything, just that it is theoretically possible to test the hypothesis. I'm also curious about pruning the tree, but I don't have a strong intuition about the prevalence of equivalent states. The number of possibilities drops pretty substantially with each guess even if no letters match. I feel like heuristics that always use all previous knowledge in each guess would perform better (i.e., "hard mode") but maybe not, since it depends on the distribution of actual five-letter words...?
I agree; it should have cubic runtime complexity. I think mine is optimal: https://github.com/koanarec/wordle_cheat
Glancing at your code, I think you are making a similar mistake to the one I was calling out: "Then it adds all these numbers together to get a score for that specific guess. The lowest guess wins. And is the best answer."
You have to examine the entire decision tree to determine the most informative step at a given node, because your choice impacts how informative subsequent choices can be too.
The problem is NP-hard: https://mobile.twitter.com/b_subercaseaux/status/1480562862804373506
This is bugged for Wordle 213. RAISE > COUNT > BLOOD narrows the script's possibility array down to [PROXY, GROOM, PROOF]. Since the 2nd O in BLOOD is grey, it should exclude GROOM and PROOF, but instead it guesses both of them before PROXY.
In fact it guesses GROOM before PROOF even though PROOF would be the better guess between the two.
Yeah, you're right. It's still not handling guesses with double letters very well. I made a mistake about how Wordle interprets these and haven't had the time to fully rectify it.
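For anyone hitting the same issue, the rule Wordle actually follows is: greens are matched first, then yellows are awarded left to right only while unmatched copies of that letter remain in the answer; any further repeats come back grey. A sketch of that two-pass coloring:

```python
from collections import Counter

def feedback(guess, answer):
    # Pass 1: mark greens and count the answer's unmatched letters.
    fb = ['b'] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            fb[i] = 'g'
        else:
            unmatched[a] += 1
    # Pass 2: award yellows left to right while unmatched copies remain.
    for i, g in enumerate(guess):
        if fb[i] == 'b' and unmatched[g] > 0:
            fb[i] = 'y'
            unmatched[g] -= 1
    return ''.join(fb)

print(feedback('blood', 'proxy'))  # 'bbgbb': the second O comes back grey,
# which says the answer has exactly one O and rules out GROOM and PROOF.
```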
Hey, I believe this can be improved on by considering more than just the first word. It turns out that some "best" first words tend to result in some "bad" second guesses, so some different words are likely to be optimal. See https://jonathanolson.net/experiments/optimal-wordle-solutions for more information.
Yeah you're 100% correct and your solver is very cool.
This one is great!
What we really need to do is make a wordle strategy simulator, that tries these strategies against each other :D
Can you add support for "hard mode"
Added 2 starting words for hard mode. Not fully optimal yet, but hopefully it helps!
Thank you!
Working on it, I'll let you know when I have it ready!
Nice approach. I have a question for Tom: any kind of game cheater relies on some slightly sociopathic ability to take a contrived set of rules, sometimes (as in this case) ambiguous, and adapt them in a way that provides self-satisfaction and nothing more. Unless there are prizes, but in the case of Wordle there aren't even bragging rights, because everyone knows you can cheat with two browser sessions. I'm pretty sure that the satisfaction here comes from the ability to understand and document the behavior of the game in a fun way (despite your reputation at parties) and not from the ability to cheat just a little but not too much. I.e., you are making analysis, not Wordle, fun. Still, I wonder: on the vast sliding scale from (at one extreme) playing honestly as if the word list is huge and the choice is random, all the way to (at the other extreme) just using the source code, or for that matter two browser sessions, to obtain a "1" score every single time... what shady self-justification did you use to arrive at this particular level of cheating along the spectrum?
Incidentally, if you want to apply the lessons learned here without memorizing the word list or actually using a cheat tool, an ideal first two guesses are LANES and TORIC. Then you'll usually have a very small number of possible words, and you can either use your head or an anagrammer or Scrabble cheater to find them (depending on your level of self-delusion).
And finally, a new Wordle knockoff game has been launched, probably not by the author of the original game, but with well-planned digital marketing design that will probably attract all the search and game play away from the original within a couple of weeks ... and that uses a far less simplistic approach including allowing the user to choose the number of letters. I suspect that fairly soon the original game will be lost in obscurity because it has NO digital marketing thought at all, not even a decent URL.
So firstly, I think having 2 set initial guesses is bad, both from a "having fun" perspective and an "optimal solution" perspective. Clearly you have to use the feedback from your first guess to adjust the second guess. My rule of thumb would be: use "RAISE" as a first guess; if you get a lot of letters in yellow or green, don't try a second guess that has lots of different vowels.
Secondly, with respect to the self-justification of cheating: I started building the solver using all 13,000 potential 5-letter words. My feeling was that the suggested words were so niche and bizarre that it would feel less like cheating just to use the 2,000-word likely Wordle list (to which point, if you can tell me what "POIND", "LARES" and "VETCH" mean, I will be seriously impressed). I think the guy who built Wordle realised this too - if the word you had to guess was something no one had heard of, it would immediately ruin the fun. I think a solver that suggests the best possible "normal" word is the best compromise - and Josh has essentially already done the work to delineate "weird" from "normal" by creating 2 separate lists.
Finally, I wouldn't be so sure the knock offs will have any success. I think it's much more likely the whole fad will pass and no one will give a shit.
Hi! Great tool, but a bit too much clicking on very small boxes. Maybe preselect "not in word" for each letter?
Thank you for the enjoyable blog post! A question and an observation. Might you update the post and gist to match the code running on the server? And the solver stumbles on "FOCAL" — there's still a little problem processing black tiles and it enters a loop guessing "LOCAL." (The first time around, the filter doesn't correctly handle green repetitions following black repetitions of a letter. On following iterations, all guesses being placed in the same "group" rather than different groups results in the excess repetitions not being detected as intended.)
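One robust way to sidestep this whole class of repeated-letter filter bugs, for what it's worth: instead of tracking per-letter constraints and groups, recompute the full coloring for every candidate and keep only exact pattern matches. A sketch, assuming a `feedback` function like the two-pass one given earlier in the thread:

```python
def filter_candidates(candidates, guess, observed, feedback):
    # A candidate survives only if it would have produced exactly the
    # colors we saw. Excluding the guess itself also prevents the solver
    # from looping on a word it has already tried (the LOCAL loop above).
    return [w for w in candidates
            if w != guess and feedback(guess, w) == observed]
```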
Hey Tom, I love this tool! My partner and I have been using this to analyze our strategy after the fact. One thing I’d really love is the ability to view the list of remaining words (e.g. after you enter a word and it says “12 words remaining”, it would be incredible to be able to see all twelve of those words). Obviously for large numbers of words it’s not practical, but anything below 30 or 40 or so would be super useful!
The other thing, which I imagine would be harder to implement, would be something that allows me to see the “strength” of a guess. Something like the average number of possible words remaining given any possible color configuration. That way I could compare the strength of my guess to the strength of the optimal guess.
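That strength number is cheap to compute, incidentally. If a color pattern would leave n of the N current candidates, it comes back with probability n/N, so the expected number of words remaining is sum(n^2)/N. A sketch, assuming a standard Wordle `feedback` coloring function:

```python
from collections import Counter

def guess_strength(guess, candidates, feedback):
    # Expected number of candidates left after the colors come back,
    # averaged over every possible pattern. Lower = stronger guess.
    buckets = Counter(feedback(guess, c) for c in candidates)
    return sum(n * n for n in buckets.values()) / len(candidates)
```

Comparing `guess_strength(my_guess, remaining, feedback)` against the optimal guess's score would give exactly the comparison described above.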
Thanks so much for building this!
I agree with AVL - it would be great to see the "remaining" words, let's say when they're under 10.
Your solver got stuck on SKATE. Gets as far as STATE and just repeats. Ideas?
Can't solve today for STUNG.
This is fantastic. Your approach to solving Wordle seems much better than others I've seen, and the fact you have managed to code it is really impressive! One thing, though: test it with DROLL, a word that came up the other day. I tried it, and it started looping at DROOL and did not complete in six tries...
It probably needs a tweak to stop it trying to use a word it has already tried! ;-)
Today's Wordle word, April 17: 'AMPLE' gives me a "Cigar, I think you screwed up" after 'APPLE' in the following successful sequence on the NYTimes site: 'TRACE', 'BONUS', 'APPLE', 'AMPLE'.
This is a bit buggy today (4-6-2022) - I guessed RAISE --> CLOUT --> COCOA, but after putting in the results of COCOA, it keeps suggesting the same disproven word as the best guess again:
R A I S E (yellow A)
C L O U T (green C, yellow O)
C O C O A (green C, green O, gray C, gray O, green A)
(repeats)
Minor bug today? It's fun to run the algorithm now and then after I solve, to see its approach. But I noticed that after its fourth guess today, CATCH, the green boxes are missing. End of the road. (RAISE, LYNCH, HUMPH, CATCH)
It didn’t work for the word ELDER. It told me I screwed up and called me a cigar.
I recently analyzed English five-letter words to find out which letters are most common. Once I had the letter frequencies, it was easy to figure out the best first guesses for Wordle. You can see the five-letter statistics here: https://www.unscramblerer.com/wordle-solver/
However, choosing an optimal second word is much harder. It also probably ruins the fun at some point.
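The frequency analysis itself is a few lines if anyone wants to reproduce it. One common variant, sketched below, counts each letter once per word so double letters don't skew the ranking (an assumption on my part, not necessarily how the linked site computes it):

```python
from collections import Counter

def letter_frequencies(words):
    # Number of words containing each letter at least once.
    return Counter(ch for w in words for ch in set(w))

def coverage_score(word, freq):
    # Rank a first guess by how many words its distinct letters appear in.
    return sum(freq[ch] for ch in set(word))

# Usage:
# freq = letter_frequencies(words)
# best_first = max(words, key=lambda w: coverage_score(w, freq))
```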