Jan 7, 2022 · edited Jan 7, 2022 · Liked by Tom

I could be wrong and not thinking this all the way through, but I don't think this is optimal. Your trees handle bigrams and co-occurrence, but isn't it the case that a particular guess could seem to be suboptimal at one step, but turn out to have been optimal at a later step?

Imagine (for some word list) that guessing "s" first narrows the range as much as possible in step one (say, it narrows the range of possible words by 50%). The three branches from that guess (green, yellow, or black) lead to three new optimal guesses for your second guess. Let's say that green "s" means the next optimal guess narrows the field by a further 10%, yellow "s" means the next guess narrows it by 5%, and black "s" means the next guess narrows it by 3%. Taken together, the two guesses narrow the field by 60% ("s" was green), 55% ("s" was yellow), and 53% ("s" was black), which is an average of 56%.

Isn't it the case that, for that same word list, guessing "p" first might narrow the range by less (say only 49%), yet the three branches for the next guess might be better than they were for "s"? So "p" might seem like a worse guess at the moment, but the optimal guesses off each branch of that guess might be better. Maybe the subsequent optimal guess after green "p" narrows by 9%, after yellow "p" by 8%, and after black "p" by 7%. The sequence of guesses taken together then gives you 58%, 57%, and 56%, for an average of 57% (higher than 56%).

I don't think you can actually optimize this with a single tree at each step. I think you would need to actually build the full tree of trees (not just the trees you have here, but the tree of these trees for each subsequent guess, all the way to solutions). Your approach, looking at only one of those subtrees at a time, assumes a constraint of that larger decision tree that I don't think it has.
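
For concreteness, here's a rough sketch of that two-level evaluation in Python: score a first guess not by its own narrowing, but by the best achievable narrowing in every feedback branch. The feedback function is deliberately simplified (it ignores Wordle's duplicate-letter rules), follow-up guesses are restricted to the remaining candidates for brevity, and all names are illustrative:

```python
def feedback(guess, answer):
    # Simplified tile colours: 'g' green, 'y' yellow, 'b' black.
    # (Ignores Wordle's duplicate-letter rules for brevity.)
    return tuple('g' if g == a else ('y' if g in answer else 'b')
                 for g, a in zip(guess, answer))

def partition(guess, candidates):
    # Group the remaining candidates by the feedback they would produce.
    groups = {}
    for answer in candidates:
        groups.setdefault(feedback(guess, answer), []).append(answer)
    return groups

def expected_remaining(guess, candidates, depth):
    # Average number of words still possible after `depth` guesses,
    # assuming the best follow-up guess is played in every branch.
    total = 0
    for answers in partition(guess, candidates).values():
        if depth == 1 or len(answers) <= 1:
            total += len(answers) * len(answers)
        else:
            best = min(expected_remaining(g, answers, depth - 1)
                       for g in answers)
            total += len(answers) * best
    return total / len(candidates)
```

The greedy strategy picks the guess minimising expected_remaining(g, words, 1); the point above is that minimising at depth 2 (or deeper) can pick a different word.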

author

Yeah, you're right. I had a go last night at building out all the sub-trees, but with my hacky, probably not very performant code it stops being feasible to do all that calculation in the browser.

I also think this issue is particularly important if you wanted to write a solver for the "hard mode" on Wordle, because in that case you have many fewer options if your first guess gives you a lot of clues.

author

I might have another go over the weekend, because I think you could easily come up with a heuristic to narrow down the guesses that it's worth calculating the sub-trees for.

For example, there's no point calculating the sub-trees for "FUZZY" because it's such a terrible first guess. But if you only calculated the sub-trees for the top 5-10% of first guesses, it would cut the number of calculations needed by even more than 10x or 20x, because the best guesses also have the fewest remaining words afterwards.
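
A sketch of that pruning, reusing the hypothetical expected_remaining() from the earlier sketch (the 10% cutoff is illustrative, not tuned):

```python
def best_guess_pruned(guesses, candidates, depth=2, keep_fraction=0.10):
    # Rank every guess by the cheap one-step score first...
    ranked = sorted(guesses, key=lambda g: expected_remaining(g, candidates, 1))
    shortlist = ranked[:max(1, int(len(ranked) * keep_fraction))]
    # ...then pay for the expensive recursive evaluation only on the shortlist.
    return min(shortlist, key=lambda g: expected_remaining(g, candidates, depth))
```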


That would probably improve it, but I bet there really are at least some crazy sequences where there's a much better guess than the heuristic finds.

It's interesting what a hard constraint-satisfaction problem this is, given how easily humans manage to do relatively well at it.

It would be very interesting to see a neural net try to optimize Wordle.


There is certainly an ML approach, but I got tangled up in general- vs. specific-purpose. There are only 2,315 words in the manually culled list Wordle uses, so the usual purpose of training a model -- supervised training that generalizes to inputs the model has never seen -- isn't even necessary to solve Wordle, because you can see all possible inputs. Now, if you wanted to solve all five-letter words, or all n-letter words, that's probably a big enough set.

I built a super-naive algorithm that simply guesses the first valid word alphabetically, and even that succeeds 80% of the time with a median of 5 guesses. Then I thought about doing it on the entire set of five-letter combinations (~11 million), but an algorithm optimized to guess QXQXZ isn't going to be optimal for the actual word set.

I thought, to your point, it would be more interesting to seek out all the crazy sequences to see just how far off-axis the long tail goes. That seems to crop up when you have many equally good guesses and happen to hit on the missing letter, i.e. CA?ES can be CAGES, CARES, CANES, CAPES, etc. A general solution might in fact optimize to avoid sub-trees that put you in that corner, using the fact that not all five-letter inputs are valid anyway.
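
That baseline is easy to reproduce; a minimal sketch, reusing the simplified feedback() from further up the thread (the word list is whatever you feed it):

```python
def naive_solve(answer, words, max_guesses=6):
    # Always guess the first remaining valid word, alphabetically.
    candidates = sorted(words)
    for turn in range(1, max_guesses + 1):
        guess = candidates[0]
        if guess == answer:
            return turn
        fb = feedback(guess, answer)
        candidates = [w for w in candidates if feedback(guess, w) == fb]
    return None  # failed to solve within six guesses
```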

Jan 8, 2022 · edited Jan 8, 2022

The specific problem is still reasonably hard. The full decision tree for only 2,315 words is still absolutely enormous. Look at how big the tree gets for each guess here; now recursively build trees for every guess. The sub-trees do get smaller, but you have to branch, recursively, five times. It's pretty huge. Probably doable to fully solve with a decently written program (and not one trying to run in a web browser), but it's definitely not tiny.

And the neural net comment wasn't about training the model to optimize inputs it's never seen. It was about finding an approximation to the solution for a highly non-linear problem, which is something neural networks are relatively efficient at.

author

Something I've realised would make this slightly more feasible in normal mode:

Even if you're worried about sub-trees that put you in a corner, for any given answer and potential guess sequence (e.g. 1) RAISE, 2) COULD, and so on), you always want the best guess at the first stage first -- i.e. you would never start with COULD and then guess RAISE. This means the hypothetical trees you build for the words after the first guess can ignore any word better than the initial guess at the first stage, which cuts the calculation required by more than half at each step.
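
Taking that at face value, the prune is a couple of lines on top of precomputed one-step scores (hypothetical names; lower score = better guess here):

```python
def prune_second_guesses(first_guess, guesses, one_step_score):
    # Skip any second guess that scored strictly better than the word
    # actually chosen first -- we'd have opened with it instead.
    cutoff = one_step_score[first_guess]
    return [g for g in guesses if one_step_score[g] >= cutoff]
```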


That's fair -- not suggesting that the known problem size is trivial or anything, just that it is theoretically possible to test the hypothesis. I'm also curious about pruning the tree, but I don't have a strong intuition about the prevalence of equivalent states. The number of possibilities drops pretty substantially with each guess even if no letters match. I feel like heuristics that always use all previous knowledge in each guess would perform better (i.e., "hard mode") but maybe not, since it depends on the distribution of actual five-letter words...?


I agree; it should have cubic runtime complexity. I think mine is optimal: https://github.com/koanarec/wordle_cheat

Jan 29, 2022 · edited Jan 29, 2022

Glancing at your code, I think you are making a similar mistake to the one I was calling out: "Then it adds all these numbers together to get a score for that specific guess. The lowest guess wins. And is the best answer."

You have to examine the entire decision tree to determine the most informative step at a given node, because your choice impacts how informative subsequent choices can be too.

The problem is NP-hard: https://mobile.twitter.com/b_subercaseaux/status/1480562862804373506

Jan 18, 2022 · edited Jan 18, 2022 · Liked by Tom

This is bugged for Wordle 213. RAISE > COUNT > BLOOD narrows the script's possibility array down to [PROXY, GROOM, PROOF]. Since the 2nd O in BLOOD is grey, it should exclude GROOM and PROOF, but instead it guesses both of them before PROXY.

In fact it guesses GROOM before PROOF, even though PROOF would be the better guess of the two.

author

Yeah, you're right. It's still not handling guesses with double letters very well. I made a mistake with how Wordle interprets these and I haven't had the time to fully rectify it.
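
For reference, the duplicate-letter behaviour Wordle actually uses is a two-pass rule: greens are matched first, then yellows are awarded left to right only while unmatched copies of a letter remain. A sketch of that rule (not the code running here):

```python
from collections import Counter

def wordle_feedback(guess, answer):
    result = ['b'] * len(guess)
    unmatched = Counter()
    # Pass 1: mark greens and count the answer's unmatched letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 'g'
        else:
            unmatched[a] += 1
    # Pass 2: award yellows only while unmatched copies of the letter remain.
    for i, g in enumerate(guess):
        if result[i] != 'g' and unmatched[g] > 0:
            result[i] = 'y'
            unmatched[g] -= 1
    return ''.join(result)

# wordle_feedback("BLOOD", "PROXY") -> "bbgbb": the second O in BLOOD comes
# back grey because PROXY has only one O, which rules out GROOM and PROOF.
```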


Hey, I believe this can be improved on by considering more than just the first word. It turns out that some "best" first words tend to result in some "bad" second guesses, so some different words are likely to be optimal. See https://jonathanolson.net/experiments/optimal-wordle-solutions for more information.

author

Yeah you're 100% correct and your solver is very cool.


This one is great!


What we really need to do is make a Wordle strategy simulator that tries these strategies against each other :D


Can you add support for "hard mode"?


Added 2 starting words for hard mode. Not fully optimal yet, but hopefully it helps!


Thank you!


Working on it, I'll let you know when I have it ready!

Jan 16, 2022 · edited Jan 16, 2022 · Liked by Tom

Nice approach. I have a question for Tom: any kind of game cheater relies on a slightly sociopathic ability to take a contrived set of rules (sometimes, as in this case, ambiguous ones) and adapt them in a way that provides self-satisfaction and nothing more. Unless there are prizes -- but in the case of Wordle there aren't even bragging rights, because everyone knows you can cheat with two browser sessions. I'm pretty sure that the satisfaction here comes from the ability to understand and document the behavior of the game in a fun way (despite your reputation at parties), and not from the ability to cheat just a little but not too much. I.e., you are making analysis, not Wordle, fun. Still, I wonder: on the vast sliding scale from (at one extreme) playing honestly as if the word list is huge and the choice is random, all the way to (at the other extreme) just using the source code -- or, for that matter, two browser sessions -- to obtain a "1" score every single time, what shady self-justification did you use to arrive at this particular level of cheating along the spectrum?

Incidentally, if you want to apply the lessons learned here without memorizing the word list or actually using a cheat tool, an ideal pair of first guesses is LANES and TORIC. Then you'll usually have a very small number of possible words, and you can either use your head or an anagrammer or Scrabble cheater to find them (depending on your level of self-delusion).

And finally, a new Wordle knockoff game has been launched -- probably not by the author of the original game -- with well-planned digital marketing that will probably attract all the search traffic and game play away from the original within a couple of weeks, and that uses a far less simplistic approach, including allowing the user to choose the number of letters. I suspect that fairly soon the original game will be lost in obscurity, because it has NO digital marketing thought at all, not even a decent URL.

author

So firstly, I think having 2 set initial guesses is bad, both from a "having fun" perspective and an "optimal solution" perspective. Clearly you have to use the feedback from your first guess to adjust the second guess. My rule of thumb would be: use "RAISE" as a first guess, and if you get a lot of letters in yellow or green, don't try a second guess that has lots of different vowels.

Secondly, with respect to the self-justification of cheating: I started building the solver using all 13,000 potential five-letter words. My feeling was that the suggested words were so niche and bizarre that it would feel less like cheating just to use the 2,000-word list of likely Wordle answers (to which point, if you could tell me what "POIND", "LARES" and "VETCH" mean, I would be seriously impressed). I think the guy who built Wordle realised this too -- if the word you had to guess was something no one had heard of, it would immediately ruin the fun. I think a solver that suggests the best possible "normal" word is the best possible compromise, and Josh has essentially already done the work to delineate "weird" from "normal" by creating 2 separate lists.

Finally, I wouldn't be so sure the knock offs will have any success. I think it's much more likely the whole fad will pass and no one will give a shit.


Hi! Great tool, but a bit too much clicking on very small boxes. Maybe preselect "not in word" for each letter?

Jan 20, 2022 · edited Jan 20, 2022

Thank you for the enjoyable blog post! A question and an observation. Might you update the post and gist to match the code running on the server? And the solver stumbles on "FOCAL" — there's still a little problem processing black tiles, and it enters a loop guessing "LOCAL." (The first time around, the filter doesn't correctly handle green repetitions following black repetitions of a letter. On subsequent iterations, all guesses are placed in the same "group" rather than different groups, so the excess repetitions aren't detected as intended.)


Hey Tom, I love this tool! My partner and I have been using this to analyze our strategy after the fact. One thing I'd really love is the ability to view the list of remaining words (e.g. after you enter a word and it says "12 words remaining", it would be incredible to be able to see all twelve of those words). Obviously for large numbers of words it's not practical, but anything below 30 or 40 or so would be super useful!

The other thing, which I imagine would be harder to implement, would be something that allows me to see the “strength” of a guess. Something like the average number of possible words remaining given any possible color configuration. That way I could compare the strength of my guess to the strength of the optimal guess.
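
In terms of the sketches earlier in the thread, that strength metric is exactly the one-step expected_remaining(); something like this hypothetical helper, weighting each colour pattern by how likely it is:

```python
def guess_strength(guess, candidates):
    # Average words left across all colour patterns, weighted by the
    # probability of each pattern (reuses partition() from earlier).
    groups = partition(guess, candidates)
    return sum(len(g) ** 2 for g in groups.values()) / len(candidates)
```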

Thanks so much for building this!


I agree with AVL - it would be great to see the "remaining" words, let's say when they're under 10.


Can't solve today for STUNG.


This is fantastic. Your approach to solving Wordle seems much better than others I've seen, and the fact you have managed to code it is really impressive! One thing, though: test it with DROLL, a word that came up the other day. I tried it, and it started looping at DROOL and did not complete in six tries...

It probably needs a tweak to stop it trying to use a word it has already tried! ;-)


Today's Wordle word (April 17), 'AMPLE', gives me a "Cigar, I think you screwed up" after 'APPLE' in the following successful sequence on the NYTimes site: 'TRACE', 'BONUS', 'APPLE', 'AMPLE'.


This is a bit buggy today (4/6/2022) -- I guessed RAISE --> CLOUT --> COCOA, but after putting in the results of COCOA, it keeps suggesting the same disproven word as the best guess again.

R A I S E (yellow A)

C L O U T (green C, yellow O)

C O C O A (green C, green O, gray C, gray O, green A)

...and then it repeats.


Minor bug today? It's fun to watch the algorithm now and then after I solve, to see its approach. But I noticed that after its fourth guess today, CATCH, the green boxes are missing. End of the road. (RAISE, LYNCH, HUMPH, CATCH)


It didn’t work for the word ELDER. It told me I screwed up and called me a cigar.


I recently analyzed English five-letter words. I wanted to know which letters are most common in five-letter words. After seeing the letter frequencies, it was easy to figure out the best first guesses in Wordle. Here you can see the five-letter Wordle statistics: https://www.unscramblerer.com/wordle-solver/

However, choosing an optimal second word is much harder. It also probably ruins the fun at some point.
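
The frequency count itself is only a few lines; a minimal sketch (where the word list comes from is up to you):

```python
from collections import Counter

def letter_frequencies(words):
    # Count each letter once per word so repeated letters don't skew the ranking.
    counts = Counter()
    for word in words:
        counts.update(set(word.lower()))
    return counts.most_common()
```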


This has a bug in color-response parsing. Try solving today's Wordle (230) with the initial (manual) guess "GEESE". It'll fail and tell the user they screwed up.


(The real issue is that Wordle doesn't describe its coloring behavior correctly in the game instructions.)
