Wordle Hinter

A Wordle hinter I wrote on a whim. The idea is roughly as follows:

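  • Convert the information the puzzle has already revealed into a regular expression
    • Green = fixed letter, yellow = the word contains the letter but not at this position, black = the word does not contain the letter
  • Filter the word list (*) for the words that match it, i.e., all the "possible answers"
  • Compute the "mutual information" of every possible answer, in practice a crude proxy: each word's total score when guessed against every possible answer, and pick the word(s) with the highest value (**)
    • The score uses simple counting: green = 2 points, yellow = 1 point, black = 0 points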
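The three steps above can be sketched as a small non-interactive function. This is a minimal illustration under simplifying assumptions, not the program below: `feedback_to_regex` and `rank_guesses` are hypothetical names, the feedback string uses the same 0/1/2 encoding as the program's input, and the guess is assumed to have no repeated letters.

```python
import re

# Hypothetical helpers; assumes the guess has no repeated letters.
def feedback_to_regex(guess, feedback):
    """Turn one guess plus its 0/1/2 feedback into (compiled regex, required letters)."""
    absent = {g for g, f in zip(guess, feedback) if f == "0"}
    parts, required = [], set()
    for g, f in zip(guess, feedback):
        if f == "2":
            parts.append(g)                  # green: fixed letter
        else:
            banned = set(absent)             # black letters are banned everywhere
            if f == "1":
                banned.add(g)                # yellow: not at this position...
                required.add(g)              # ...but somewhere in the word
            parts.append("[^" + "".join(sorted(banned)) + "]")
    return re.compile("".join(parts)), required

def score(target, guess):
    # counting score: green = 2, yellow = 1, black = 0
    return sum(2 if g == t else 1 if g in target else 0
               for g, t in zip(guess, target))

def rank_guesses(words, guess, feedback, top=5):
    regex, required = feedback_to_regex(guess, feedback)
    # step 2: keep only the words consistent with the feedback
    candidates = [w for w in words
                  if regex.fullmatch(w) and required <= set(w)]
    # step 3: total score of each candidate as a guess against all candidates
    totals = {g: sum(score(t, g) for t in candidates) for g in candidates}
    return sorted(candidates, key=lambda g: totals[g], reverse=True)[:top]
```

For example, `rank_guesses(["robin", "rabid", "raise", "robot", "rocky"], "rates", "20000")` keeps only robin and rocky: the regex is `r[^aest][^aest][^aest][^aest]`, which rules out the other three.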

Notes:

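(*) Word list: I used this word list from GitHub, derekchuank/high-frequency-vocabulary. As a fellow Mastodon user pointed out, Wordle's own word lists have in fact been dumped: roughly 2.3k possible answers and roughly 10.6k allowed guesses, from this Reddit thread: a_note_on_wordles_word_list/. Feel free to swap them in.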

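(**) Answer list: the Reddit discussion points out that each day's answer is hand-picked rather than drawn at random, so the answers do not follow the uniform distribution the scoring implicitly assumes. Well, with 2.3k words in the answer list, a hand-picked set is still plenty of words.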
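If the answers are not equally likely, one way to account for it (an assumption, not something the program does) is to weight each possible answer by a prior before summing the counting score. The `weights` mapping and the `weighted_scores` name are hypothetical; with all weights equal this reduces to the plain average of the counting score.

```python
def info_score(target, current, matchscore=2, containscore=1):
    # same counting score as the program: green = 2, yellow = 1, black = 0
    s = 0
    for i in range(5):
        if current[i] == target[i]:
            s += matchscore
        elif current[i] in target:
            s += containscore
    return s

def weighted_scores(candidates, weights):
    """Expected counting score of each guess when the answer follows a prior.

    weights: hypothetical dict word -> non-negative prior weight
    (e.g. derived from word frequency); missing words get weight 0.
    """
    total = sum(weights.get(t, 0.0) for t in candidates)
    if total == 0:
        raise ValueError("all candidates have zero weight")
    return {
        guess: sum(weights.get(t, 0.0) * info_score(t, guess) for t in candidates) / total
        for guess in candidates
    }
```

With candidates robin and rocky and weights `{"robin": 3, "rocky": 1}`, robin scores 8.5 and rocky 5.5, whereas equal weights give both 7.0.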

Interesting findings:

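  • According to this word list, the best starting word is not adieu but rates, aries, cares, or lanes, because the vowel u actually appears less often than the consonants r and s
  • Even when cheating with a word list, you still need a fair amount of luck. In Wordle 243, for example, I was left guessing at random among 4 valid words; the expected number of guesses for my strategy was 4.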
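To check the starting-word claim against whatever list you use, one option is to count how many words contain each letter and rank every word as an opener by the same counting score summed over the whole list. This is a sketch: `letter_counts` and `best_openers` are hypothetical helper names, and the pairwise loop is O(n²), so it gets slow on a 30k list.

```python
from collections import Counter

def info_score(target, current, matchscore=2, containscore=1):
    # counting score used by the hinter: green = 2, yellow = 1, black = 0
    s = 0
    for i in range(5):
        if current[i] == target[i]:
            s += matchscore
        elif current[i] in target:
            s += containscore
    return s

def letter_counts(words):
    # number of words containing each letter (repeats within a word count once)
    return Counter(ch for w in words for ch in set(w))

def best_openers(words, top=5):
    # total score of each word as a first guess against every word in the list
    totals = {g: sum(info_score(t, g) for t in words) for g in words}
    return sorted(words, key=lambda g: totals[g], reverse=True)[:top]
```

On the made-up toy list `["rates", "cares", "adieu", "lanes", "audio"]`, `best_openers` ranks rates and cares (tied at 27) ahead of lanes (26), adieu (24) and audio (18).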

The code looks roughly like this:

import re


# # Create the 5-letter word list (run once):
# with open("30k.txt") as src, open("5letterwords.txt", "w") as dst:
#     for line in src:
#         w = line.strip()
#         if len(w) == 5:
#             dst.write(w + "\n")


def info_score(target, current, matchscore=2, containscore=1):
    """
    Compute the information score of guessing `current` when the answer is `target`.
    target: target word (a possible answer)
    current: current guess
    Change the scores to tune the search.
    """
    s = 0
    for i in range(5):
        if current[i] == target[i]:
            s += matchscore
        elif current[i] in target:
            s += containscore
    return s


def main():
    # run the hinter

    # load the word list
    with open("5letterwords.txt") as f:
        words = [line.strip() for line in f]

    # initialize masks: one negated character class per position
    # ("0" is just a placeholder so the class is never empty)
    letters_contain = []
    pattern = ["[^0]"] * 5
    done = [False] * 5

    while not all(done):

        # new input from the user
        w = input("Please enter your guess: ").strip().lower()
        m = input("Please enter your result (black=0, yellow=1, green=2): ").strip()
        if len(w) != 5 or len(m) != 5:
            print("Both inputs must be exactly 5 characters.")
            continue

        # letters this guess confirms are in the word (green or yellow)
        present = {w[i] for i in range(5) if m[i] in "12"}

        # loop through the new input
        for i in range(5):
            if m[i] == '2':
                # correct: replace the regex mask with the letter and mark as done
                pattern[i] = w[i]
                done[i] = True
            elif m[i] == '1':
                # semi-correct: add to the contain list and exclude it at this index
                letters_contain.append(w[i])
                if not done[i]:
                    pattern[i] = pattern[i][:-1] + w[i] + ']'
            elif m[i] == '0':
                # wrong guess: exclude the letter from every open position,
                # or only from this one if it repeats a green/yellow letter
                targets = [i] if w[i] in present else range(5)
                for j in targets:
                    if not done[j]:
                        pattern[j] = pattern[j][:-1] + w[i] + ']'
        # exit if done
        if all(done):
            print("You solved it. Bye.")
            return
        # apply the regex and the contain list
        regex = re.compile("".join(pattern))
        words_new = [wd for wd in words
                     if regex.fullmatch(wd)
                     and all(l in wd for l in letters_contain)]
        # exit if something odd happened
        if not words_new:
            print("Out of words, maybe something went wrong?")
            return
        # constraints accumulate, so later rounds only need these words
        words = words_new
        # score each candidate as a guess against every remaining possible answer
        scores = [(guess, sum(info_score(target, guess) for target in words_new))
                  for guess in words_new]
        # sort and print the top 5
        ranked = sorted(scores, key=lambda x: x[1], reverse=True)
        best_guess = [g for g, _ in ranked[:5]]
        print("Best next guesses are " + ", ".join(best_guess) + ".")


if __name__ == "__main__":
    main()

Here is what it looks like in action:

Answer: robin

Answer: slime

Answer: abbey

Answer: aloft. Oddly, this word is not in the 30k list.
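The counting score used by the hinter is only a rough proxy for mutual information. For comparison, here is a sketch of an information-theoretic ranking, a swapped-in technique rather than what the program above does: for each guess, compute the Wordle feedback pattern against every remaining candidate and take the entropy of the resulting pattern distribution. Because the feedback is fully determined by the answer, this entropy equals the mutual information between feedback and answer when all candidates are equally likely. `feedback`, `entropy_of_guess` and `best_by_entropy` are hypothetical names, and the repeated-letter handling follows the commonly described rule (greens first, then yellows up to the number of unmatched copies).

```python
import math
from collections import Counter

def feedback(guess, answer):
    """Wordle coloring as a 0/1/2 string, handling repeated letters."""
    result = ["0"] * 5
    unmatched = Counter()
    # first pass: greens, and count answer letters not matched in place
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "2"
        else:
            unmatched[a] += 1
    # second pass: yellows, consuming unmatched letters left to right
    for i, g in enumerate(guess):
        if result[i] == "0" and unmatched[g] > 0:
            result[i] = "1"
            unmatched[g] -= 1
    return "".join(result)

def entropy_of_guess(guess, candidates):
    """Entropy (bits) of the feedback pattern over equally likely candidates."""
    counts = Counter(feedback(guess, a) for a in candidates)
    n = len(candidates)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_by_entropy(candidates, top=5):
    # rank each candidate by how evenly it splits the remaining candidates
    return sorted(candidates, key=lambda g: entropy_of_guess(g, candidates),
                  reverse=True)[:top]
```

For instance, `feedback("speed", "abide")` gives `"00101"`: only one of the two e's turns yellow because abide has a single e.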
