Word Lists and Search Tools

The Tools

The original search tool is grep, a classic Unix utility included with the standard software on any Linux or Mac system. You can also use an online version of grep, hosted by the National Puzzlers' League, on a variety of word lists they make available (the lists themselves can be downloaded; see below). Grep is based on a syntax for matching text called regular expressions. See my guide to regular expressions, or one of the many others you can find on the web. Food for finding:

The NPL also has a page of custom search tools for specific flat types, which can sometimes be useful for other purposes as well.
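If you have downloaded one of the word lists described below, grep-style searching is also easy to approximate in a few lines of Python. The sketch below is only an illustration, not how the NPL's online tool works: it keeps the words that match a regular expression in full, roughly what `grep -x` does on a file with one word per line. The function names are hypothetical.

```python
import re


def load_words(path: str) -> list[str]:
    """Read a word list with one entry per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def grep_words(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the regular expression in full,
    similar to `grep -x` on a file with one word per line."""
    compiled = re.compile(pattern)
    return [w for w in words if compiled.fullmatch(w)]


# A tiny stand-in for a downloaded word list.
words = ["puzzle", "muzzle", "nozzle", "guzzle", "dazzle", "fizzle"]

# One arbitrary letter followed by "uzzle".
print(grep_words(r".uzzle", words))  # ['puzzle', 'muzzle', 'guzzle']
```

With a real list you might call `grep_words(r".uzzle", load_words("wordlist.txt"))`, where `wordlist.txt` stands in for whichever file you saved. From a shell, `grep -x '.uzzle' wordlist.txt` performs the same search.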

Nutrimatic is a standby for many Mystery Hunt teams. Unlike most search tools, which are based on specific lists of words and set phrases, Nutrimatic draws on text pulled from Wikipedia. This lets you find lots of phrases not found on other sites, but it also means you get lots of results that are just arbitrary sequences of words that happen to appear together. The home page explains its syntax, which includes basic regular expression tools and a few other special characters.

Qat is a very powerful search tool. It uses some regular expression syntax and some syntax of its own. Lowercase letters match themselves, while capital letters are variables that match the same sequence of letters each time they appear. The most powerful feature is the Equation Solver, which lets you enter multiple expressions (separated by semicolons) that must all match words, so you get back pairs or trios of words that fit the rule. An expression can also include restrictions not based on words, such as |A|=2 to specify the length of a variable. A detailed guide is available on the site.
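To make the idea of Qat's variables concrete, here is a rough sketch of how a simplified version of that syntax could be translated into ordinary regular expressions with backreferences. It is not Qat's implementation and covers only a subset of its syntax: lowercase letters, `.` for any single letter, capital-letter variables (assumed here to stand for one or more letters), and `;` between the parts of an equation. Length constraints such as |A|=2 are not handled, and the function names are hypothetical.

```python
import itertools
import re


def qat_style_regex(pattern: str) -> re.Pattern:
    """Translate a simplified Qat-like pattern into a Python regex.

    Assumed rules (a sketch, not Qat's full syntax):
      - lowercase letters match themselves
      - '.' matches any single letter
      - each capital letter is a variable for one or more letters and must
        stand for the same letters everywhere it appears
      - ';' separates the words of a multi-word equation
    """
    parts = []
    seen = set()
    for ch in pattern:
        if "A" <= ch <= "Z":
            if ch in seen:
                parts.append(f"(?P={ch})")         # later use: same text again
            else:
                seen.add(ch)
                parts.append(f"(?P<{ch}>[a-z]+)")  # first use: capture 1+ letters
        elif ch == ".":
            parts.append("[a-z]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def solve(pattern: str, words: list[str]) -> list[tuple[str, ...]]:
    """Return every tuple of lowercase words (one per ';' part) that fits."""
    rx = qat_style_regex(pattern)
    n = pattern.count(";") + 1
    # Join each candidate tuple with ';' so shared variables span all words.
    return [
        combo
        for combo in itertools.product(words, repeat=n)
        if rx.fullmatch(";".join(combo))
    ]


words = ["tartar", "murmur", "banana", "tomtom", "tea", "eat", "kitten"]
print(solve("AA", words))     # [('tartar',), ('murmur',), ('tomtom',)]
print(solve("AB;BA", words))  # pairs where one word is a rotation of the other
```

Trying every combination with `itertools.product` is fine for a toy list but grows quickly with the number of parts, which is one reason a dedicated solver like Qat is worth using on real lists.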

OneLook has pretty basic wildcard syntax explained on the home page. Its strengths are a very large database of real in-the-language phrases and the ability to search for words related to particular concepts. It provides links to online dictionaries defining its results.
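Simple wildcards of this kind can be approximated on a local list with Python's standard `fnmatch` module, assuming the common convention that `?` stands for exactly one letter and `*` for any run of letters (including none). This is a sketch, not OneLook's implementation; note that `fnmatch` also treats square brackets as character sets, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern: str, words: list[str]) -> list[str]:
    """Keep words matching a wildcard pattern: '?' is exactly one
    character, '*' is any run of characters (possibly empty)."""
    return [w for w in words if fnmatchcase(w, pattern)]


words = ["crossword", "password", "swordfish", "wordplay", "sword", "buzzword"]
print(wildcard_search("*word", words))  # ['crossword', 'password', 'sword', 'buzzword']
print(wildcard_search("?word", words))  # ['sword']
```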

Merriam-Webster has a page for searching the current Scrabble dictionary. That dictionary is much more restricted than the lists behind most of these other searches, so go there if Scrabble-legal words are what you want.

The Internet Anagram Server, also called I, Rearrangement Servant (an anagram of its name, featured in its header), lets you find exact anagrams of a set of letters; the results may consist of multiple words. On the advanced page you can choose the word list (including several languages and one with only proper names), and limit the number and length of words in the results.
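A common way to look up exact anagrams in a word list is to index each word by its letters in sorted order, so that all anagrams share one key. The sketch below does that for single words and adds a small recursive search for multi-word anagrams with a cap on the number of words, loosely mirroring the limits the Anagram Server offers. It is an illustration under those assumptions, not the server's actual algorithm, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def anagram_key(text: str) -> str:
    """Letters in sorted order, ignoring case, spaces and punctuation."""
    return "".join(sorted(ch for ch in text.lower() if ch.isalpha()))


def single_word_anagrams(letters: str, words: list[str]) -> list[str]:
    """All words in the list that use exactly the given letters."""
    index = defaultdict(list)
    for w in words:
        index[anagram_key(w)].append(w)
    return index.get(anagram_key(letters), [])


def multi_word_anagrams(letters: str, words: list[str], max_words: int = 3) -> list[str]:
    """Phrases of up to max_words words that use exactly the given letters."""
    target = Counter(anagram_key(letters))
    # Keep only non-empty words whose letters all fit inside the target.
    candidates = sorted({
        w.lower() for w in words
        if anagram_key(w) and not (Counter(anagram_key(w)) - target)
    })
    results = []

    def search(remaining: Counter, start: int, chosen: list[str]) -> None:
        if not remaining:  # every letter used exactly once
            results.append(" ".join(chosen))
            return
        if len(chosen) == max_words:
            return
        # Starting at `start` yields each combination once, in sorted order.
        for i in range(start, len(candidates)):
            need = Counter(anagram_key(candidates[i]))
            if not (need - remaining):  # word fits in the letters left
                search(remaining - need, i, chosen + [candidates[i]])

    search(target, 0, [])
    return results


words = ["listen", "silent", "enlist", "tinsel", "inlets", "google",
         "dirty", "room", "dorm", "story", "moor", "tidy", "rot"]
print(single_word_anagrams("Listen", words))
# ['listen', 'silent', 'enlist', 'tinsel', 'inlets']
print(multi_word_anagrams("dormitory", words, max_words=2))
# ['dirty moor', 'dirty room']
```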

quipqiup is a cryptogram solver rather than a word list search, but you can use it to find words matching a specific cryptogram pattern much more easily than trying to write that in the regexp-like syntax of most of the other tools here.
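The cryptogram-pattern idea is simple to express in code: replace each distinct letter with a, b, c, ... in order of first appearance, and two words fit the same cipher pattern exactly when their patterns are equal. These hypothetical helpers sketch that check over a word list; they are not how quipqiup itself solves cryptograms.

```python
def letter_pattern(word: str) -> str:
    """Map each distinct character to a, b, c, ... in order of first use."""
    mapping: dict[str, str] = {}
    out = []
    for ch in word.lower():
        if ch not in mapping:
            mapping[ch] = chr(ord("a") + len(mapping))
        out.append(mapping[ch])
    return "".join(out)


def pattern_matches(cipher_word: str, words: list[str]) -> list[str]:
    """Words whose repeated-letter structure matches the cipher word."""
    target = letter_pattern(cipher_word)
    return [w for w in words if letter_pattern(w) == target]


print(letter_pattern("bookkeeper"))  # abbccddedf
words = ["book", "seed", "tree", "loop", "been", "moon"]
print(pattern_matches("XQQJ", words))  # ['book', 'seed', 'loop', 'been', 'moon']
```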

Matt Ginsberg's cluer is a downloadable database of millions of published crossword clues and answers and a program for Windows, Mac, and Linux for searching it.

RhymeZone provides searches for rhyming words, homophones, and several other kinds of word relationships.

The Word Lists

A number of classic word lists (the ones people used in the 1990s) are available on the NPL's website. These include, among others, the Scrabble dictionary from that era (words of 2-8 letters only) and ENABLE, also from that era, which was the first attempt to build an official list of all tournament-legal words (it used Merriam-Webster's Collegiate to check base forms longer than 8 letters). The Moby word list was the largest list available back then. Lists of headwords from several dictionaries and a number of lists from other sources are also included. Some of the sites where these lists were originally hosted are no longer in operation.

Spread the Wordlist offers some more modern word lists. Some of these are available in a "scored" format used by crossword construction programs (a higher score means a word that makes a better crossword entry). The page about Peter Broda's wordlist explains the formats of these lists in more detail.
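As a sketch of working with scored lists, the code below assumes one entry per line in a `word;score` form (check the format page mentioned above for the exact layout of any particular list) and keeps only words at or above a chosen score. The function names and sample entries are made up for illustration.

```python
def parse_scored(lines: list[str]) -> dict[str, int]:
    """Parse lines assumed to look like 'word;score' into a dict."""
    entries: dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or ";" not in line:
            continue  # skip blank lines and anything not in word;score form
        word, _, score = line.rpartition(";")
        try:
            entries[word.strip().lower()] = int(score)
        except ValueError:
            continue  # skip entries whose score is not an integer
    return entries


def filter_by_score(entries: dict[str, int], min_score: int) -> list[str]:
    """Alphabetical list of words scoring at least min_score."""
    return sorted(w for w, s in entries.items() if s >= min_score)


sample = ["aardvark;50", "aaa;10", "wordplay;60", "", "oops"]
entries = parse_scored(sample)
print(filter_by_score(entries, 40))  # ['aardvark', 'wordplay']
```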

Wikipedia makes all of its data available, since the entire site is released under open licenses such as Creative Commons. From the perspective of building a word list, the most useful part is probably the lists of page titles, updated daily for all of their wikis. These files use underscores in place of spaces and otherwise match the text of the page titles.
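A title dump can be turned into searchable phrases by undoing the underscore convention. For puzzle work it is often handy to also compute a letters-only, upper-case key; the sketch below strips accents via Unicode decomposition and drops everything that is not an ASCII letter. It assumes the titles have already been read one per line from the downloaded file, and the helper names are hypothetical.

```python
import unicodedata


def title_to_phrase(raw_title: str) -> str:
    """Undo the underscore-for-space convention used in the title files."""
    return raw_title.replace("_", " ")


def puzzle_key(phrase: str) -> str:
    """Letters only, upper-cased, with accents removed:
    'The Lord of the Rings' -> 'THELORDOFTHERINGS'."""
    # NFKD splits accented letters into a base letter plus combining marks,
    # which are then dropped because they are not ASCII.
    decomposed = unicodedata.normalize("NFKD", phrase)
    return "".join(ch for ch in decomposed if ch.isascii() and ch.isalpha()).upper()


raw = ["Winnie-the-Pooh", "The_Lord_of_the_Rings", "Ada_Lovelace"]
phrases = [title_to_phrase(t) for t in raw]
print(phrases)  # ['Winnie-the-Pooh', 'The Lord of the Rings', 'Ada Lovelace']
print(puzzle_key("Beyoncé"))  # BEYONCE
```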