Charles Shiflett
abear@cats.ucsc.edu
Prog. Asg. No. 3
29th of February, 2004

Boggle

Emulating and optimizing a classic board game using digital logic

Overview

Boggle is a classic game played with sixteen dice on a four-by-four grid. Each die has six letters and can land in any of the sixteen grid positions, so every game of Boggle starts from a pseudo-random arrangement of letters. Once the letters are arranged, the player (or computer) tries to form English words of four or more letters by starting from any letter and then moving to adjacent letters (horizontally, vertically, or diagonally). Each letter may be used only once per word. The game is 'won' by finding as many words as possible (in a given amount of time).

Files

Makefile 
boggle.c       - user interface to cube/dict packages
boggle.h       - obsolete
boggleTester.c - time analysis (takes a file as an argument)
cube.c         - cube implementation (createCube and cube solving functions)
cube.h         - cube adt
dict.c         - dict implementation (also listed as dictADT.c)
dict.h         - dict adt (also listed as dictADT.h)
cubes.dat      - possible dice file
board.dat      - example board layout

Program Design

The game of Boggle consists of three conceptually different steps:
  1. Arranging letters into a pseudo-random arrangement on a 4x4 grid.
  2. Checking words against a dictionary.
  3. Solving the Boggle cube by checking every possible word.

Implementing points 1-3 provides the core functionality of Boggle. In addition, a user interface is needed which sets up variables (e.g. the dictionary location) and takes care of I/O. In my implementation, I have provided points 1-3 as well as two example user interfaces. One interface tests program execution against specific sets of input, and the other lets the user play Boggle against the computer.

The remainder of this document explains how to design a user interface, which algorithms were used to implement points 1-3, and how effective those algorithms are. The document is arranged so that each of the points above has one section devoted to it, followed by a section devoted to implementing a specific user interface. Finally, I conclude with an analysis of the total package: what could be improved, and what worked well in its design.

Pseudo Random Letter Arrangement

Using the method given in the introduction for determining letter placement (16! ways to arrange the dice, times 6^16 choices of which face shows), we are left with 59025489844657012604928000 (nearly sixty trillion trillion) different board layouts. There is also no easy way of mapping a number this large onto the event space used by a Boggle cube, so we just do it the slow and painful way.

Our first step is to read the sixteen dice, each consisting of six possible letters. In this step, we read from a file line by line, and for each line choose a random letter. The assumption is that the file consists of ASCII lines containing letters that will tend to generate a good board layout.

As we read in the letters from the file, we also choose a random location on the Boggle grid to place each letter (or die, as it would be on a real board). Each time we place a letter on the grid, there is one less free position. So we start with sixteen possible positions to place a letter, then fifteen, then fourteen, and so on. To map a random number between zero and i (where i+1 positions are still free) onto one of the free positions, I implemented the following algorithm:

	1	for (i=15 down to 0) do
	2		cube_position = random() modulo (i+1)
	3		for (j=0 to 15)
	4			if ( j <= cube_position and cube[j] == initialized )
	5				cube_position++;
	6		while (cube[cube_position] == initialized ) 
	7			cube_position++;
	8		cube[cube_position] = letter chosen from the current die
	9	end for loop

Line 2 picks a random number between zero and i, one value for each position that is still free. Lines 3-5 map that number onto the sixteen positions on the board: every filled position at or before the current candidate pushes the candidate one step further along, so a random number n selects the (n+1)th free position. Line 6 is a safety net against collisions: it iterates (if it ever needs to) through positions until it finds an uninitialized position in the cube for our die.

Once we have created our 4x4 grid, we export it as an array of sixteen characters. Each call to createCube(filename) repeats this process, which in turn produces a new placement out of the nearly sixty trillion trillion possible placements.

Dictionary Look Up

A typical dictionary file is about 2.5 megabytes and holds about 250,000 words. If we are to provide fast lookup times and efficient memory usage, we have to take care in how we approach the creation of our dictionary implementation.

I decided against a completely generic implementation in favor of an efficient dictionary implementation. In doing so, I imposed the following limitation: a word may be no longer than 16 bytes. Without that limitation, we have to assume a word can be arbitrarily long, which then requires an array of pointers to string objects. This is horribly memory-inefficient (and difficult to search) when each pointer is 4 bytes and each string object is at least the length of the string plus 4 bytes.

By limiting our word length to 16 bytes, we can treat every word as exactly 16 bytes long: if a word has fewer than 16 letters, we pad it out with zeroes to 16 bytes. Now, rather than needing an array of pointers, we have just one very large array containing all of the words in our dictionary. In the example above, with 250,000 words, we would use exactly 4,000,000 bytes (about 4 MB) of RAM, which is not bad at all.

Searching through the array then becomes a relatively trivial affair. I implemented three ways of searching through the dictionary (linear, binary, and radix search), which are outlined below:

Both linear search and binary search use trivial algorithms. Binary search relies on the C library implementation of binary search (bsearch()), while linear search is included with the dictionary files. Both functions rely on wordCompare(), which compares one character at a time. wordCompare() could be optimized to compare a machine word at a time, which would result in a slight speed increase.

Radix search is implemented by creating an index for each letter position, with one entry for each of the 26 possible letters. The search is then accomplished by going to the index for the first letter, then following it to the index for the second letter, and so on. This results in a very fast but memory-intensive search. For instance, a complete four-level tree whose nodes each hold 27 four-byte entries (26 letters plus the isWord flag) has 27^4 = 531,441 nodes on its deepest level alone, or roughly 57 MB (27^4 x 27 x 4 bytes).

What makes the search practical is that, in general, the radix tree is sparse. The algorithm is also optimized so that once a prefix is shared by only one word, we just point to that word in the original array, rather than having to create an n-level radix structure for every word. The pseudo code to create an index and traverse an index is shown below:

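Radix Index Creation (word_full holds the current prefix; word_pos points just past it)
    radix->isWord = (the current prefix is itself a dictionary word);
    for (i = 'a' to 'z') {
        *word_pos = i;                         /* extend the prefix by one letter */
        switch ( wordCount(word_full, dict) ) {     /* 0, 1, or 2 meaning "two or more" */
        case 0:                                /* no word has this prefix */
            radix->letter[i - 'a'] = 0;
            break;
        case 1:                                /* unique: point at the word itself, tagged +1 */
            radix->letter[i - 'a'] = (address of the one matching word in dict->word) + 1;
            break;
        default:                               /* two or more: build the next level */
            radix->letter[i - 'a'] = radixDict(dict, word_full, word_pos + 1);
            word_pos[1] = 0;                   /* trim the prefix back to this level */
            break;
        }
    }
Radix Index Search (returns 0x01 = prefix, 0x02 = dictionary word)
    start = word;
    while (1) {
        if (*word) {
            entry = radix->letter[*word - 'a'];
            if (!entry)
                return 0;                      /* no match, TERMINATING CASE */
            if (entry & 1) {                   /* only one word has this prefix */
                full = entry - 1;              /* that word, in the original array */
                if (strncmp(full, start, 16) == 0)
                    return 0x02;               /* exact match, nothing longer exists */
                if (strncmp(full, start, strlen(start)) == 0)
                    return 0x01;               /* word is a prefix of that word */
                return 0;                      /* no match */
            }
            radix = entry;                     /* descend one level */
        } else { /* end of word */
            if (radix->isWord)
                return 0x01 | 0x02;            /* word is a prefix and dictionary word */
            else
                return 0x01;                   /* word is a prefix */
        }
        word++;
    }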

Each of the three search algorithms is picked automatically based on the state of the internal 'sorted' flag. In practice, radix-based searching should not be used because of its large memory overhead and index creation time; sortDict() should be called instead.

Aside from the implementation issues, the interface exported by the dictionary ADT is simple. createDict() returns a dictionary object, insertFile()/insertWord() add content, and the dictionary is queried with isWord(), which compares a string against the words in the dictionary. Memory for the strings is allocated dynamically, with a worst case of 2n-1 word slots allocated for n words. This makes the dictionary package very versatile and usable for manipulating any collection of words.

Solving the Boggle Cube

The Boggle cube (or grid, as the case may be) is solved using recursion, looking up every possible word in the dictionary. This is not nearly as bad as it sounds, since most bogus letter combinations are ruled out within the first two to three characters.

From whatever position we are at, we try to move to every legal (adjacent, unused) position and recursively call ourselves with each new candidate word. Immediately upon being called, a check is made against the dictionary object, and if the check is successful we keep going. The basic idea is that a letter is either on the cube or on the recursion stack. As we go deeper into the recursion, more letters move onto the stack, and as we come back out, letters are taken off the stack and put back onto the board. The pseudo code is shown below:

solveCube() - non-recursive; calls the recursive part once for every starting letter
    matches = createDict();
    word[0] = 0;
    for (i = 0; i < 16; i++)
        doSolveCube(cube, dict, matches, i, word);
    return matches;
doSolveCube() - exhaustively searches the board starting from position start
    len = strlen(word);
    word[len]     = cube[start];  // append the letter at start
    word[len + 1] = 0;
    result = isWord(dict, word);

    if ( strlen(word) >= 4 )
        if (result & 0x02)                              // exact match
            if ( !(isWord(matches, word) & 0x02) )      // not a duplicate
                insertWord(matches, word);

    if ( !(result & 0x01) ) {  // word is not a prefix
        word[len] = 0;
        return;
    }

    my_letter = cube[start];   // copy letter from cube to recursive stack
    cube[start] = 0;           // set the cube position to null

    recursively call self for every adjacent position whose letter is not null

    cube[start] = my_letter;   // copy my_letter back to the cube
    word[len]   = 0;           // and set our string back to what it was

Designing a User Interface

With the cube, dictionary, and solving routines complete, a user interface is a piece of cake. Most of what we are doing is initializing the ADTs, as shown below:

        dict = createDict () ;
        insertFile ( dict, "dictionary.dat" );
        sortDict (dict);

        cube = createCube("cubes.dat");
        printCube(cube);

        comp=solveCube (cube, dict);
        sortDict(comp);

        user=userTurn (cube, dict, comp);
        printDict(comp); 

A user turn is then just reading in input, and checking that the input is legal. All word lists are handled as dictionaries, so passing around and comparing entries is very easy. In the above example, we computed all possible solutions (and stored them in the comp dictionary), and then passed them to userTurn(), which can then quickly check user entries against a list of solutions.

Conclusion

This was a fun program to write, as it required putting algorithms to real use. I also used ideas from class in the analysis in this paper and in the design of my program, which I think is very important, since the programming is supposed to be related to the class.

As usual, I tried to keep my program as simple as possible, and the result is that the entire program fits in about 500 lines of code, not bad considering this readme is over 300 lines! I have been unable to break the program, and given good input it should run perfectly!

In terms of optimizations, solving the entire cube takes 0.6 ms on unix.ic... Not a bad time at all! This is an improvement from about 5 ms using binary search. This is as expected, since the critical path in this program is searching through the dictionary, not the recursive step. At 0.6 ms the recursive step likely becomes non-negligible, but I suspect the bottleneck at this point is memory access rather than processor speed. As I stated, the radix search tree can get very large, and computers tend to be much faster at sequential memory access than at random memory access (owing to caches, prefetching and paging as much as to RAM latency).

Further optimizations could be made in both the recursive step and the dictionary lookup. In general, we search the dictionary over and over for words that share a prefix, so if we remembered where the previous lookup ended and where the next one was likely to land, we could limit the scope of each search and speed the process up. We could also compact memory (using a variety of methods), which would result in faster memory access (and higher cache/page hit rates).