CSci 2101 Data Structures: Problem set 3

Due Wednesday, March 23rd at 8pm.

Problem 1: Random sentences (5 points)

You are given the following file:

import java.util.*;

public class RandomSentences {
    public static void main(String[] args) {
	String [] articles = {"a","the","my","your"};
	String [] adjectives = {"playful","red"};
	String [] nouns = {"dog","frog","bug"};
        String [] actions = {"is walking","is jumping","is flying by"};
        
        // write code to construct random sentences out of
        // the words above
        
    }
}

Write code to construct random sentences of the form "article adjective noun action", for instance "a playful dog is walking". Write a loop to generate 20 sentences. Check that each of the words is used at least once.

You will need to use a random number generator. Make sure to write your code in such a way that if you add or delete some of the words in the arrays, your program should still work correctly.

Problem 2: Minimum priority heap (5 points)

Change this implementation of priority queue to a Min-priority queue (i.e. so that dequeue returns (and removes) the smallest element of the queue instead of the largest one.

Test your implementation carefully, submit all your test cases.

Problem 3: Dictionaries as Binary Search Trees (30 points)

For this problem you will need to implement an English/Spanish dictionary as two Binary Search Trees (to allow alphabetical search of each dictionary independently).

About Spanish alphabet. As it turns out, Spanish alphabet is not clearly defined. Most sources list 29 letters in the alphabet: 26 the same as in English alphabet, and the extra letters ch, ll, and ñ (if your browser doesn't show it, it's n with the ~ on top). Some also include rr as a separate letter. See a page at about.com and a page at allinfo-about.com for more details. However, according to this source, for the alphabetizing purpose the two-symbol letters ch and ll would be each considered as two separate letters. In other words, chorizo comes before comida in alphabetical order (if ch was a letter of its own, everything that starts with ch would be after everything that starts with just a c). Therefore, for our purposes we will use an alphabet of 27 letters, with the only additional letter ñ which comes right after n. For simplicity we will write this letter as ~, for instance "Al niño" will be written as "Al ni~o".

Our sample vocabulary comes from this Glossary of Spanish Food. We will only use words that can be translated by a single word.

Below is the file TestSpanish with the data for testing your code.



import java.util.*;

public class TestSpanish {
    public static void main(String [] args) {

	// the words are stored as pairs: an english word followed by 
	// its spanish translation.
	// ~ stands for ñ (n with tilde)
	String [] words = { "almonds", "almendras",  "anchovy", "anchoa",
			    "apple", "manzana", "artichoke", "alcachofa",
			    "asparagus", "esparrago", "bacon", "beicon",
			    "banana", "platano", "basil", "albahaca",
			    "batter", "albardilla", "bayleaf", "laurel",
			    "blackberry", "mora", "brandy", "co~ac", 
			    "bread", "pan", "butter", "mantequilla",
			    "cabbage", "col", "cake", "pastel", 
			    "carrot", "zanahoria", "celery", "apio",
			    "cheese", "queso", "chestnut", "casta~o",
			    "chicken", "pollo", "chocolate", "chocolate",
			    "cider", "sidra", "cinnamon", "canela", 
			    "clams", "almejas", "coconut", "coco",
			    "dill", "eneldo", "dogfish", "cazon", 
			    "dough", "masa", "duck", "pato", "egg", "huevo",
			    "eggplant", "berenjena", "endives", "endibias",
			    "fennel", "hinojo", "fish", "pescado", 
			    "flour", "harina", "garlic", "ajo", 
			    "gelatine", "gelatina", "ginger", "jengibre",
			    "goose", "ganso", "grape", "uva", 
			    "grapefruit", "pomelo", "guava", "guayaba",
			    "haddock", "abadejo", "ham", "jamon", 
			    "herb", "hierba", "herring", "arenque",
			    "honey", "miel", "jam", "mermelada",
			    "kidney", "ri~on", "lamb", "cordero",
			    "leeks", "puerros", "lemon", "limon",
			    "lentils", "lentejas", "lettuce", "lechuga",
			    "mint", "menta", "molasses", "melazaf", 
			    "mushrooms", "champi~ones", "mussels", "mejillones", 
			    "mustard", "mostaza", "nectarine", "nectarina",
			    "noodles", "tallarines", "onion", "cebolla",
			    "olive", "aceituna", "pancake", "crepe",
			    "paprika", "pimenton", "parsley", "perejil",
			    "partridge", "perdiz", "pasta", "pasta",
			    "pepper", "pimienta", "pizza", "pizza",
			    "pineapple", "pi~a", "plum", "ciruela",
			    "potatoes", "patatas", "prawns", "gambas",
			    "quail", "codorniz", "rabbit", "conejo", 
			    "radish", "rabano", "ribs", "costillas",
			    "rice", "arroz", "rosemary", "romero",
			    "saffron", "azafran", "sage", "salvia",
			    "salmon", "salmon", "salt", "sal", "sauce", "salsa",
			    "sherry", "jerez", "snail", "caracol", 
			    "spaghetti", "espaguetis", "squid", "calamar",
			    "thyme", "tomillo", "tomatoes", "tomates",
			    "trout", "trucha", "tuna", "atun", 
			    "vinegar", "vinagre", "watercress", "berro",
			    "yolks", "yemas"
	};

	int numberOfSwaps = 50;

	// Randomize the data: generate two indices at random 
	// between 0 and length/2 and switch the pairs
	// Repeat this numberOfSwaps times.
	
	
	// printing the words, each pair on one line
	// tabs are for formatting
	for (int i = 0; i < words.length/2; ++i) {
	    String tabs = "\t";
	    if (words[2*i].length() < 8) {
		tabs = "\t\t";
	    }
	    System.out.println(words[2*i] + tabs + words[2*i + 1]);
	}

	// create a Spanish-English dictionary
	// and English-Spanish dictionary
	
	// go through the array, inserting an english word
	// into the English-Spanish dictionary,
	// and inserting the spanish word into the 
	// Spanish-English dictionary.
	// Set the cross-references between the two words
	// using setTranslation()

	// print both dictionaries using inOrder traversal

	// print the height of both dictionaries

	// test the search method of both dictionaries

    }
}

For this program you need to write the following:

Part 1. Randomizing the data

In the file TestSpanish.java write a loop to switch the order of pairs (an english word and its spanish translation) in the array randomly. Randomly generate a pair of integers between 0 and length/2 ('length' is the length of the array). Then switch the corresponding pairs. For instance, if the random numbers are 1 and 3, the pair number 1 will be switched with the pair number 3 (assuming that the first pair has a number 0). If this was the first switch in the array, the array will have the following order of elements after the switch:

{"almonds", "almendras", "artichoke", "alcachofa",  
"apple", "manzana", "anchovy", "anchoa", 
"asparagus", "esparrago", "bacon", "beicon", //other elements unchanged
The loop should repeat this procedure the number of times specified in the variable numberOfSwaps.

Part 2. Writing the nodes methods

There will be two node classes in this program: SpanishWordNode and EnglishWordNode. Each node has a String that stores the word, the references to the left and to the right nodes following this node in the tree, and the references to the node in the other tree that has the translation of the word.

Each node class also has a comparison method. The method compares the word in the node to another string. The returned value is computed similar to the value of compareTo() method: it's negative if the word in the node comes before the parameter string, a 0 if they are equal, and a positive integer if the parameter string comes before the word in the node. The comparison method of the SpanishWordNode class compares the two strings according to the spanish alphabet. You don't need to understand the details of the method.

Each comparison method has a static counter. There is a static printCounter() method in each Node class and a reset() method for the counter. Below is the code of the two node classes:



public class EnglishWordNode {
    private String word;
    private SpanishWordNode translation;
    private EnglishWordNode left;
    private EnglishWordNode right;
    private static int comparisonCounter = 0;

    public int compareEnglish(String theOtherWord) {
	comparisonCounter++;
	return getWord().compareToIgnoreCase(theOtherWord);
    }

    public static void printCounter() {
	System.out.println("English dictionary: " + comparisonCounter +
			   " comparisons");
    }
    
    public static void resetCounter() {
	comparisonCounter = 0;
    }
}


import java.text.*;

public class SpanishWordNode {
    private String word;
    private EnglishWordNode translation;
    private SpanishWordNode left;
    private SpanishWordNode right;
    private static int comparisonCounter = 0;

    public int compareSpanish(String theOtherWord) {
	comparisonCounter++;
	String spanishRules =
	    ("< a,A < b,B < c,C " +
	     "< d,D < e,E < f,F " +
	     "< g,G < h,H < i,I < j,J < k,K < l,L " +
	     "< m,M < n,N " +
	     "< '~' " +
	     "< o,O < p,P < q,Q < r,R " +
	     "< s,S < t,T < u,U < v,V < w,W < x,X " +
	     "< y,Y < z,Z");
	try {
	    RuleBasedCollator spanishCollator = new 
		RuleBasedCollator(spanishRules);
	    return spanishCollator.compare(getWord(),theOtherWord);   
	} catch (ParseException e){
	    System.out.println(e);
	    System.exit(0);
	    return 0;
	}
    }

    public static void printCounter() {
	System.out.println("Spanish dictionary: " + comparisonCounter +
			   " comparisons");
    }

    public static void resetCounter() {
	comparisonCounter = 0;
    }
}

Your task is to write the following methods:

EnglishWordNode class:

The methods for the SpanishWordNode are the same, the set/get methods for translation operate on EnglishWordNode.

Part 3: writing the dictionary classes

Write classes EnglishDictionary and SpanishDictionary. Each class is a binary search tree. The EnglishDictionary uses the EnglishWordNode class, and the SpanishDictionary uses the SpanishWordNode class. You will need to write the following methods for EnglishDictionary:

Part 4: writing recursive methods for the dictionary classes

Write 4 recursive methods: to print the dictionaries in InOrder, PreOrder, and PostOrder and to compute the height of the trees. Note that all methods require recursive methods in the node classes.

Part 5: inserting the dictionary words

In the file TestSpanish.java write code to insert the english words into the EnglishDictionary and the spanish words into the SpanishDictionary. Make sure to set the translation for the words. The words are already randomized, so you can insert them in the order in which they appear in the array.

Test your code by printing out the two dictionaries using InOrder() method, they should appear alphabetically. Print out the heights of both dictionaries. Test the search method by searching for a word and printing out its translation for several test examples (for english-to-spanish and spanish-to-english translation).

Print out the number of comparisons for each dictionary (before the searches).

Part 6: how do the heights of the trees and the number of comparisons depend on the number of swaps?

How do the heights of the trees depend on the number of swaps performed in the first loop? What is the number of comparisons needed to insert the words into each of the dictionaries for different number of swaps.

Please test your program for 25, 50, 100, and 150 swaps. You need to perform several runs in each case. Explain the results. How many swaps would you recommend to get good heights of both trees?

Testing note: while you are debugging your program, it might be helpful to use the same order of words on every run. Putting an integer parameter in the constructor for the random number generator allows you to fix the "random" numbers: Random r = new Random(20);

Changing the number changes the sequence to another fixed one. Removing it makes the sequence different every time you run the program.

That's all!!!

This is a problem set from CSci 2101 course.