CSci 2101 Data Structures: Problem set 3

Due Wednesday, October 27th at 8pm.

Problem 1: Random sentences (5 points)

You are given the following file:

import java.util.*;

public class RandomSentences {
    public static void main(String[] args) {
	String [] articles = {"a","the","my","your"};
	String [] adjectives = {"playful","red"};
	String [] nouns = {"dog","frog","bug"};
        String [] actions = {"is walking","is jumping","is flying by"};
        
        // write code to construct random sentences out of
        // the words above
        
    }
}

Write code to construct random sentences of the form "article adjective noun action", for instance "a playful dog is walking". Write a loop to generate 20 sentences. Check that each of the words is used at least once.

You will need to use a random number generator. Make sure to write your code in such a way that if you add or delete some of the words in the arrays, your program should still work correctly.

Problem 2: Common strings (10 points)

You are given two arrays of strings. Write a program that creates an array which contains all the common strings in the two arrays, i.e. strings that appear in both arrays. You also need to print out the number of such strings. Important: If a string appears more than once in one of the arrays, it should be included only once in the resulting set. If you think of arrays as sets, you need to find their intersection.

The comparison is case-sensitive. The two given arrays must remain unchanged. Since you don't know in advance how many common strings the two arrays have, you need to create the new array the size of the shorter of the two given arrays (why do you need at least that many? and why is this enough?). At the end you need to find the number of non-null elements of the new array.

What is the efficiency of your algorithm?

Here is some code to get you started. The common strings in this example are "I","do","not","like".


public class CommonStrings {
	
	public static void main(String[] args) {
	    String [] array1 = {"I","do","not","like","green","eggs","and",
				 "ham"};
	    String [] array2 = {"I","do","not","like","it","Sam","I","am"};
	    
	}
}

Problem 3: Dictionaries as Binary Search Trees (30 points)

For this problem you will need to implement an English/Spanish dictionary as two Binary Search Trees (to allow alphabetical search of each dictionary independently).

About Spanish alphabet. As it turns out, Spanish alphabet is not clearly defined. Most sources list 29 letters in the alphabet: 26 the same as in English alphabet, and the extra letters ch, ll, and ñ (if your browser doesn't show it, it's n with the ~ on top). Some also include rr as a separate letter. See a page at about.com and a page at allinfo-about.com for more details. However, according to this source, for the alphabetizing purpose the two-symbol letters ch and ll would be each considered as two separate letters. In other words, chorizo comes before comida in alphabetical order (if ch was a letter of its own, everything that starts with ch would be after everything that starts with just a c). Therefore, for our purposes we will use an alphabet of 27 letters, with the only additional letter ñ which comes right after n. For simplicity we will write this letter as ~, for instance "Al niño" will be written as "Al ni~o".

Our sample vocabulary comes from this Glossary of Spanish Food. We will only use words that can be translated by a single word.

Below is the file TestSpanish with the data for testing your code.



import java.util.*;

public class TestSpanish {
    public static void main(String [] args) {

	// the words are stored as pairs: an english word followed by 
	// its spanish translation.
	// ~ stands for ñ (n with tilde)
	String [] words = { "almonds", "almendras",  "anchovy", "anchoa",
			    "apple", "manzana", "artichoke", "alcachofa",
			    "asparagus", "esparrago", "bacon", "beicon",
			    "banana", "platano", "basil", "albahaca",
			    "batter", "albardilla", "bayleaf", "laurel",
			    "blackberry", "mora", "brandy", "co~ac", 
			    "bread", "pan", "butter", "mantequilla",
			    "cabbage", "col", "cake", "pastel", 
			    "carrot", "zanahoria", "celery", "apio",
			    "cheese", "queso", "chestnut", "casta~o",
			    "chicken", "pollo", "chocolate", "chocolate",
			    "cider", "sidra", "cinnamon", "canela", 
			    "clams", "almejas", "coconut", "coco",
			    "dill", "eneldo", "dogfish", "cazon", 
			    "dough", "masa", "duck", "pato", "egg", "huevo",
			    "eggplant", "berenjena", "endives", "endibias",
			    "fennel", "hinojo", "fish", "pescado", 
			    "flour", "harina", "garlic", "ajo", 
			    "gelatine", "gelatina", "ginger", "jengibre",
			    "goose", "ganso", "grape", "uva", 
			    "grapefruit", "pomelo", "guava", "guayaba",
			    "haddock", "abadejo", "ham", "jamon", 
			    "herb", "hierba", "herring", "arenque",
			    "honey", "miel", "jam", "mermelada",
			    "kidney", "ri~on", "lamb", "cordero",
			    "leeks", "puerros", "lemon", "limon",
			    "lentils", "lentejas", "lettuce", "lechuga",
			    "mint", "menta", "molasses", "melazaf", 
			    "mushrooms", "champi~ones", "mussels", "mejillones", 
			    "mustard", "mostaza", "nectarine", "nectarina",
			    "noodles", "tallarines", "onion", "cebolla",
			    "olive", "aceituna", "pancake", "crepe",
			    "paprika", "pimenton", "parsley", "perejil",
			    "partridge", "perdiz", "pasta", "pasta",
			    "pepper", "pimienta", "pizza", "pizza",
			    "pineapple", "pi~a", "plum", "ciruela",
			    "potatoes", "patatas", "prawns", "gambas",
			    "quail", "codorniz", "rabbit", "conejo", 
			    "radish", "rabano", "ribs", "costillas",
			    "rice", "arroz", "rosemary", "romero",
			    "saffron", "azafran", "sage", "salvia",
			    "salmon", "salmon", "salt", "sal", "sauce", "salsa",
			    "sherry", "jerez", "snail", "caracol", 
			    "spaghetti", "espaguetis", "squid", "calamar",
			    "thyme", "tomillo", "tomatoes", "tomates",
			    "trout", "trucha", "tuna", "atun", 
			    "vinegar", "vinagre", "watercress", "berro",
			    "yolks", "yemas"
	};

	// Randomize the data: generate two indices at random 
	// between 0 and length/2 and switch the pairs
	// Repeat this 100 times.
	
	
	// printing the words, each pair on one line
	// tabs are for formatting
	for (int i = 0; i < words.length/2; ++i) {
	    String tabs = "\t";
	    if (words[2*i].length() < 8) {
		tabs = "\t\t";
	    }
	    System.out.println(words[2*i] + tabs + words[2*i + 1]);
	}

	// create a Spanish-English dictionary
	// and English-Spanish dictionary
	
	// go through the array, inserting an english word
	// into the English-Spanish dictionary,
	// and inserting the spanish word into the 
	// Spanish-English dictionary.
	// Set the cross-references between the two words
	// using setTranslation()

	// print both dictionaries using inOrder traversal

	// print the height of both dictionaries

	// test the search method of both dictionaries

	// Part 6 of the problem: create another two dictionaries
	// setting the root to the middle of the first three elements.
	// Note that the middle element will be different for 
	// the Spanish and the English dictionaries. Make sure to set
	// the translation right for the three first elements. 

	// insert the rest of the elements into the dictionaries

	// test the dictionaries by printing them out and
	// by searching for a translation.

	// print out the height of the dictionaries,
	// compare them to the heights of the first two dictionaries
    }
}

For this program you need to write the following:

Part 1. Randomizing the data

In the file TestSpanish.java write a loop to switch the order of pairs (an english word and its spanish translation) in the array randomly. Randomly generate a pair of integers between 0 and length/2 ('length' is the length of the array). Then switch the corresponding pairs. For instance, if the random numbers are 1 and 3, the pair number 1 will be switched with the pair number 3 (assuming that the first pair has a number 0). If this was the first switch in the array, the array will have the following order of elements after the switch:

{"almonds", "almendras", "artichoke", "alcachofa",  
"apple", "manzana", "anchovy", "anchoa", 
"asparagus", "esparrago", "bacon", "beicon", //other elements unchanged
The loop should repeat this procedure 100 times to guarantee that the order is random.

Part 2. Writing the nodes methods

On Wednesday we will go over some examples that will be helpful for this and further parts of the problem.

There will be two node classes in this program: SpanishWordNode and EnglishWordNode. Each node has a String that stores the word, the references to the left and to the right nodes following this node in the tree, and the references to the node in the other tree that has the translation of the word.

Each node class also has a comparison method. The method compares the word in the node to another string. The returned value is computed similar to the value of compareTo() method: it's negative if the word in the node comes before the parameter string, a 0 if they are equal, and a positive integer if the parameter string comes before the word in the node. The comparison method of the SpanishWordNode class compares the two strings according to the spanish alphabet. You don't need to understand the details of the method.

Each comparison method has a static counter. There is a static printCounter() method in each Node class and a reset() method for the counter. Below is the code of the two node classes:



public class EnglishWordNode {
    private String word;
    private SpanishWordNode translation;
    private EnglishWordNode left;
    private EnglishWordNode right;
    private static int comparisonCounter = 0;

    public int compareEnglish(String theOtherWord) {
	comparisonCounter++;
	return getWord().compareToIgnoreCase(theOtherWord);
    }

    public static void printCounter() {
	System.out.println("English dictionary: " + comparisonCounter +
			   " comparisons");
    }
    
    public static void resetCounter() {
	comparisonCounter = 0;
    }
}


import java.text.*;

public class SpanishWordNode {
    private String word;
    private EnglishWordNode translation;
    private SpanishWordNode left;
    private SpanishWordNode right;
    private static int comparisonCounter = 0;

    public int compareSpanish(String theOtherWord) {
	comparisonCounter++;
	String spanishRules =
	    ("< a,A < b,B < c,C " +
	     "< d,D < e,E < f,F " +
	     "< g,G < h,H < i,I < j,J < k,K < l,L " +
	     "< m,M < n,N " +
	     "< '~' " +
	     "< o,O < p,P < q,Q < r,R " +
	     "< s,S < t,T < u,U < v,V < w,W < x,X " +
	     "< y,Y < z,Z");
	try {
	    RuleBasedCollator spanishCollator = new 
		RuleBasedCollator(spanishRules);
	    return spanishCollator.compare(getWord(),theOtherWord);   
	} catch (ParseException e){
	    System.out.println(e);
	    System.exit(0);
	    return 0;
	}
    }

    public static void printCounter() {
	System.out.println("Spanish dictionary: " + comparisonCounter +
			   " comparisons");
    }

    public static void resetCounter() {
	comparisonCounter = 0;
    }
}

Your task is to write the following methods:

EnglishWordNode class:

The methods for the SpanishWordNode are the same, the set/get methods for translation operate on EnglishWordNode.

Part 3: writing the dictionary classes

Write classes EnglishDictionary and SpanishDictionary. Each class is a binary search tree. The EnglishDictionary uses the EnglishWordNode class, and the SpanishDictionary uses the SpanishWordNode class. You will need to write the following methods for EnglishDictionary:

Part 4: writing recursive methods for the dictionary classes

Write 4 recursive methods: to print the dictionaries in InOrder, PreOrder, and PostOrder and to compute the height of the trees. Note that all methods require recursive methods in the node classes.

Part 5: inserting the dictionary words

In the file TestSpanish.java write code to insert the english words into the EnglishDictionary and the spanish words into the SpanishDictionary. Make sure to set the translation for the words. The words are already randomized, so you can insert them in the order in which they appear in the array.

Test your code by printing out the two dictionaries using InOrder() method, they should appear alphabetically. Print out the heights of both dictionaries. Test the search method by searching for a word and printing out its translation for several test examples (for english-to-spanish and spanish-to-english translation).

Print out the number of comparisons for each dictionary (before the searches).

Part 6: a better choice of the root node

Create another two dictionaries and choose the roots as the middle of the first three elements. After inserting the first three elements insert the rest of them as they appear in the array. Reset the comparison counter (using the reset() method of the nodes classes).

Important: the middle spanish word among the first three can be different from the middle one among the first three english ones. Make sure to set the translation right for the first three nodes.

Print out the trees to check the ordering. Print out the height of the tree and the number of comparisons.

Part 7: Comparing the two approaches to creating a dictionary

Run the program several times to get statistics on the height of the dictionary and the numbers of comparisons.

Compare the dictionaries created using the first element as the root to dictionaries where the root is chosen as the middle one among the first three. Compare the heights and the numbers of comparisons. Which one is more efficient?

Testing note: while you are debugging your program, it might be helpful to use the same order of words on every run. Putting an integer parameter in the constructor for the random number generator allows you to fix the "random" numbers: Random r = new Random(20);

Changing the number changes the sequence to another fixed one. Removing it makes the sequence different every time you run the program.

That's all!!!

This is a problem set from CSci 2101 course.