CSci 3501 Algorithms and Computability - Sorting competition.

About the sorting competition

The Sorting Competition is a multi-lab exercise on choosing the fastest sorting algorithm for a given type of data. By "fast" we mean the actual running time and not the Big-Theta approximation. You will get description of the data and a sample input file to practice, but the final test will be done on a different file.

A portion of your grade will depend on your place in the competition. Three groups that get the best time in the final run will get extra credit (the amount depending on the place).

The assignment will have two lab periods: one to get preliminary competition results and the second one for the final competition. In addition there will be a correctness analysis assignmment, a presentation, and a final write-up.

Make sure to make your Eclipse workspace read-protected to keep your "trade secrets" secret.

Data and sorting criteria

The idea for this data and sorting criterion has been suggested by Lucas Ellgren.

Data for this program consists of real numbers between 0 (inclusive) and 1 (exclusive). One third of the data is uniformly distributed in the range.

Another third of the data forms several clusters of numbers such that the numbers in each cluster are all within .0001 of each other. Each cluster is approximately 3% of the total array. Clusters are uniformly distributed over the original range.

The final third of the data forms clusters of numbers within .00000001 of each other. Once again, each cluster is 3% of the total and are uniformly distributed in the range.

After the data is generated it is shuffled.

The goal is to sort the data in the non-decreasing order.

Below is a sample of (sorted) data, although a real-example data would be at least few thousands elements:


0.00611718 
0.20771484 
0.33271706 
0.41008081 
0.62512620 
0.62512621 
0.62512622 
0.62512623 
0.62512624 
0.62512625 
0.73087819 
0.85592000 
0.85592001 
0.85592002 
0.85592003 
0.85592004 
0.85592005 
0.93986539 
0.96370480 
0.96775591 

Rules and requirements

  1. You must use Java. While this may not be the best language choice for efficiency purposes, this assignment is not about languages, it's about algorithms. Thinking about implementation details is important, however.
  2. The file will contain numbers between 0 and 1 with 8 digits after the decimal point (e.g. 0.00611787) separated by whitespace characters (white space, new line, etc).
  3. Your program must read data from a file, store it internally in a data structure of some kind, sort it, and then output it into a file. This link may be helpful for file reading/writing: http://www.javapractices.com/topic/TopicAction.do?Id=42
  4. The program must take three command-line arguments, in this order:
  5. The timing in the program is done similarly to lab 1 (using System.currentTimeMillis()). The timer starts after you read the file and stops before you start writing out the result. You might want to stop the current thread for a few milliseconds after reading the file before you start the timer to allow time to close the buffer reader.
  6. Your program will consist of the following three parts:
    1. Reading data and preprocessing. Read the data from the file given as the first argument. Store the data in a structure of your choice (such as an array or an array list; your structure may not incorporate any ordering). The nubmer of elements is not known ahead of time.
      At this point you are allowed to collect information about the data as you are reading it. However, all such information must be stored in a constant space (a few variables and/or an array of a constant size).
    2. Copy-and-sort loop. The purpose of the loop is so that we can better distinguish closer sorting times by repeating the sorting multiple times and taking the total. It also reduces interference from interactions with the operating system, such as file closing and a call to the timer. The number of times the loop runs is one of the command-line arguments of the program. The loop must be as following:
      • You must start the timer before the first itertation of the loop and stop it right after the last one. Do not restart the timer for every iteration since this leads to waiting for the OS and therefore less accurate time readings.
      • Inside the loop you first copy the data from the original container into a new one (which may or may not be the same type as the original one; it may not have any ordering).
      • Then you sort the data using any methods you would like (your own or pre-defined). At the end of the loop the data must be in the container that it was copied into in the beginning of the loop and it must be sorted.
    3. Writing output After the timer is stopped, write out the sorted data from the last loop iteration into the output file that was specified as the command-line argument. The format of output must be the same as the input (8 decimal places). You might find the following class helpful:
      
      DecimalFormat formatter = new DecimalFormat("0.00000000");
      ....
      System.out.println(formatter.format(value));
      
  7. You should have a method that takes a file and checks that all the elements are sorted. Also make sure that the number of elements in the input file is the same as in the output. Additionally you should check that they have the same elements (either by checking for randomly selecting elements, or, better yet, automatically).

Round 1: what to submit.

Please submit all your Java files to em (CC your group) by email. You must have authors' names in every file and there should be comments clarifying what you are doing, but no other documentation is required for this round.

Your files will be renamed (to anonymize the test runs) and ran on a different data file than the one you were using for practice. Timing results will be posted. You will only be given your own group number so that you know how you did, but don't know whom the other results belong to.

Results of round 1

Round 2: what to submit.

After the sorting programs trial run, continue working on your sorting program. For the final submission include, in addition to your Java files, a write-up that describes your sorting algorithm. Discuss its efficiency both in theory (the Big-O) and in practice: effects of programming choices, such as an array versus ArrayList, using specific Java methods, etc.

If you changed anything in your program after the initial run, please briefly indicate what you changed and why. You also might want to include things that you tried that didn't work out.

Results of round 2
Results of round 2 with 15 loops (JVM warmup)

Correctness Analysis.

Every person will be assigned a group to review. Your goal is to verify that they followed all of the rules. If any of the rules were broken, you should also assess if this has given the group unfair advantage in the sorting time. If all the rules were followed, please briefly justify your finding, for instance: "the resulting array is sorted because the numbers were stored as integers and Java-provided quicksort was applied to sort the integers".

Please send your review to me and to all members of the team you are reviewing. If you don't know the other team members' e-mail addresses, please send your submission just to me, and I will forward it. Please make it clear in the subject that it needs to be forwarded.

Correctness analysis: for the program given to you please verify that all the rules have been followed, including the following:

Additionally you may analyze a sorting program from any group other than the one that the one you are assigned. This will not give you extra credit, unless you were the first to discover a problem that the assigned reviewing group missed. This part can be done individually or in groups.

Correctness Analysis Response.

If the group that was analysing your algorithm found any issues, you have an opportunity to respond to them (by e-mail, CC me) if you disagree with their findings. I will be happy to discuss this with you. Note that if you don't send a response before Thursday (the presentation time) then I assume that you agree with their findings.

Presentation.

Please upload your slides to the Wiki

You need to prepare a 3-5 minute presentation about your algorithm. Prepare 3-5 slides about your algorithm (PDF or OpenOffice so that you can display them in the lab). If you two group members are preparing the presenatation, both have to be presenting. Know who is saying what and practice your presentation at least once. Be prepared to answer questions.

If you are presenting on someone else's algorithm, please answer all of the questions for their algorithm.

Your presenation has to address:

Keep in mind that lists and tables work better for slides than long sentences.

Please take notes when listening to the presentation; you will need those for the final write-up.

Final Write-Up.

The final write-up has two parts: the description of your algorithm (if you presented on another group's algorithm then of their algorithm), and a summary of what algorithms overall turned out to be the most effective for this problem and why.

Each of the two parts should be about a page long.

The algorithm description should follow the guidelines for the presentation, only be more detailed. Examples (of data storage, etc.) help.

The algorithm comparison should list all approaches that were used for the problem by all teams (e.g. "three groups used variations of radix sort") and their comparative effectiveness. Note that data represntation plays a significant part in efficiency. What have you learned through the competition?