The Sorting Competition is a multi-lab exercise on choosing the fastest sorting algorithm for a given type of data. By "fast" we mean the actual running time and not the Big-Theta approximation. You will get description of the data and a sample input file to practice, but the final test will be done on a different file.
A portion of your grade will depend on your place in the competition. Three groups that get the best time in the final run will get extra credit (the amount depending on the place).
The assignment will have two lab periods: one to get preliminary competition results and the second one for the final competition. In addition there will be a correctness analysis assignmment, a presentation, and a final write-up.
Make sure to make your Eclipse workspace read-protected to keep your "trade secrets" secret.
The data generator and the sample (not particularly inefficient) sorting are available at https://github.com/elenam/SortingCompetitionMaterials2014. The best way to study properties of data and to understand the sorting criterion is by experimenting with the data generator and studying the comparison method in the comparator.
The data consists of strings of 0s and 1s that are generated as follows:
Your goal is to develop and implement an algorithm that sorts these data by the following:
For instance, the following strings appear in a sorted order:
001110 100110 011101 00010001 01000100 11110100 10111111 011111001 111110110 0110010100 1100100110 0101111011 100100100000 000101110000 001010100010 100001001010 100100010010 001001010101 101000011100 101100000011 110100100010 001101011001 010111000011 011001111000 011100100101 101011001001 101111000100 110010111000 111100001100 001110101101 100100111011 100100111110 100011110111 110111110011 111111011100
The parameter for the Poisson distribution (lambda) is between 0 (exclusive) and 8 (inclusive), and will be supplied with data. Note that lambda is a double, not an integer. You might want to study how the data changes with different values of lambda.
Given that sorting takes a parameter, at every round of the competition there will be three runs (with three different lambdas), and your place will be determined by the sum of your places in all three runs. If the sum turns out to be equal for two groups, the sum of medians of the run times for each lambda will be the deciding factor.
diff
command to check if your resulting
ordering is the same as the one given by the sample sorting.
Every group will receive a number and should name their main
file GroupN
, where N is the number that you got. All your
supllemental files can have any name. Please use the default package
and assume that the input and output files are at the root of the
project, so no file path is needed. Please make sure that your program
takes the right command line arguments in the right order.
The first line of the input file is the value of lambda, for example
2.5
.
Please submit all your Java files to me (CC your group) by email. You must have authors' names in every file and there should be comments clarifying what you are doing, but no other documentation is required for this round.
After the sorting programs trial run, continue working on your sorting program. For the final submission include, in addition to your Java files, a write-up that describes your sorting algorithm. Discuss its efficiency both in theory (the Big-O) and in practice: effects of programming choices, such as an array versus ArrayList, using specific Java methods, etc.
Use the same file name as you used in the first round. Make sure to include authors' names. Your code will be posted at this point.
If you changed anything in your program after the initial run, please briefly indicate what you changed and why. You also might want to include things that you tried that didn't work out.
Every group will be assigned a group to review. Your goal is to verify that they followed all of the rules. If any of the rules were broken, you should also assess if this has given the group unfair advantage in the sorting time. If all the rules were followed, please briefly justify your finding, for instance: "the resulting array is sorted because the numbers were stored as integers and Java-provided quicksort was applied to sort the integers".
Please upload your review to google docs and share it with me and to all members of the team you are reviewing.
Correctness analysis: for the program given to you please verify that all the rules have been followed, including the following:
Additionally you may analyze a sorting program from any group other than the one that the one you are assigned. This will not give you extra credit, unless you were the first to discover a problem that the assigned reviewing group missed. This part can be done individually or in groups.
If the group that was analysing your algorithm found any issues, you have an opportunity to respond to them (by e-mail, CC me) if you disagree with their findings. I will be happy to discuss this with you. Note that if you don't send a response before the presentation time then I assume that you agree with their findings.
You need to prepare a 3-5 minute presentation about your algorithm. Prepare 3-5 slides about your algorithm (PDF or OpenOffice so that you can display them in the lab).
Your presenation has to address:
Keep in mind that lists and tables work better for slides than long sentences.
Please take notes when listening to the presentation; you will need those for the final write-up.
The final write-up has two parts: the description of your algorithm (if you presented on another group's algorithm then of their algorithm), and a summary of what algorithms overall turned out to be the most effective for this problem and why.
Each of the two parts should be about a page long.
The algorithm description should follow the guidelines for the presentation, only be more detailed. Examples (of data storage, etc.) help.
The algorithm comparison should list all approaches that were used for the problem by all teams (e.g. "three groups used variations of radix sort") and their comparative effectiveness. Note that data represntation plays a significant part in efficiency. What have you learned through the competition?