Assignment 1 of CS4412A/9555A, 2012
Due date: Oct 9, 2012 (midnight)
Submission: Email your assignment (Word, PDF, PS) to cs4412.western@gmail.com
Individual effort (no group work)
Total marks: 10% of the final marks

1. Download and install WEKA (stable book 3rd ed. version, weka3-6-8) on your computer. The installation also places several datasets in your installation folder, including one for contact lenses (contact-lenses.arff).
a. With the help of a calculator, manually apply the decision tree algorithm (using gain ratio) to the first 18 examples of this dataset. Show enough detail to make clear how the root of the tree is chosen.
b. Build a decision tree model in WEKA (using J48 under classifiers/trees) from the same training dataset (the first 18 examples). Compare it with your decision tree model from part a.
c. Apply the J48 model you built in part b to the last 6 examples in the original dataset. What is the accuracy of the model?
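
As a way to check hand calculations for part a, the following is a minimal Java sketch of the entropy, information gain, split information, and gain ratio computations for one nominal attribute. The class name GainRatioSketch and its methods are hypothetical and are not part of WEKA. The counts in main are the well-known "outlook" counts from the weather data, used only as an illustration; they are not taken from contact-lenses.arff. J48 may apply additional heuristics when choosing a split, which this sketch ignores.

```java
// Hypothetical helper class for checking hand calculations; not part of WEKA.
class GainRatioSketch {

    // Entropy (in bits) of a vector of non-negative counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {                       // 0 * log(0) is treated as 0
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of a nominal attribute.
    // table[v][k] = number of examples with attribute value v and class k.
    static double infoGain(int[][] table) {
        int[] classTotals = new int[table[0].length];
        int total = 0;
        for (int[] row : table) {
            for (int k = 0; k < row.length; k++) {
                classTotals[k] += row[k];
                total += row[k];
            }
        }
        double remainder = 0.0;                // weighted entropy after the split
        for (int[] row : table) {
            int n = 0;
            for (int c : row) {
                n += c;
            }
            remainder += ((double) n / total) * entropy(row);
        }
        return entropy(classTotals) - remainder;
    }

    // Split information: entropy of the branch sizes, ignoring the class.
    static double splitInfo(int[][] table) {
        int[] sizes = new int[table.length];
        for (int v = 0; v < table.length; v++) {
            for (int c : table[v]) {
                sizes[v] += c;
            }
        }
        return entropy(sizes);
    }

    // Gain ratio = information gain / split information (0 if the split is trivial).
    static double gainRatio(int[][] table) {
        double si = splitInfo(table);
        return si == 0.0 ? 0.0 : infoGain(table) / si;
    }

    public static void main(String[] args) {
        // Illustrative counts only: rows are sunny, overcast, rainy; columns are yes, no.
        int[][] outlook = { {2, 3}, {4, 0}, {3, 2} };
        System.out.printf("info gain  = %.3f%n", infoGain(outlook));
        System.out.printf("split info = %.3f%n", splitInfo(outlook));
        System.out.printf("gain ratio = %.3f%n", gainRatio(outlook));
    }
}
```

To use it for part a, replace the illustrative table with the class counts for each attribute over the first 18 examples and compare the resulting gain ratios; the attribute with the largest value is the candidate for the root.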

2. In WEKA, ID3 (version weka3-6-8, under classifiers/trees) is a simplified version of the decision tree algorithm. You are required to inspect and modify the source code of ID3. After installing WEKA on your computer, you should be able to find the Java source archive "weka-src.jar" in the installation folder. (Again, please use the stable book 3rd ed. version, weka3-6-8.) Find the source code for the ID3 algorithm (Id3.java).
a. According to the source code, which criterion does ID3 use to choose the splitting attribute (info gain or gain ratio)?
b. If info gain is used, change it to gain ratio; if gain ratio is used, change it to info gain. Name the modified source code file id3_new.java, and submit it together with your report.
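
For reference, the two criteria named in question 2 are usually defined as below. This is standard textbook notation rather than anything taken from the WEKA source; the symbols S (the set of examples at a node), A (a nominal attribute), S_v (the examples in S with value v for A), and p_k (the fraction of S in class k) are introduced here for illustration.

```latex
\[
H(S) = -\sum_{k} p_k \log_2 p_k, \qquad
\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
\]
\[
\mathrm{SplitInfo}(S, A) = -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}, \qquad
\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}
\]
```

The only difference between the two criteria is the division by the split information, which penalizes attributes that split the data into many small branches.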