Assignment 1 of CS4412A/9555A, 2012
Due date: Oct 9, 2012 (midnight)
Submission: Email your assignment (Word, PDF, PS) to firstname.lastname@example.org
Individual effort (no group work)
Total marks: 10% of the final marks
Download and install WEKA (stable book 3rd ed. version, weka3-6-8) on
your computer. The installation also places several datasets in your
installation folder, including a dataset for contact lenses
(contact-lenses.arff).
- a. With the help of a calculator, manually apply the decision tree algorithm (using gain ratio) to the first 18 examples of this dataset. Show enough detail to explain how the root of the tree is chosen.
- b. Build a decision tree model in WEKA (using J48 under classifiers/trees) based on the same training dataset (the first 18 examples). Compare it with your decision tree model in a.
- c. Apply the J48 model you built in b to the last 6 examples in the original dataset. What is the accuracy of the model?
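For reference, the quantities needed for the manual gain ratio calculation in a can be written as follows. This is the standard C4.5-style formulation; the notation (S for the training set, A for an attribute, S_v for the subset of S where A takes value v, p_k for the fraction of examples in class k) is an assumption here and may differ from the notation used in lectures or the textbook.

```latex
\begin{align*}
H(S) &= -\sum_{k} p_k \log_2 p_k \\
\mathrm{Gain}(S, A) &= H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v) \\
\mathrm{SplitInfo}(S, A) &= -\sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \\
\mathrm{GainRatio}(S, A) &= \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}
\end{align*}
```

The attribute with the largest gain ratio is the candidate for the root; terms with a zero proportion are taken to contribute 0 to each sum.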
In WEKA, ID3 (version weka3-6-8, under classifiers/trees) is a
simplified version of the decision tree algorithm. You are required to
inspect and modify the source code of ID3. Specifically, after installing
WEKA on your computer, you should be able to find the Java source code
archive "weka-src.jar" in the installation folder. (Again, please use stable
book 3rd ed. version, weka3-6-8.) Find the source code for the ID3
- a. According to the source code, what criterion does ID3 use to split the tree (info gain, or gain ratio)?
- b. If info gain is used, change it to gain ratio; if gain ratio is used, change it to info gain. Name the modified source code file id3_new.java, and submit it together with your report.
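To make the difference between the two splitting criteria concrete, here is a minimal standalone sketch in plain Java. It is not taken from the WEKA source and does not use the WEKA API; the class name SplitCriteria, its method names, and the counts[v][k] table layout (number of examples with attribute value v and class k) are all hypothetical choices made for illustration.

```java
import java.util.Arrays;

/** Illustrative sketch only; names and layout are assumptions, not WEKA code. */
final class SplitCriteria {

    /** Entropy in bits of a vector of non-negative counts. */
    static double entropy(double[] counts) {
        double total = Arrays.stream(counts).sum();
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** Information gain of a split; counts[v][k] = examples with value v and class k. */
    static double infoGain(double[][] counts) {
        int numClasses = counts[0].length;
        double[] classTotals = new double[numClasses];
        double total = 0.0;
        for (double[] row : counts) {
            for (int k = 0; k < numClasses; k++) {
                classTotals[k] += row[k];
                total += row[k];
            }
        }
        if (total == 0) {
            return 0.0;
        }
        // Weighted average entropy of the subsets produced by the split.
        double remainder = 0.0;
        for (double[] row : counts) {
            double rowTotal = Arrays.stream(row).sum();
            remainder += (rowTotal / total) * entropy(row);
        }
        return entropy(classTotals) - remainder;
    }

    /** Split information: entropy of the subset sizes, ignoring class labels. */
    static double splitInfo(double[][] counts) {
        double[] sizes = new double[counts.length];
        for (int v = 0; v < counts.length; v++) {
            sizes[v] = Arrays.stream(counts[v]).sum();
        }
        return entropy(sizes);
    }

    /** Gain ratio = info gain / split info (defined as 0 when split info is 0). */
    static double gainRatio(double[][] counts) {
        double si = splitInfo(counts);
        return si == 0 ? 0.0 : infoGain(counts) / si;
    }
}
```

With a made-up table such as {{1, 0}, {1, 0}, {0, 1}, {0, 1}} (four attribute values, one example each), the info gain is about 1 bit while the gain ratio is about 0.5, which shows how gain ratio penalises attributes with many values.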