Assignment 1 of CS4412A/9555A, 2011
Due date: Oct 11, 2011 (midnight)
Submission: Email your assignment (Word, PDF, PS) to
cs4412.uwo@gmail.com
Individual effort (no group work)
Total marks: 10% of the final mark
Questions 1-3 will familiarize you with how some of the
most popular classifiers work, and with WEKA.
You are expected to know how these algorithms work (as practised in Question 1)
for the midterm.
1. Download and install WEKA on your PC. The installer also places
several datasets on your hard disk, under
C:\Program Files\Weka\data, including a dataset for contact lenses
(contact-lenses.arff).
With the help of a calculator, manually apply 1R, C4.5 (using gain ratio but no
pruning), and naive Bayes to the first 18 examples of
this dataset (the last 6 examples will be used for testing; see Question 3).
Show enough detail of how
the algorithms work: for 1R, show the resulting tree;
for C4.5, show how the root of the tree is chosen;
for naive Bayes, show how the class probability of the first test example
(the 19th example) is calculated.
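
The hand calculations above can be cross-checked with a short script. The
following is a minimal Python sketch, not part of WEKA and not a reference
solution. It assumes each example is a tuple of nominal values with the class
in the last position, that naive Bayes uses raw frequencies with no smoothing,
and that gain ratio is information gain divided by split information. The toy
rows and all function names are hypothetical and are not taken from
contact-lenses.arff.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of nominal values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr):
    """Information gain of splitting on column `attr`, divided by split info."""
    class_labels = [r[-1] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[-1])
    # Weighted average entropy of the class after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy(class_labels) - remainder
    # Split info is the entropy of the attribute's own value distribution.
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def one_r_errors(rows, attr):
    """Training errors of the 1R rule predicting the majority class per value."""
    groups = defaultdict(Counter)
    for r in rows:
        groups[r[attr]][r[-1]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def naive_bayes_posterior(rows, instance):
    """Normalised class probabilities for `instance` (assumption: no smoothing)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(rows)  # prior P(class)
        for attr, value in enumerate(instance):
            matches = sum(1 for r in rows if r[-1] == cls and r[attr] == value)
            score *= matches / n_cls  # P(attribute = value | class)
        scores[cls] = score
    total = sum(scores.values())
    return {c: (s / total if total > 0 else 0.0) for c, s in scores.items()}


# Hypothetical toy data: (outlook, temperature, class).
toy_rows = [
    ("sunny", "hot", "no"),
    ("sunny", "mild", "no"),
    ("rainy", "mild", "yes"),
    ("rainy", "hot", "yes"),
]
ratios = [gain_ratio(toy_rows, a) for a in (0, 1)]
errors = [one_r_errors(toy_rows, a) for a in (0, 1)]
posterior = naive_bayes_posterior(toy_rows, ("sunny", "hot"))
```

In this toy data the first attribute separates the classes perfectly, so it
has the higher gain ratio and the lower 1R error count. WEKA's own
implementations may differ in details such as smoothing and tie-breaking, so
small differences from a script like this are to be expected.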
2. Apply WEKA's algorithms to the same training dataset
(the first 18 examples).
In WEKA,
1R is OneR under classifiers/rules, C4.5 is J48 under
classifiers/trees, and naive Bayes is NaiveBayes under
classifiers/bayes.
Compare the output with your manual results from Question 1.
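
Training on the first 18 examples requires a file that contains only those
examples. One possible way to produce it, sketched below in Python, is to copy
the ARFF header into two new files and divide the data lines between them. The
function name split_arff and the miniature demo file are hypothetical; the
sketch assumes one instance per line after the @data marker.

```python
def split_arff(text, n_train):
    """Split ARFF text into (train, test) ARFF texts sharing the same header."""
    lines = text.splitlines()
    # Index of the "@data" marker; everything up to and including it is header.
    data_start = next(i for i, ln in enumerate(lines)
                      if ln.strip().lower() == "@data")
    header = lines[:data_start + 1]
    # Keep only real instances: skip blank lines and "%" comment lines.
    instances = [ln for ln in lines[data_start + 1:]
                 if ln.strip() and not ln.lstrip().startswith("%")]
    train = "\n".join(header + instances[:n_train]) + "\n"
    test = "\n".join(header + instances[n_train:]) + "\n"
    return train, test


# Hypothetical miniature file, used only to demonstrate the split.
demo = """@relation demo
@attribute colour {red,blue}
@attribute label {yes,no}
@data
red,yes
blue,no
red,no
"""
train_text, test_text = split_arff(demo, 2)
```

For this assignment the same function would be called with the contents of
contact-lenses.arff and n_train set to 18, and the two results saved under
file names of your choice before loading them in WEKA.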
3. Use WEKA to apply the three classifiers (trained in Question 2)
to predict the classes of the last 6 examples in the original dataset.
What is the accuracy of each classifier?
Which one is best? Explain.
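
Accuracy here is the fraction of test examples whose predicted class matches
the actual class. A minimal Python sketch of that arithmetic follows. The
labels are hypothetical placeholders, not real WEKA output or values from the
dataset.

```python
def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for a 6-example test set.
actual = ["a", "b", "a", "c", "a", "b"]
predicted = ["a", "b", "c", "c", "a", "a"]
score = accuracy(predicted, actual)  # 4 of the 6 predictions match
```

With only 6 test examples, each misclassification changes the accuracy by one
sixth, which is worth keeping in mind when comparing the classifiers.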