Assignment 2 of CS4412A/9555A, 2012
Due date: Oct 29, 2012 (midnight)
Submission: Email your assignment (Word, PDF, PS) to cs4412.western@gmail.com
Individual effort (no group work)
Total marks: 10% of the final marks

1. Take the first 18 examples of "contact-lenses.arff" as the training data, and the remaining 6 examples as the test data (as in assignment 1). With the help of a calculator, manually apply the naive Bayes algorithm. Show how the class probabilities of the 1st test example (i.e., the 19th example in the entire data set) are calculated.
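
The hand calculation can be checked against a small counting sketch such as the one below. It uses a made-up five-example data set with hypothetical attribute values, not the contact-lenses data, and it assumes plain frequency estimates with no smoothing. WEKA's own NaiveBayes may smooth the counts, so its numbers can differ from a calculation done this way.

```python
from collections import Counter, defaultdict

def naive_bayes_posterior(train, test_row):
    """Return normalized P(class | test_row) from nominal training data.

    train is a list of (attribute_tuple, class_label) pairs.
    Assumption: raw frequency estimates, no Laplace smoothing.
    """
    class_counts = Counter(label for _, label in train)
    n = len(train)
    # value_counts[(attribute_index, label)][value] = count within that class
    value_counts = defaultdict(Counter)
    for attrs, label in train:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1

    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior P(class)
        for i, value in enumerate(test_row):
            # conditional P(attribute_i = value | class)
            score *= value_counts[(i, label)][value] / count
        scores[label] = score

    total = sum(scores.values())
    if total == 0:
        return scores  # every class hit a zero count; nothing to normalize
    return {label: s / total for label, s in scores.items()}

# Hypothetical toy data: (outlook, temperature) -> play
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
]
posterior = naive_bayes_posterior(train, ("sunny", "mild"))
print(posterior)
```

For the toy row ("sunny", "mild"), the unnormalized scores are 2/5 * 2/2 * 1/2 = 0.2 for "no" and 3/5 * 1/3 * 2/3 = 0.1333 for "yes", which normalize to 0.6 and 0.4.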


2. Use WEKA's Explorer to compare 3 regression algorithms on 3 large (more than 500 examples) data sets.

Linear regression, under classifiers/functions/LinearRegression
Neural networks, under classifiers/functions/MultilayerPerceptron
k-NN, under classifiers/lazy/IBk

(1) You can either keep the default parameters or set your own parameters for the algorithms.
(2) You can use any data sets from the “Datasets” section on the WEKA website, or you can download the raw data directly from the UCI Machine Learning Repository (the data format might have to be converted for WEKA). Note that the data sets should be specifically for regression tasks (i.e., the class value is numerical instead of binary or discrete).
(3) Apply k-fold cross-validation (you choose k) for the comparison.
(4) Use "Root mean squared error" (RMSE) as the metric for comparison.
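
To make items (3) and (4) concrete, here is a minimal sketch of k-fold cross-validation scored by RMSE. It is not how WEKA implements it: the folds are contiguous and unshuffled, the "model" is a hypothetical mean predictor standing in for a real regression algorithm, and the six data points are made up. RMSE is computed once over all held-out predictions pooled together, which is an assumption about how the folds are combined.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted)^2))."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

def fit_mean(xs, ys):
    """Hypothetical baseline: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def cross_validated_rmse(xs, ys, fit, k):
    """Train on k-1 folds, predict the held-out fold, pool all predictions."""
    actual, predicted = [], []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        for i in test_idx:
            actual.append(ys[i])
            predicted.append(model(xs[i]))
    return rmse(actual, predicted)

# Made-up data: y = 2x + 1
xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
cv_rmse = cross_validated_rmse(xs, ys, fit_mean, k=3)
print(cv_rmse)  # 5.0
```

On real data the examples should be shuffled before splitting, since contiguous folds on sorted data (as in this toy case) give a pessimistic estimate.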

Show your results, and compare the total error (according to RMSE) and speed of the algorithms (according to your own estimation).


3. Use WEKA's Experimenter to compare the classification predictive accuracy of J48, NaiveBayes, and SMO on 5 large (more than 1,000 examples) data sets.

(1) You can either keep the default parameters or set your own parameters for the algorithms.
(2) You can use any data sets from the “Datasets” section on the WEKA website, or you can download the raw data directly from the UCI Machine Learning Repository (the data format might have to be converted for WEKA).
(3) Apply k-fold cross-validation (you choose k) for the comparison.
(4) Use a paired t-test with a significance level of 0.05.
    
Take a screenshot of the results from WEKA's Experimenter. Analyze your results to compare every pair of algorithms. For example, one of your conclusions can be that between J48 and NaiveBayes, J48 wins on 3 data sets, ties on 1, and loses on 1 (wins/loses means significantly better/worse).
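
The win/tie/loss reading of a paired t-test can be sketched as follows. This is the plain textbook paired t-test, assuming 10 paired accuracy values per data set (9 degrees of freedom, two-sided critical value 2.262 at the 0.05 level). WEKA's Experimenter may apply a corrected variant of the test, so its verdicts can differ. The accuracy lists and the names algo_a, algo_b, and algo_c are made up for illustration.

```python
import math
from collections import Counter

T_CRIT_DF9 = 2.262  # two-sided critical value, alpha = 0.05, 9 degrees of freedom

def paired_t_statistic(a, b):
    """t = mean(d) / (sd(d) / sqrt(n)) for the paired differences d = a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)

def compare(a, b, critical_value):
    """Return 'win', 'loss', or 'tie' for algorithm a against algorithm b."""
    t = paired_t_statistic(a, b)
    if t > critical_value:
        return "win"
    if t < -critical_value:
        return "loss"
    return "tie"

# Hypothetical per-fold accuracies on one data set
algo_a = [0.90, 0.92, 0.91, 0.93, 0.89, 0.92, 0.90, 0.91, 0.93, 0.92]
algo_b = [0.85, 0.88, 0.86, 0.87, 0.84, 0.88, 0.86, 0.85, 0.88, 0.87]
algo_c = [0.91, 0.91, 0.92, 0.92, 0.90, 0.91, 0.91, 0.90, 0.94, 0.91]

a_vs_b = compare(algo_a, algo_b, T_CRIT_DF9)  # a is consistently ahead
a_vs_c = compare(algo_a, algo_c, T_CRIT_DF9)  # differences cancel out

# Tally verdicts; with real results there is one verdict per data set
tally = Counter([a_vs_b, a_vs_c])
print(a_vs_b, a_vs_c, dict(tally))
```

With a different k the degrees of freedom change, so the critical value passed to compare has to be looked up again.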