Assignment 2 of CS4412A/9555A, 2011

Due date: Oct 25 (midnight)
Submission: electronic submission ONLY
Individual effort (no group work)
Total marks: 10% of the final marks

1. Use WEKA to compare the predictive accuracy of J48, NaiveBayes, and PRISM on 5 large datasets (more than 1,000 examples each). You may use any datasets that come with WEKA, but at least 2 must be from the UCI Machine Learning Repository, which contains over two hundred real-world datasets and can be downloaded here. Apply k-fold cross-validation (you choose k) for the comparison. Use t-tests to draw reliable conclusions for every pair of algorithms. For example, your conclusion could be that, between J48 and PRISM, J48 wins on 3 datasets, ties on 1, and loses on 1 (a win/loss means significantly better/worse). Analyze your results and draw useful conclusions about how to choose among the algorithms. Show enough detail for us to see that you are doing the right thing. (Getting win/loss/tie results is actually easy if you use WEKA's Experimenter to compare the learning algorithms: it performs the t-test automatically and shows you the results.)
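
To make the win/loss/tie idea concrete, here is a minimal Python sketch of a plain paired t-test on per-fold accuracies. It is only an illustration and not WEKA's implementation; the Experimenter may apply a corrected variant of this test, so its numbers need not match. The function names, the accuracy values, and the choice of 5 folds are all made up for the example; 2.776 is the two-sided 5% critical value of the t distribution with 4 degrees of freedom.

```python
import math


def paired_t_statistic(acc_a, acc_b):
    """t statistic for the per-fold accuracy differences (A minus B)."""
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long lists with at least 2 folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        # All differences identical: no spread to test against.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / math.sqrt(var / n)


def compare(acc_a, acc_b, critical):
    """Return 'win', 'loss' or 'tie' for algorithm A against algorithm B."""
    t = paired_t_statistic(acc_a, acc_b)
    if t > critical:
        return "win"
    if t < -critical:
        return "loss"
    return "tie"


# Made-up accuracies from a hypothetical 5-fold cross-validation.
algo_a = [0.90, 0.92, 0.91, 0.93, 0.89]
algo_b = [0.85, 0.86, 0.88, 0.87, 0.84]

# Two-sided test at the 5% level with 5 - 1 = 4 degrees of freedom.
print(compare(algo_a, algo_b, critical=2.776))  # win
```

Both algorithms must be evaluated on the same folds for the pairing to make sense, and the critical value changes with k because the degrees of freedom are k - 1.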

2. Apply WEKA's APRIORI (association rule mining) algorithm to a large dataset (it can be one of the datasets used in Question 1).
a. Try several different settings of the minimum support and confidence, and analyze the results obtained.
b. Start with a high minimum support and gradually reduce it. Find the 20 association rules with the highest support that have a confidence of at least 75%.
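
As a reminder of what the two thresholds mean, the following Python sketch computes support and confidence on a toy set of transactions and lowers the minimum support step by step until enough rules are found. It is a brute-force illustration under stated assumptions, not WEKA's APRIORI (which prunes the search instead of enumerating every itemset); the transactions, function names, and step size are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def rules_at(transactions, min_support, min_confidence):
    """All rules A -> B meeting both thresholds (brute force, toy data only)."""
    all_items = sorted(set().union(*transactions))
    found = []
    for size in range(2, len(all_items) + 1):
        for itemset in combinations(all_items, size):
            s = support(transactions, itemset)
            if s == 0 or s < min_support:
                continue
            # Split the itemset into every non-empty antecedent/consequent pair.
            for k in range(1, size):
                for antecedent in combinations(itemset, k):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    conf = s / support(transactions, antecedent)
                    if conf >= min_confidence:
                        found.append((antecedent, consequent, s, conf))
    return found


def top_rules(transactions, wanted, min_confidence,
              start=1.0, step=0.05, floor=0.05):
    """Lower the minimum support from start until `wanted` rules are found."""
    min_support = start
    rules = []
    while min_support >= floor:
        rules = rules_at(transactions, min_support, min_confidence)
        if len(rules) >= wanted:
            break
        # Rounding avoids floating-point drift in the repeated subtraction.
        min_support = round(min_support - step, 10)
    # Highest support first, confidence as the tie-breaker.
    rules.sort(key=lambda r: (-r[2], -r[3]))
    return rules[:wanted]


# Hypothetical market-basket data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

for antecedent, consequent, s, conf in top_rules(transactions, 2, 0.75):
    print(antecedent, "->", consequent, "support", s, "confidence", conf)
```

On this toy data the rule with the highest support that reaches 75% confidence is butter -> bread (support 0.5, confidence 1.0). A second qualifying rule appears only once the minimum support drops to 0.25, which is the trade-off part (b) asks you to explore.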