Assignment 3 of CS4412A/9555A, 2012
Due date: Nov 26, 2012 (midnight)
Submission: Email your assignment (Word, PDF, PS) to
Individual effort (no group work)
Total marks: 10% of the final marks

1. Generate one synthetic 2D dataset (with at least 200 data points) on which you think the K-means clustering algorithm would work well; generate another 2D dataset (with at least 200 data points) on which you think the DBScan clustering algorithm would work well. Use SimpleKMeans and DBScan in Weka to cluster each of the two datasets and visualize the results. Try different parameters (and seeds) for the algorithms. State whether the results are consistent with your original hypothesis, and analyze why each algorithm works well or fails on each dataset.

Note: To generate the synthetic datasets, you can use any tool or write a program in whatever language you prefer. You can also use the "User Define" option on the online DBScan Demo to generate the data manually.
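
As one possible starting point for writing such a program, here is a minimal Python sketch. It is only an illustration under assumptions that are not part of the assignment: the function names (make_blobs, make_rings, to_arff), the cluster centres, radii, and noise levels are all hypothetical choices. It produces well-separated Gaussian blobs (a shape K-means is commonly expected to handle) and two concentric rings (a shape density-based methods are commonly expected to handle), and writes them in Weka's ARFF format.

```python
import math
import random


def make_blobs(n_per_cluster=70, centers=((0, 0), (6, 6), (0, 6)),
               spread=0.6, seed=0):
    """Compact, roughly spherical clusters (hypothetical parameter choices)."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread))
            for cx, cy in centers
            for _ in range(n_per_cluster)]


def make_rings(n_per_ring=120, radii=(1.0, 4.0), noise=0.1, seed=0):
    """Two concentric rings: non-convex clusters (hypothetical parameter choices)."""
    rng = random.Random(seed)
    points = []
    for radius in radii:
        for _ in range(n_per_ring):
            angle = rng.uniform(0, 2 * math.pi)
            r = radius + rng.gauss(0, noise)  # jitter the radius slightly
            points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


def to_arff(points, relation):
    """Render 2D points as the text of an ARFF file with two numeric attributes."""
    lines = [f"@relation {relation}", "",
             "@attribute x numeric", "@attribute y numeric", "",
             "@data"]
    lines += [f"{x:.4f},{y:.4f}" for x, y in points]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Writes blobs.arff (210 points) and rings.arff (240 points).
    for name, pts in (("blobs", make_blobs()), ("rings", make_rings())):
        with open(f"{name}.arff", "w") as f:
            f.write(to_arff(pts, name))
```

The resulting .arff files can be opened in the Weka Explorer. Changing the seed, spread, or noise arguments gives variants of the same shapes, which may help when testing how sensitive each algorithm is to its parameters.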

Please submit the files containing the two synthetic datasets, visualization of your clustering results in Weka, and the analysis of the results.

2. Apply Weka's Apriori (association rule mining) algorithm to a large dataset (more than 1,000 examples; this can be a dataset used in assignment 2).
a. Try several different choices of the support and confidence, and analyze the results obtained.
b. Start with a high support, and gradually reduce it. Find the 20 association rules with the highest support that have a confidence of at least 75%.
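
To make the two measures concrete, the following Python sketch computes support and confidence on a tiny, made-up transaction list and then keeps the rules that pass a confidence threshold, ordered by support. It is a toy illustration, not Weka's implementation: the function names (support, confidence, top_rules) and the four example transactions are assumptions introduced here, and only single-item rules of the form A -> C are enumerated.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so treat its confidence as 0
    return support(transactions, set(antecedent) | set(consequent)) / base


def top_rules(transactions, min_confidence=0.75, limit=20):
    """Single-item rules A -> C meeting min_confidence, highest support first."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, c in permutations(items, 2):
        conf = confidence(transactions, {a}, {c})
        if conf >= min_confidence:
            rules.append((a, c, support(transactions, {a, c}), conf))
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules[:limit]


# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

if __name__ == "__main__":
    for a, c, supp, conf in top_rules(transactions):
        print(f"{a} -> {c}  support={supp:.2f}  confidence={conf:.2f}")
```

On this toy data, bread -> milk has support 2/4 but confidence only 2/3, so it is filtered out at the 75% threshold, while butter -> bread has the same support with confidence 1.0 and is kept. The same trade-off is what you should look for when varying the thresholds in Weka.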