Assignment 3 of
Due date: Nov 26, 2012
Submission: Email your
assignment (Word, PDF, PS) to firstname.lastname@example.org
Individual effort (no
Total marks: 10% of
the final marks
1. Generate one synthetic 2D dataset (with at least 200 data points) on which you think the K-means clustering algorithm would work well; generate another 2D dataset (with at least 200 data points) on which you think the DBScan
clustering algorithm would work well. Use SimpleKMeans and DBScan in Weka to
cluster each of the two datasets and visualize the results. Try
different parameters (and seeds) for the algorithms. State whether the
results are consistent with your original hypothesis, and analyze why
each algorithm works well or fails on each dataset.
To generate the synthetic datasets, you can use any tools or write a
program in whatever language you prefer. You can also use the "User
Define" option on the online DBScan Demo to generate the data manually.
Submit the files containing the two synthetic datasets, the visualizations
of your clustering results in Weka, and your analysis of the results.
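As one possible way to write such a generator, the sketch below uses only the Python standard library to produce two candidate datasets and render them as ARFF text, which Weka can load. The function names (`make_blobs`, `make_rings`, `to_arff`), the cluster centers, ring radii, and noise levels are all illustrative assumptions, not part of the assignment. The shapes reflect a common hypothesis (compact blobs for K-means, concentric rings for DBScan); whether the algorithms actually behave that way is what you are asked to test.

```python
import math
import random


def make_blobs(n_points=300, centers=((0, 0), (6, 0), (3, 5)), spread=0.6, seed=0):
    """Compact, roughly spherical clusters around assumed centers."""
    rng = random.Random(seed)
    points = []
    for i in range(n_points):
        cx, cy = centers[i % len(centers)]  # cycle through the centers
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


def make_rings(n_points=300, radii=(1.0, 4.0), noise=0.1, seed=0):
    """Two concentric rings that share the same centroid."""
    rng = random.Random(seed)
    points = []
    for i in range(n_points):
        r = radii[i % len(radii)] + rng.gauss(0, noise)  # jitter the radius
        theta = rng.uniform(0, 2 * math.pi)              # random angle
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def to_arff(points, relation):
    """Render 2D points as ARFF text with two numeric attributes."""
    lines = [
        f"@relation {relation}",
        "@attribute x numeric",
        "@attribute y numeric",
        "@data",
    ]
    lines += [f"{x:.4f},{y:.4f}" for x, y in points]
    return "\n".join(lines) + "\n"
```

Writing the string returned by `to_arff(make_blobs(), "blobs")` to a file such as `blobs.arff` (a hypothetical file name) gives a dataset that can be opened in the Weka Explorer. Changing `seed`, `spread`, or `noise` is a simple way to probe how sensitive each algorithm is to the data.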
2. Run Weka's APRIORI (association rule mining) algorithm on a large dataset
(more than 1,000 examples; this can be a dataset used in assignment 2).
- a. Try several different choices of support and confidence, and analyze the results obtained.
Start with a high support and gradually reduce it. Find the 20
association rules with the highest support, and a confidence of at