You can start from an EZi-based K-means project implementing an interface that allows you to load images, enter seeds, play with parameters, switch between RGB and RGBXY features, and save your results as images. To turn this into a working algorithm, you only need to implement the functions *Kmeans* and *init_means* in the file *Kmeans.cpp*.
You can also use MATLAB or other software, as long as you write your own code for the iterations of the K-means algorithm. For MATLAB, I can provide some example code using a seed-based interface; let me know.

- ("Theoretical" part) Assume one-dimentional (scalar) data points "X_i" and derive optimal value of scalar "m" that minimizes the sum of squared errors f(m) = sum_i (X_i-m)^2. HINT: use calculus to find a derivative of function "f(m)" with respect to its only variable "m" and find value of "m" where this derivative is zero. In fact, function "f" achieves its minimum at this value of "m" (you can also check that the second derivative of "f" is positive at this value of "m"). You should derive a general formula for "m" in terms of (known) values of "X_i". Present your derivations (max 3-6 lines) in the report. (3% of the mark).
- Implement the K-means algorithm for clustering the colors of image pixels (RGB space; you can use other color spaces if you like).
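For reference, the two alternating steps (assign each point to the nearest mean, then recompute each mean as its cluster's centroid) can be sketched in a few lines of NumPy. This is only an illustration under simplified assumptions (dense distance matrix, fixed iteration count); the function names mirror, but do not replace, *Kmeans* and *init_means* in *Kmeans.cpp*:

```python
import numpy as np

def init_means(X, k, rng):
    # pick k distinct random data points as the initial means
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].astype(float)

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    means = init_means(X, k, rng)
    for _ in range(iters):
        # assignment step: label of the nearest mean (squared Euclidean distance)
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: each mean becomes the centroid of its cluster
        for j in range(k):
            pts = X[labels == j]
            if len(pts):               # keep the old mean if the cluster is empty
                means[j] = pts.mean(axis=0)
    return labels, means
```

For image data, X would be the N x 3 (RGB) or N x 5 (RGBXY) array of per-pixel features.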
- Use your K-means algorithm to cluster image pixels based on RGB or RGBXY features. The former idea is commonly used for *color quantization*; the latter is used for *superpixels* (e.g. SLIC superpixels, see the figure above).
- Introduce a parameter "w" that controls the relative weight of the XY component (x and y coordinates) in the 5D feature [R,G,B,wX,wY] at each pixel. Make sure that for w=0 you get the same result as with RGB features. Compare the clustering results as you increase w.
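Building the weighted 5D features is straightforward; a possible sketch (the raw pixel-index convention for X and Y is an assumption of this illustration; you may prefer to normalize the coordinates by image size):

```python
import numpy as np

def rgbxy_features(img, w):
    # img: H x W x 3 RGB image; returns an (H*W) x 5 array of [R,G,B,wX,wY] features
    h, wid = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:wid]          # per-pixel row (Y) and column (X) indices
    rgb = img.reshape(-1, 3).astype(float)
    return np.concatenate([rgb,
                           (w * xs).reshape(-1, 1).astype(float),
                           (w * ys).reshape(-1, 1).astype(float)], axis=1)
```

With w=0 the last two columns vanish, so clustering these features reduces exactly to RGB-only clustering.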
- Play with larger values of the parameter K (e.g. K=12) and investigate what happens as you slowly increase w from 0.0 to 1.0 and beyond. For large K, you should observe a transition from noisy (over-quantized) colors to smooth "superpixels".
- Evaluate the sensitivity of K-means to local minima. Show the different solutions obtained for different random initial means and compute the corresponding values of the K-means energy.
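Comparing runs requires computing the K-means energy itself; a minimal sketch (assuming "labels" and "means" come from any run of your implementation):

```python
import numpy as np

def kmeans_energy(X, labels, means):
    # K-means objective: sum over all points of squared distance to the assigned mean
    return float(((X - means[labels]) ** 2).sum())
```

Run your K-means several times with different random initial means and report this energy for each run; the spread of the energies indicates how prone the algorithm is to local minima.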
- Use seeds (interactively marked pixels) as follows: estimate the "mean" feature vector inside a seeded region of one color and use it as the initial "mean" for one of the cluster labels (instead of a random mean). Do this for seeds of all distinct colors entered by the user (not exceeding K, of course). Think about the properties of this form of user-guided initialization and demonstrate them in your results.
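A possible sketch of this initialization (the label encoding, with -1 marking unseeded pixels, is an assumption of this illustration):

```python
import numpy as np

def seed_means(feats, seed_labels, k, rng):
    # feats: N x D per-pixel features; seed_labels: N ints, -1 = unmarked,
    # 0..s-1 = id of the seed color covering that pixel (s <= k)
    means = []
    for s in range(seed_labels.max() + 1):
        # mean feature vector inside the region marked with seed color s
        means.append(feats[seed_labels == s].mean(axis=0))
    # fill any remaining clusters with random data points, as in plain K-means
    n_extra = k - len(means)
    if n_extra > 0:
        idx = rng.choice(len(feats), size=n_extra, replace=False)
        means.extend(feats[idx].astype(float))
    return np.stack(means)
```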
- (Grad students, extra credit for undergrads) Implement mean-shift (or medoid-shift) with a box kernel (window) and evaluate its sensitivity to the bandwidth. Demonstrate your results.
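With a box (flat) kernel, each mean-shift update replaces a point by the average of all data points inside a window of radius h around it. A minimal single-point sketch, assuming a Euclidean ball of radius h as the window (a per-coordinate box is an equally valid reading):

```python
import numpy as np

def mean_shift_point(x, X, h, iters=100, tol=1e-6):
    # iterate x toward a mode of the data density estimated with a flat kernel
    x = np.asarray(x, dtype=float)
    for _ in range(iters):
        # box kernel: uniform weight on all points within distance h of x
        nbrs = X[np.linalg.norm(X - x, axis=1) <= h]
        new_x = nbrs.mean(axis=0)
        if np.linalg.norm(new_x - x) < tol:   # converged to a mode
            return new_x
        x = new_x
    return x
```

Running this from every pixel's feature vector and grouping points that converge to the same mode yields the clustering; a small bandwidth h produces many modes, while a large h merges them.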
- (Extra credit for all students): implement automatic estimation of K using the AIC or BIC energies discussed in class.
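The model-selection loop itself is simple; a hedged sketch in which the `penalty * k` term is only a placeholder for a complexity penalty; substitute the exact AIC/BIC energy defined in class:

```python
import numpy as np

def select_k(X, kmeans_fn, k_max, penalty):
    # sweep K = 1..k_max and pick the minimizer of: K-means energy + penalty * K
    # (replace this score with the AIC/BIC energy from the lectures)
    best_k, best_score = None, np.inf
    for k in range(1, k_max + 1):
        labels, means = kmeans_fn(X, k)
        energy = float(((X - means[labels]) ** 2).sum())
        score = energy + penalty * k
        if score < best_score:
            best_k, best_score = k, score
    return best_k
```

The energy alone always decreases as K grows (reaching zero at K = N), so it is the penalty term that makes the minimum occur at a finite, meaningful K.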