Assignment 3, due November 23, 5% of class score

Instructions

Submit through OWL by midnight on the due date.
You may discuss the assignment with other students, but all code and the report must be your own work.
The assignment is to be done in Matlab.
Deliverables: Matlab code that you write yourself, and the assignment write-up.
Submit two files: a PDF file with the report, and one zipped file containing the code you wrote yourself (and only the code you wrote yourself).
The assignment uses indoor scene classification data. There are ten scene classes, each in its own subdirectory, and each subdirectory contains 100 scenes (examples).
From each class, use 80 samples for training and 20 for testing.
Use the feature vector that combines all the feature vectors from Assignment 1.
Useful Matlab commands: predict, resubLoss, templateTree, fitensemble, kfoldLoss
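A minimal sketch of the per-class 80/20 split, assuming the combined Assignment-1 features are already in an N-by-d matrix X with numeric class labels 1..10 in Y. Because the real features are not available here, the block builds a hypothetical synthetic stand-in with the same shape (10 classes, 100 examples each); the names X, Y, Xtrain, Ytrain, Xtest, and Ytest are illustrative choices, not names required by the assignment.

```matlab
% HYPOTHETICAL stand-in for the Assignment-1 features: 10 classes x 100
% examples, 20 synthetic features. Replace X and Y with the real data.
rng(0);                                         % reproducible example
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);        % 1000x1 numeric labels
mu = 0.7 * randn(numClasses, numFeat);          % random class means
X  = mu(Y, :) + randn(numel(Y), numFeat);       % 1000x20 feature matrix

% Per-class split: 80 random examples of each class for training,
% the remaining 20 for testing, so every class is equally represented.
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c);                 % indices of class c
    idx = idx(randperm(numel(idx)));    % shuffle within the class
    isTrain(idx(1:80)) = true;          % first 80 go to training
end
Xtrain = X(isTrain, :);  Ytrain = Y(isTrain);
Xtest  = X(~isTrain, :); Ytest  = Y(~isTrain);
fprintf('%d training and %d test examples\n', numel(Ytrain), numel(Ytest));
```

Splitting within each class (rather than over the pooled data) guarantees exactly 80 training and 20 test examples per class.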

Problem 1 (50%):
Use a bagged tree classifier: in Matlab, 'fitensemble' with the options 'Bag' and 'type','classification'. First use cross-validation on the training data to select good values for the tree size and the number of trees. Cross-validation can be invoked with the option 'kfold' in 'fitensemble'; I suggest using 5-fold cross-validation. Trees of different sizes can be built with the option 'MaxNumSplits' of the 'templateTree' function. Use the values 1, 5, 10, and 20 for 'MaxNumSplits'. For the number of trees, try values from 1 to 50. Running cross-validation separately for numTrees = 1, 2, ..., 50 would take too long. Instead, run 'fitensemble' once with the number of trees ('maxNumTrees', passed as the NLearn argument of 'fitensemble') set to 50. You will get a cross-validated classifier 'ens'. Then use loss = kfoldLoss(ens,'mode','cumulative'). The 'cumulative' mode gives the loss (loss is just another name for error) for numTrees = 1, 2, ..., 50, which saves a lot of time. On the same graph, plot the number of trees vs. loss for each tree size (i.e., for 'MaxNumSplits' = 1, 5, 10, 20), each in a different color. Discuss the plot.
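The cross-validation procedure above can be sketched as follows. This is a sketch under stated assumptions, not a reference solution: it uses a hypothetical synthetic stand-in for the real features, assumes a Statistics and Machine Learning Toolbox release recent enough to support 'MaxNumSplits' in 'templateTree', and the variable names (cvLoss, bestTrees, bestSplits) are illustrative. One 'fitensemble' call per tree size, combined with the 'cumulative' mode of 'kfoldLoss', yields the whole error-vs-number-of-trees curve.

```matlab
% --- HYPOTHETICAL stand-in data and per-class 80/20 split (replace) ---
rng(0);
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);
mu = 0.7 * randn(numClasses, numFeat);
X  = mu(Y, :) + randn(numel(Y), numFeat);
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c); idx = idx(randperm(numel(idx)));
    isTrain(idx(1:80)) = true;
end
Xtrain = X(isTrain, :); Ytrain = Y(isTrain);

% --- 5-fold CV over tree size, cumulative over the number of trees ---
splitsList  = [1 5 10 20];          % values of 'MaxNumSplits' to try
maxNumTrees = 50;                   % passed as the NLearn argument
cvLoss = NaN(maxNumTrees, numel(splitsList));

figure; hold on;
for k = 1:numel(splitsList)
    t = templateTree('MaxNumSplits', splitsList(k));
    cvens = fitensemble(Xtrain, Ytrain, 'Bag', maxNumTrees, t, ...
                        'type', 'classification', 'kfold', 5);
    % Element j = CV error of the ensemble built from the first j trees.
    cvLoss(:, k) = kfoldLoss(cvens, 'mode', 'cumulative');
    plot(1:maxNumTrees, cvLoss(:, k), ...
         'DisplayName', sprintf('MaxNumSplits = %d', splitsList(k)));
end
xlabel('Number of trees'); ylabel('5-fold cross-validation error');
title('Bagged trees'); legend('show'); hold off;

% Pick the (number of trees, tree size) pair with the smallest CV error.
[bestErr, linIdx] = min(cvLoss(:));
[bestTrees, bestCol] = ind2sub(size(cvLoss), linIdx);
bestSplits = splitsList(bestCol);
fprintf('Best: %d trees, MaxNumSplits = %d, CV error = %.3f\n', ...
        bestTrees, bestSplits, bestErr);
```

With 'hold on', each call to 'plot' takes the next color from the axes color order, so the four tree sizes appear in different colors.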

Now retrain the bagged classifier on all the training data, using the values of maxNumTrees and MaxNumSplits that gave the smallest cross-validation error. Report and discuss the cross-validation, training, and test errors.
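A minimal sketch of the retraining step, assuming the selected values are available as bestTrees and bestSplits. The values below are hypothetical placeholders (substitute the pair found by cross-validation), and the data is again a synthetic stand-in. Training error comes from 'resubLoss', and test error is computed from 'predict' on the held-out 20 examples per class.

```matlab
% --- HYPOTHETICAL stand-in data and per-class 80/20 split (replace) ---
rng(0);
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);
mu = 0.7 * randn(numClasses, numFeat);
X  = mu(Y, :) + randn(numel(Y), numFeat);
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c); idx = idx(randperm(numel(idx)));
    isTrain(idx(1:80)) = true;
end
Xtrain = X(isTrain, :);  Ytrain = Y(isTrain);
Xtest  = X(~isTrain, :); Ytest  = Y(~isTrain);

% HYPOTHETICAL placeholders: use the values selected by cross-validation.
bestTrees  = 30;
bestSplits = 20;
t = templateTree('MaxNumSplits', bestSplits);

% Cross-validation error of the chosen setting (5 folds).
cvens = fitensemble(Xtrain, Ytrain, 'Bag', bestTrees, t, ...
                    'type', 'classification', 'kfold', 5);
cvErr = kfoldLoss(cvens);

% Final model trained on all the training data.
ens = fitensemble(Xtrain, Ytrain, 'Bag', bestTrees, t, 'type', 'classification');
trainErr = resubLoss(ens);              % error on the training data
Yhat     = predict(ens, Xtest);         % predicted test labels
testErr  = mean(Yhat ~= Ytest);         % fraction of test errors
fprintf('CV %.3f   train %.3f   test %.3f\n', cvErr, trainErr, testErr);
```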
Problem 2 (20%):
Repeat Problem 1, now with AdaBoost (option 'AdaBoostM2' in 'fitensemble'). Report all the same errors and plots as for Problem 1. It is interesting to plot the cross-validation errors on the same graph as in Problem 1, to compare boosting and bagging. Discuss the difference in performance from Problem 1.
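One way to put bagging and boosting on one graph is sketched below, assuming the same synthetic stand-in data as before. Line style encodes the method (solid for 'Bag', dashed for 'AdaBoostM2') and color encodes 'MaxNumSplits'. The 'type','classification' option is passed only for 'Bag'. As a precaution, the cumulative loss is written into a NaN-padded array in case a boosted ensemble stops before reaching the requested number of trees; whether that happens on the real data is not assumed either way.

```matlab
% --- HYPOTHETICAL stand-in data and per-class 80/20 split (replace) ---
rng(0);
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);
mu = 0.7 * randn(numClasses, numFeat);
X  = mu(Y, :) + randn(numel(Y), numFeat);
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c); idx = idx(randperm(numel(idx)));
    isTrain(idx(1:80)) = true;
end
Xtrain = X(isTrain, :); Ytrain = Y(isTrain);

splitsList  = [1 5 10 20];
maxNumTrees = 50;
ensMethods  = {'Bag', 'AdaBoostM2'};
styles      = {'-', '--'};               % solid = bagging, dashed = boosting
colors      = lines(numel(splitsList));  % one color per tree size
cvLoss = NaN(maxNumTrees, numel(splitsList), numel(ensMethods));

figure; hold on;
for m = 1:numel(ensMethods)
    for k = 1:numel(splitsList)
        t = templateTree('MaxNumSplits', splitsList(k));
        extra = {};
        if strcmp(ensMethods{m}, 'Bag')
            extra = {'type', 'classification'};   % required for 'Bag'
        end
        cvens = fitensemble(Xtrain, Ytrain, ensMethods{m}, maxNumTrees, t, ...
                            extra{:}, 'kfold', 5);
        L = kfoldLoss(cvens, 'mode', 'cumulative');
        cvLoss(1:numel(L), k, m) = L;    % unused entries stay NaN
        plot(1:maxNumTrees, cvLoss(:, k, m), styles{m}, 'Color', colors(k, :), ...
             'DisplayName', sprintf('%s, MaxNumSplits = %d', ...
                                    ensMethods{m}, splitsList(k)));
    end
end
xlabel('Number of trees'); ylabel('5-fold cross-validation error');
legend('show'); hold off;

% Lowest CV error reached by each method (min ignores NaN).
bestPerMethod = squeeze(min(min(cvLoss, [], 1), [], 2));
fprintf('Lowest CV error: Bag %.3f, AdaBoostM2 %.3f\n', bestPerMethod);
```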
Problem 3 (20%):
This problem explores the relationship between cross-validation, training, and test errors. Choose 'MaxNumSplits' > 5 and 'MaxNumTrees' > 100, and find the cumulative cross-validation error (with more than 4 folds) of an ensemble classifier (either 'Bag' or 'AdaBoost'). Then retrain the classifier on all the training data, and find the cumulative training and test errors. Plot the cross-validation, training, and test errors vs. the number of trees on the same graph, in different colors. Discuss your plot.
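The three curves can be sketched as below, assuming a bagged ensemble with the hypothetical choices MaxNumSplits = 10, 150 trees, and 5 folds (any values meeting the stated constraints would do) on synthetic stand-in data. 'resubLoss' and 'loss' accept the same 'mode','cumulative' option as 'kfoldLoss', so each returns one error per ensemble size.

```matlab
% --- HYPOTHETICAL stand-in data and per-class 80/20 split (replace) ---
rng(0);
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);
mu = 0.7 * randn(numClasses, numFeat);
X  = mu(Y, :) + randn(numel(Y), numFeat);
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c); idx = idx(randperm(numel(idx)));
    isTrain(idx(1:80)) = true;
end
Xtrain = X(isTrain, :);  Ytrain = Y(isTrain);
Xtest  = X(~isTrain, :); Ytest  = Y(~isTrain);

% Hypothetical settings satisfying: splits > 5, trees > 100, folds > 4.
numSplits = 10; numTrees = 150; numFolds = 5;
t = templateTree('MaxNumSplits', numSplits);

% Cumulative cross-validation error.
cvens = fitensemble(Xtrain, Ytrain, 'Bag', numTrees, t, ...
                    'type', 'classification', 'kfold', numFolds);
cvL = kfoldLoss(cvens, 'mode', 'cumulative');

% Retrain on all the training data: cumulative training and test errors.
ens    = fitensemble(Xtrain, Ytrain, 'Bag', numTrees, t, 'type', 'classification');
trainL = resubLoss(ens, 'mode', 'cumulative');
testL  = loss(ens, Xtest, Ytest, 'mode', 'cumulative');

figure; hold on;
plot(1:numTrees, cvL,    'b', 'DisplayName', sprintf('%d-fold CV', numFolds));
plot(1:numTrees, trainL, 'g', 'DisplayName', 'Training');
plot(1:numTrees, testL,  'r', 'DisplayName', 'Test');
xlabel('Number of trees'); ylabel('Classification error');
legend('show'); hold off;
```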
Problem 4 (20%):
Try to develop a bagged or boosted classifier that performs better than those you developed in the previous problems. Report the cross-validation, training, and test errors, and explain what you did. Things you can try: add more features, use larger trees, use more trees, etc.
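A sketch of one systematic way to explore larger and more trees is shown below. The candidate list is entirely hypothetical, the data is again a synthetic stand-in, and selection uses 5-fold cross-validation only, so the test set is touched once, after the choice is made. Note that the reported CV error is the minimum over tree counts, which makes it slightly optimistic.

```matlab
% --- HYPOTHETICAL stand-in data and per-class 80/20 split (replace) ---
rng(0);
numClasses = 10; perClass = 100; numFeat = 20;
Y  = repelem((1:numClasses)', perClass);
mu = 0.7 * randn(numClasses, numFeat);
X  = mu(Y, :) + randn(numel(Y), numFeat);
isTrain = false(size(Y));
for c = 1:numClasses
    idx = find(Y == c); idx = idx(randperm(numel(idx)));
    isTrain(idx(1:80)) = true;
end
Xtrain = X(isTrain, :);  Ytrain = Y(isTrain);
Xtest  = X(~isTrain, :); Ytest  = Y(~isTrain);

% HYPOTHETICAL candidates: larger trees and more trees than before.
cands = struct('method', {'Bag', 'Bag', 'AdaBoostM2'}, ...
               'splits', {50, 100, 20}, ...
               'trees',  {200, 200, 200});

bestErr = Inf;
for ci = 1:numel(cands)
    cand = cands(ci);
    t = templateTree('MaxNumSplits', cand.splits);
    extra = {};
    if strcmp(cand.method, 'Bag'), extra = {'type', 'classification'}; end
    cvens = fitensemble(Xtrain, Ytrain, cand.method, cand.trees, t, ...
                        extra{:}, 'kfold', 5);
    L = kfoldLoss(cvens, 'mode', 'cumulative');
    [err, nT] = min(L);                 % best tree count for this candidate
    fprintf('%-10s splits=%3d: CV error %.3f at %d trees\n', ...
            cand.method, cand.splits, err, nT);
    if err < bestErr
        bestErr = err; best = cand; best.trees = nT;
    end
end

% Retrain the winner on all the training data and report all three errors.
t = templateTree('MaxNumSplits', best.splits);
extra = {};
if strcmp(best.method, 'Bag'), extra = {'type', 'classification'}; end
ens = fitensemble(Xtrain, Ytrain, best.method, best.trees, t, extra{:});
trainErr = resubLoss(ens);
testErr  = mean(predict(ens, Xtest) ~= Ytest);
fprintf('Chosen %s: CV %.3f  train %.3f  test %.3f\n', ...
        best.method, bestErr, trainErr, testErr);
```

Adding features would change only how X is built; the selection loop stays the same.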