Obtaining Your Own Copy of Weka

You can obtain WEKA by visiting the WEKA Project Webpage and clicking on the appropriate link for your operating system.


Using Weka

These instructions describe how to apply different learning algorithms to the hw2-1 data set. The other sets can be processed in exactly the same way, of course. When you start up Weka, you will first see the WEKA GUI Chooser. You should click on the Explorer button. This opens a large panel with several tabs, and the Preprocess tab will already be selected.

Click on "Open file...", navigate to the the "hw2_data" folder, and then select the "hw2-1-train10.arff" file and click " ok " button. The "Current relation" part of Explorer window should now show "Relation" as vote with 10 instances and 17 attributes. All the Attributes are listed on the left-hand side of the window. Now highlight the No. 17 Attribute (Class), then the table and bar plot on the right-hand side of the window will show 6 examples in class 1 and 4 in class 2.

Now click on the "Classify" tab of the Explorer window and first we need to select the learning algorithm to apply. Go to the "Classifier" panel (near the top) which initially shows "ZeroR" beside the "Choose" button. (ZeroR is a very simple rule-learning algorithm, which we do not want.) Clicking on "Choose" lets you choose a different algorithm.

Click on "Choose", and you will see a hierarchical display whose top level is "weka", whose second level is "classifiers", and whose third level contains seven general kinds of classifiers: "bayes", "functions", "lazy", "meta", "misc", "trees", "rules". To choose DecisionStump, click on the "trees" indicator and then select "DecisionStump". In your assignment, you will also need to choose "J48" under "trees" for the pruned and unpruned trees, "SMO" under "functions", and "NaiveBayesSimple" under "bayes" for the five learning algorithms.

You can use the default settings for each of the classifiers except for the unpruned decision trees: to build an unpruned tree, select "J48" using the Choose button, and then click on the text to the right of the button (that is, the text that says "J48 -C 0.25 -M 2"). This will bring up a window of options for the J48 decision tree builder. One of these is "unpruned", which you should change to "True" to build an unpruned tree. The text next to the Choose button should now read: "J48 -U -M 2".
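In code, the same option change looks roughly like the sketch below. Both setUnpruned and the option-string parsing are standard J48/Utils calls; the wrapper class name is a placeholder.

import weka.classifiers.trees.J48;
import weka.core.Utils;

public class UnprunedJ48 {
    public static void main(String[] args) throws Exception {
        J48 tree = new J48();
        tree.setUnpruned(true);  // same as setting "unpruned" to True in the GUI
        // Alternatively, hand J48 the option string shown next to the Choose button:
        // tree.setOptions(Utils.splitOptions("-U -M 2"));
        System.out.println("J48 " + Utils.joinOptions(tree.getOptions()));
    }
}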


Important change:

When using the hw2-2-train25.arff file, NaiveBayesSimple will not work, because all of the values of the ELONGATEDNESS feature are identical for the 'opel' class. For this file *only*, use the "NaiveBayes" classifier instead of the "NaiveBayesSimple" classifier. It will assume the values actually have some small positive variance.
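If you build your classifiers in code, the swap is a one-liner; a minimal sketch (the wrapper class Hw22Bayes is hypothetical):

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;

public class Hw22Bayes {
    // For the hw2-2 data only: NaiveBayes assumes a small minimum variance for
    // each numeric attribute, so the constant ELONGATEDNESS values in the
    // 'opel' class do not cause the failure that NaiveBayesSimple hits.
    public static Classifier make() {
        return new NaiveBayes();  // instead of: new NaiveBayesSimple()
    }
}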


Now that we have chosen an algorithm, we need to examine the "Test options" panel. First we will load in the test data. Click on the radio button "Supplied test set". Then click on the "Set..." button. A small "Test Instances" pop-up window should appear. Click on "Open file...", navigate to the "hw2_data" folder, and select "hw2-1-test100.arff". The Test Instances window should now show the relation vote with 100 instances and 17 attributes. You may close this window at this point.

Then we will tell Weka which of the 17 attributes is the class variable. Below the Test options panel is a drop-down menu listing the 17 attributes. Choose the last one, "(Nom) class".
[Num means numeric; Nom means nominal, i.e., discrete]
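A sketch of the equivalent API steps, assuming the same hw2_data paths as before; setClassIndex plays the role of the drop-down menu:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;

public class LoadTestSet {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(
                new BufferedReader(new FileReader("hw2_data/hw2-1-train10.arff")));
        Instances test  = new Instances(
                new BufferedReader(new FileReader("hw2_data/hw2-1-test100.arff")));
        // Attribute No. 17, "(Nom) class", is the last attribute in both files.
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);
        System.out.println(test.relationName() + ": " + test.numInstances()
                + " instances, " + test.numAttributes() + " attributes");
    }
}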

Now we are ready to run the algorithm. Click on the "Start" button, and the Classifier Output window will show the output from the classifier. For DecisionStump, this output consists of several sections, most of which you do not need.



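If you want to reproduce the Start-button run in code, a rough sketch (with the same assumed file paths as above) is:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.DecisionStump;
import weka.core.Instances;

public class RunDecisionStump {
    public static void main(String[] args) throws Exception {
        Instances train = new Instances(
                new BufferedReader(new FileReader("hw2_data/hw2-1-train10.arff")));
        Instances test  = new Instances(
                new BufferedReader(new FileReader("hw2_data/hw2-1-test100.arff")));
        train.setClassIndex(train.numAttributes() - 1);
        test.setClassIndex(test.numAttributes() - 1);

        DecisionStump stump = new DecisionStump();
        stump.buildClassifier(train);                // learn from the 10 training examples

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(stump, test);             // apply to the 100 test examples
        System.out.println(stump);                   // the learned stump itself
        System.out.println(eval.toSummaryString());  // accuracy, error counts, etc.
    }
}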
After learning a classifier, if you want to obtain the labels for the elements in a test set (as you will need to for hw2-br), ...

This then outputs a bunch of predictions with the following form:
=== Predictions on test set ===

 inst#,    actual, predicted, error, probability distribution
     1          ? 2:tested_p      +   0.124 *0.876
     2          ? 1:tested_n      +  *0.845  0.155
...
    14          ? 1:tested_n      +  *0.95   0.05
    15          ? 2:tested_p      +   0.256 *0.744
(The "+" indicates an error between actual classification and predicted classification. Since we're supplying ?, there is always an error.) The "*" flags the label with the highest probability --- ie, the value that would be returned here. After you have done this, you can either right-click (or middle click) on the Result list to save these labels to a text file which you can then hand in by e-mail.