Classification Performance Evaluation

npos <- 500; nneg <- 500; set.seed(1) df <- rbind(data.frame(x=rnorm(npos,mupos), y=1),data.frame(x=rnorm(nneg,muneg),y=-1)); df$y <- as.factor(df$y) sep <- tune(svm,y~x,data=df,ranges=list(gamma = 2^(-1:1), cost = 2^(2:4))) df$ypred <- predict(sep$best.model) ggplot(df,aes(x=x,fill=y)) + geom_histogram(alpha=0.2,position="identity",bins=51) + geom_point(aes(y=ypred,colour=ypred)) + scale_color_discrete(drop=FALSE)

npos <- 50; nneg <- 950; set.seed(1) df <- rbind(data.frame(x=rnorm(npos,mupos), y=1),data.frame(x=rnorm(nneg,muneg),y=-1)); df$y <- as.factor(df$y) rsep <- tune(svm,y~x,data=df,ranges=list(gamma = 2^(-1:1), cost = 2^(2:4))) df$ypred <- predict(rsep$best.model); ggplot(df,aes(x=x,fill=y)) + geom_histogram(alpha=0.2,position="identity",bins=51) + geom_point(aes(y=ypred,colour=ypred)) + scale_color_discrete(drop=FALSE)

newneg <- df %>% filter(y == 1) %>% sample_n(900,replace=T); dfupsamp <- rbind(df,newneg) upsep <- tune(svm,y~x,data=dfupsamp,ranges=list(gamma = 2^(-1:1), cost = 2^(2:4))) dfupsamp$ypred <- predict(upsep$best.model); ggplot(dfupsamp,aes(x=x,fill=y)) + geom_histogram(alpha=0.2,position="identity",bins=51) + geom_point(aes(y=ypred,colour=ypred)) + scale_color_discrete(drop=FALSE)

	True class
	Total population	class positive	class negative
Predicted class	Predicted class positive	True positive	False positive (Type I error)
Predicted class negative	False negative (Type II error)	True negative

Precision = $Σ True positive Σ Predicted positive$	Recall = $Σ True positive Σ Class positive$

Sensitivity = $Σ True positive Σ Class positive$	Specificity = $Σ True negative Σ Class negative$

https://en.wikipedia.org/wiki/Evaluation_of_binary_classifiers

		True class
	Total population	class positive	class negative	Prevalence = $Σ class positive Σ Total population$
Predicted class	Predicted class positive	True positive	False positive (Type I error)	Positive predictive value (PPV), Precision = $Σ True positive Σ Predicted positive$	False discovery rate (FDR) = $Σ False positive Σ Predicted positive$
Predicted class	Predicted class negative	False negative (Type II error)	True negative	False omission rate (FOR) = $Σ False negative Σ Predicted negative$	Negative predictive value (NPV) = $Σ True negative Σ Predicted negative$
	Accuracy (ACC) = $Σ True positive + Σ True negative Σ Total population$	True positive rate (TPR), Sensitivity, Recall = $Σ True positive Σ Class positive$	False positive rate (FPR), Fall-out = $Σ False positive Σ Class negative$	Positive likelihood ratio (LR+) = $TPR FPR$	Diagnostic odds ratio (DOR) = $LR+ LR-$
		False negative rate (FNR), Miss rate = $Σ False negative Σ Class positive$	True negative rate (TNR), Specificity (SPC) = $Σ True negative Σ Class negative$	Negative likelihood ratio (LR−) = $FNR TNR$	Diagnostic odds ratio (DOR) = $LR+ LR-$

Review: Error Rate / Accuracy

Imbalanced classes

Example: 50% Positive, 50% Negative

Example: 5% Positive, 95% Negative

Example: Upsampling

Upsampling: Accuracy

Definitions:
True/False Positives/Negatives

Confusion Matrix

Precision and Recall, F-measure

F-measure Example

Sensitivity and Specificity,
Balanced Accuracy

Balanced accuracy Example

Many Measures

Cost sensitivity

Receiver operating characteristic (ROC)

Reading an ROC Curve

ROC example - Original: AUC = 0.475

ROC example - Upsampled: AUC = 0.799

AUROC, \(c\)-statistic

Big picture: Optimizing classifiers