The output space \(\mathcal{Y}\) is finite. Classes are often numbered starting from \(0\) or \(1\).

- Usually there is no notion of “similarity” between class labels in terms of loss. Recall our loss function \(\ell(h(\mathbf{x}),y)\):
  - Regression: \(\ell(9,10)\) is smaller than \(\ell(1,10)\), because the prediction \(9\) is closer to the target \(10\).
  - Classification: \(\ell(9,10)\) and \(\ell(1,10)\) are equally bad, because both predictions are simply the wrong class.
- Alternatively, specify an explicit loss for every combination of predicted class and actual class (a cost matrix).
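
The contrast between these losses can be sketched in a few lines of Python. This is only an illustration under stated assumptions: squared error stands in for the regression loss and the 0-1 loss for the classification loss (the notes do not name specific losses), and the three-class `COST` matrix contains made-up values.

```python
def squared_loss(prediction, target):
    """Assumed regression loss: grows with the distance to the target."""
    return (prediction - target) ** 2


def zero_one_loss(prediction, target):
    """Assumed classification loss: every wrong label costs the same."""
    return 0 if prediction == target else 1


# Hypothetical cost matrix for three classes, indexed as COST[predicted][actual].
# The values are invented for illustration only.
COST = [
    [0, 1, 5],
    [1, 0, 2],
    [4, 1, 0],
]


def cost_matrix_loss(prediction, target, cost=COST):
    """Explicit loss for each (predicted class, actual class) pair."""
    return cost[prediction][target]


print(squared_loss(9, 10), squared_loss(1, 10))        # 1 81
print(zero_one_loss(9, 10), zero_one_loss(1, 10))      # 1 1
print(cost_matrix_loss(0, 2), cost_matrix_loss(1, 2))  # 5 2
```

Under squared error the prediction \(9\) is much better than \(1\) for target \(10\). Under the 0-1 loss both are equally wrong. The cost matrix lets some mistakes count more than others.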