By linear models, we mean that the hypothesis function \(h_{\bf w}({\bf x})\) is a (transformed) linear function of the parameters \({\bf w}\).
Predictions are a (transformed) linear combination of feature values
\[h_{\bf w}({\mathbf{x}}) = g\left(\sum_{k=0}^{p} w_k \phi_k({\mathbf{x}})\right) = g({{\boldsymbol{\phi}}}({\mathbf{x}})^{\mathsf{T}}{{\mathbf{w}}})\]
where the \(\phi_k\) are called basis functions. As usual, we let \(\phi_0({\mathbf{x}})=1, \forall {\mathbf{x}}\), so that we don't force \(h_{\bf w}({\bf 0}) = g(0)\).
Polynomial regression: set \(\phi_0(x) = 1, \phi_1(x) = x, \phi_2(x) = x^2, \ldots, \phi_d(x) = x^d\) and set \(g(z) = z\) (the identity).
Basis functions are fixed during training (but can be chosen through model selection).
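The polynomial-regression example above can be sketched in a few lines of NumPy: build the design matrix whose column \(k\) is \(\phi_k(x) = x^k\), then fit \(\mathbf{w}\) by least squares. The synthetic data, degree \(d=2\), and coefficient values here are illustrative assumptions, not from the notes.

```python
import numpy as np

def poly_features(x, d):
    """Map scalar inputs x to the basis [1, x, x^2, ..., x^d]."""
    return np.vander(x, d + 1, increasing=True)  # column k holds x**k

# Illustrative synthetic data: y ≈ 2 - 3x + 0.5x^2 plus small noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 3.0 * x + 0.5 * x**2 + 0.1 * rng.standard_normal(50)

Phi = poly_features(x, d=2)                  # design matrix; phi_0 = 1 column
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares estimate of w
h = Phi @ w                                  # h_w(x) = g(phi(x)^T w), g = identity
```

Note that the model stays linear in \(\mathbf{w}\) even though the fitted curve is quadratic in \(x\); only the features were transformed.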