SVM Remark

The original vision for SVM: maximize the distance of the points to the separating hyperplane while keeping them on the correct sides. We can formulate the problem in the following form:

$$\max_{\beta} \min_{i} d_\beta(\mathbf{x}_i),$$

where $d_\beta(\mathbf{x})$ is the distance of feature $\mathbf{x}$ to the separating hyperplane with parameter $\beta$.
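As a concrete check of the distance $d_\beta(\mathbf{x})$, here is a minimal sketch that computes the signed distance of a point to the hyperplane $\beta^\top \mathbf{x} + \alpha = 0$; the particular hyperplane and points are hypothetical, chosen only for illustration.

```python
import numpy as np

def signed_distance(beta, alpha, x):
    """Signed distance from point x to the hyperplane beta.x + alpha = 0."""
    return (beta @ x + alpha) / np.linalg.norm(beta)

# Hypothetical hyperplane x1 + x2 - 1 = 0
beta = np.array([1.0, 1.0])
alpha = -1.0

print(signed_distance(beta, alpha, np.array([1.0, 1.0])))  # 1/sqrt(2) ≈ 0.7071
print(signed_distance(beta, alpha, np.array([0.0, 0.0])))  # -1/sqrt(2) ≈ -0.7071
```

The sign of the result tells us which side of the hyperplane the point lies on, which is exactly what the labels $y_i$ encode in the SVM formulation.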

But max-min optimization problems of this form are often hard to solve directly. Hence we construct a proxy problem to find the solution for $\beta$. If we normalize the scale of $\beta$ so that $\min_i y_i(\beta^\top \mathbf{x}_i + \alpha) = 1$, then $d_\beta(\mathbf{x})$ is computed based on $1 / \| \beta \|$. Thus, we can translate the problem into

$$\min_{\beta, \alpha} \frac{1}{2}\|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1, \quad i = 1, \dots, n.$$
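To make this concrete, here is a minimal sketch that solves the standard hard-margin program $\min_{\beta,\alpha} \frac{1}{2}\|\beta\|^2$ subject to $y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1$ with a general-purpose constrained solver. The toy data set is hypothetical; a dedicated QP solver would be the usual choice in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable toy data; labels y in {-1, +1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(w):
    beta = w[:2]
    return 0.5 * beta @ beta  # (1/2) ||beta||^2

# Margin constraints: y_i (beta.x_i + alpha) - 1 >= 0
constraints = [
    {"type": "ineq", "fun": lambda w, i=i: y[i] * (w[:2] @ X[i] + w[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
beta, alpha = res.x[:2], res.x[2]
print(beta, alpha)
# Every margin constraint should hold at the optimum:
print(np.all(y * (X @ beta + alpha) >= 1 - 1e-6))
```

For this data the support vectors are $(2,2)$ and $(-1,-1)$; the solver recovers $\beta \approx (1/3, 1/3)$, $\alpha \approx -1/3$, and the margin is $1/\|\beta\|$.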
It is sometimes inevitable that some errors will occur. Thus we introduce the idea of a soft margin and modify the loss function for this problem. We want to optimize the loss function:

$$\min_{\beta, \alpha, \xi} \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

When $C$ is huge, the tolerance for misclassification is low. Alternatively, we can also consider a loss function of the following form:

$$\min_{\beta, \alpha, \xi} \sum_{i=1}^{n} \xi_i + \lambda \|\beta\|^2 \quad \text{subject to} \quad y_i(\beta^\top \mathbf{x}_i + \alpha) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

Under this scenario, if $\lambda$ is huge, then the margin will be big and the tolerance for misclassification is high.
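The $C$-weighted and $\lambda$-weighted soft-margin objectives are equivalent up to a positive rescaling, which explains why $C$ and $\lambda$ control the error tolerance in opposite directions. Assuming both use the same constraints, dividing the first objective by $C$ gives

$$\frac{1}{C}\left( \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{n} \xi_i \right) = \sum_{i=1}^{n} \xi_i + \frac{1}{2C}\|\beta\|^2,$$

so minimizing the $C$-form is the same as minimizing the $\lambda$-form with $\lambda = \frac{1}{2C}$: a huge $C$ corresponds to a tiny $\lambda$, and vice versa.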

For the loss function, we can consider the so-called hinge loss,

$$\ell(z) = \max(0, 1 - z).$$

It gets its name because the graph of the function looks like a hinge. Substituting $\xi_i = \max\bigl(0, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr)$, the output loss function is

$$\min_{\beta, \alpha} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i(\beta^\top \mathbf{x}_i + \alpha)\bigr) + \lambda \|\beta\|^2.$$

Because $y_i(\beta^\top \mathbf{x}_i + \alpha)$ should be larger than $1$, this loss penalizes points that are wrongly classified or fall inside the margin.
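Since the hinge-loss objective is unconstrained, it can be minimized directly by subgradient descent. A minimal sketch, assuming synthetic two-cluster data and hypothetical choices of step size and $\lambda$:

```python
import numpy as np

# Hinge-loss SVM trained by subgradient descent on
#   (1/n) * sum_i max(0, 1 - y_i (beta.x_i + alpha)) + lam * ||beta||^2
# Data, learning rate, and lam below are hypothetical illustration choices.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

beta, alpha, lam, lr = np.zeros(2), 0.0, 0.01, 0.1
for _ in range(200):
    margins = y * (X @ beta + alpha)
    active = margins < 1  # points inside the margin or misclassified
    # Subgradient of the averaged hinge term, plus the gradient of lam * ||beta||^2
    g_beta = -(y[active, None] * X[active]).sum(axis=0) / len(y) + 2 * lam * beta
    g_alpha = -y[active].sum() / len(y)
    beta -= lr * g_beta
    alpha -= lr * g_alpha

acc = np.mean(np.sign(X @ beta + alpha) == y)
print(f"training accuracy: {acc:.2f}")
```

Only the "active" points with $y_i(\beta^\top \mathbf{x}_i + \alpha) < 1$ contribute to the subgradient, which mirrors the fact that the hinge loss is flat (zero) for points safely outside the margin.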