Atam Oguz Erkara

Classroom Topic

Support Vector Machines

This page gives a simple explanation of Support Vector Machines. The topic was originally prepared for an Imaging Physics course presentation. The goal was to explain what SVM is, what it is useful for, and how it can classify data through an intuitive penguin dataset example.

Basic Idea

SVM is a classification method. Imagine two groups of data points. Many lines may separate them, but SVM searches for the line that separates them with the widest possible gap. That gap is called the margin. The points that lie closest to the boundary, on the edge of the margin, are called support vectors.

A simple way to picture it is as a road between two groups. SVM tries to make that road as wide as possible while still keeping the two groups on different sides. Logistic regression focuses on predicting probabilities; SVM focuses on the boundary and the margin.

Penguin dataset illustration

Explanation of SVM Concept

1. SVM as a supervised classification method

Support Vector Machine is a supervised classification method. It learns from labelled examples and tries to separate data points into correct classes. The main idea is not to predict probabilities directly, but to find a strong decision boundary between classes.

Here, x represents the input features and y represents the class label. For a simple two class problem, the labels can be written as −1 and +1, matching the negative and positive classes used in the formulas below.
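
As a tiny illustration of this notation, a labelled training set is just a list of feature vectors x paired with class labels y. The numbers below are made up purely for illustration.

    # Illustrative only: two feature vectors (bill length, bill depth) with labels.
    X = [[39.1, 18.7],    # an example from the negative class
         [46.5, 15.0]]    # an example from the positive class
    y = [-1, +1]          # one label per feature vector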

SVM supervised classification idea
2. Many lines can split the data

If the data has two features, we can draw it on a two dimensional graph. Several lines may separate the classes. However, SVM does not only ask whether a line separates the data. It asks which separating line is the best one.

This is the intuitive starting point. Many answers can work, but SVM searches for the most reliable separating boundary.

Different possible decision boundaries
3. Margin, hyperplane, and support vectors

The best boundary is the one with the widest margin. The margin is the gap between the separating boundary and the closest training points on either side. Those closest points are called support vectors, and the separating boundary itself is called the hyperplane.

Mathematical form

Decision boundary:
wᵀx + b = 0

Margin boundaries:
wᵀx + b = +1
wᵀx + b = −1

Margin width = 2 / ||w||

This is the mathematical version of the widest road idea. SVM tries to keep the closest points from both groups as far as possible from the separating boundary.
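
A minimal sketch of how this margin width can be read off a fitted model, assuming scikit-learn is available; the toy blobs simply stand in for two separable groups of points.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Two small, well separated point clouds standing in for the two classes.
    X, y = make_blobs(n_samples=40, centers=2, cluster_std=0.8, random_state=0)

    model = SVC(kernel="linear", C=1.0).fit(X, y)

    w = model.coef_[0]        # learned weight vector w
    b = model.intercept_[0]   # learned bias b
    print("margin width 2/||w|| =", 2.0 / np.linalg.norm(w))
    print("support vectors:", model.support_vectors_)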

SVM margin hyperplane and support vectors
4. Penguin dataset demonstration

The practical example used a penguin dataset with Adelie, Gentoo, and Chinstrap penguins. To make the first demonstration simple, Chinstrap was removed and the model focused on separating Adelie and Gentoo penguins.

The dataset contains physical measurements such as bill length, bill depth, and body mass, along with categorical information such as island and sex. This makes it useful for a visual classification example.
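
A minimal loading sketch, assuming seaborn's built-in copy of the Palmer Penguins data is an acceptable stand-in for the dataset used in the presentation.

    import seaborn as sns

    # Load the penguin data and drop rows with missing measurements.
    penguins = sns.load_dataset("penguins").dropna()

    # Keep only Adelie and Gentoo for the first two class demonstration.
    two_class = penguins[penguins["species"] != "Chinstrap"]
    print(two_class["species"].value_counts())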

Penguin dataset description
5. Feature selection with bill length and bill depth

Bill length and bill depth were selected because they are easy to visualize in two dimensions. In this feature space, each penguin becomes one point on the graph.

The classifier does not understand penguins directly. It only sees numerical features. Good feature selection makes the separation clearer.
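
A small plotting sketch under the same assumption about the data source: bill length and bill depth are pulled out as the two features, and each penguin becomes one point in the plot.

    import seaborn as sns
    import matplotlib.pyplot as plt

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")

    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = (two_class["species"] == "Gentoo").to_numpy()   # True = Gentoo, False = Adelie

    plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolor="k")
    plt.xlabel("bill length (mm)")
    plt.ylabel("bill depth (mm)")
    plt.title("Adelie vs Gentoo in the selected feature space")
    plt.show()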

Bill length and bill depth distribution
6. Linear SVM training

A linear Support Vector Classifier was trained on the selected features. Linear means the classifier searches for a straight separating boundary between the two penguin groups.

Linear SVM is the simplest case and fits the visual explanation well because the boundary can be drawn as a line.
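
One way to reproduce this step is scikit-learn's SVC with kernel="linear", again assuming the seaborn copy of the penguin data; this is a sketch, not necessarily the exact classifier used in the presentation.

    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")
    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = two_class["species"].to_numpy()          # "Adelie" or "Gentoo"

    # A linear Support Vector Classifier: the boundary is a straight line in 2D.
    model = SVC(kernel="linear", C=1.0).fit(X, y)

    print("w =", model.coef_[0], " b =", model.intercept_[0])
    print("training accuracy:", model.score(X, y))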

Linear SVM visualization
7. Prediction example

After training, the model can classify a new penguin measurement. A point close to the decision boundary is useful because it tests whether the learned separation is meaningful.

Mathematical form

Decision rule:
f(u) = wᵀu + b

if f(u) ≥ 0  →  class +1
if f(u) < 0   →  class −1

A new point u is compared with the learned direction w. The sign of the result decides the predicted class.
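
A small prediction sketch under the same assumptions: the decision function wᵀu + b is evaluated for a new measurement and its sign gives the predicted class. The measurement below is an illustrative value, not a point taken from the original presentation.

    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")
    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = two_class["species"].to_numpy()

    model = SVC(kernel="linear", C=1.0).fit(X, y)

    # Illustrative new penguin: bill length 45 mm, bill depth 17 mm.
    u = [[45.0, 17.0]]
    print("f(u) =", model.decision_function(u))   # the sign decides the class
    print("predicted class:", model.predict(u))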

SVM prediction point
8. C parameter and misclassification

The C parameter controls how strict the classifier should be. A lower C tolerates more points inside the margin or on the wrong side. A higher C punishes such misclassifications more strongly.

Mathematical form

Soft margin objective:
minimize  1/2 ||w||² + C Σᵢ ξᵢ

small C  →  softer margin
large C  →  stricter margin

This parameter is important because real data is rarely perfect. Sometimes allowing a few mistakes gives a better general boundary.
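
A quick comparison sketch under the same assumptions: fitting the same data with several values of C shows how a softer margin typically keeps more support vectors, while a stricter margin tries harder to classify every training point correctly.

    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")
    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = two_class["species"].to_numpy()

    for C in (0.01, 1.0, 100.0):
        model = SVC(kernel="linear", C=C).fit(X, y)
        print(f"C={C}: {model.n_support_.sum()} support vectors, "
              f"training accuracy {model.score(X, y):.3f}")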

C parameter in SVM
9. Soft margin condition

Soft margin SVM allows some samples to violate the margin. This is useful when the dataset contains noise, outliers, or overlapping classes.

Mathematical form

yᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ
ξᵢ ≥ 0

The slack variable ξᵢ measures how much a sample violates the margin. This makes SVM more realistic for imperfect real world data.
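
scikit-learn does not expose the slack values directly, but under the same data assumptions they can be recovered from the decision function as ξᵢ = max(0, 1 − yᵢ(wᵀxᵢ + b)).

    import numpy as np
    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")
    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = np.where(two_class["species"] == "Gentoo", 1, -1)   # labels written as +1 / -1

    model = SVC(kernel="linear", C=1.0).fit(X, y)

    # Slack: how far each sample falls inside (or beyond) its margin boundary.
    xi = np.maximum(0.0, 1.0 - y * model.decision_function(X))
    print("samples violating the margin:", int(np.sum(xi > 0)))
    print("largest slack value:", xi.max())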

Soft margin SVM
10. Hard margin condition

Hard margin SVM does not allow margin violations or misclassifications. It assumes that the data can be perfectly separated by a decision boundary.

Mathematical form

yᵢ(wᵀxᵢ + b) ≥ 1

There is no slack variable in hard margin SVM. Every training sample must stay on the correct side of the margin.
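
scikit-learn's SVC always solves the soft margin problem, but a very large C makes violations so expensive that it behaves almost like a hard margin classifier. A small sketch under the same data assumptions:

    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")
    X = two_class[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = two_class["species"].to_numpy()

    # Very large C: (nearly) no margin violations are tolerated.
    hard_like = SVC(kernel="linear", C=1e6).fit(X, y)
    print("training accuracy with C=1e6:", hard_like.score(X, y))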

Hard margin SVM
11. Higher dimensional classification

SVM is not limited to two features. More features can be added, such as body mass. With three features, the separating boundary becomes a plane. With four or more features, visualization becomes difficult but the same idea still works.

The visual object changes from a line to a plane and then to a hyperplane, but the goal remains the same: separate the classes with a strong margin.
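
The same workflow extends directly to three features, for example by adding body mass; the separating object is then a plane in a three dimensional feature space. A sketch under the same data assumptions (note that features with very different units, millimetres versus grams, are usually standardized first in practice):

    import seaborn as sns
    from sklearn.svm import SVC

    two_class = sns.load_dataset("penguins").dropna().query("species != 'Chinstrap'")

    features = ["bill_length_mm", "bill_depth_mm", "body_mass_g"]
    X = two_class[features].to_numpy()
    y = two_class["species"].to_numpy()

    # The learned w now has one weight per feature; the boundary is a plane.
    model = SVC(kernel="linear", C=1.0).fit(X, y)
    print(dict(zip(features, model.coef_[0])))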

SVM higher dimensional plane
12. Multi class classification

SVM is inherently a two class method, which is why it is easiest to explain with two groups. However, it can also classify more than two classes by breaking the problem into several smaller binary classification problems.

For the penguin dataset, this can mean Adelie versus the rest, Gentoo versus the rest, and Chinstrap versus the rest.
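
A one versus rest sketch with all three species, under the same data assumptions: OneVsRestClassifier trains one binary SVM per species, in the spirit of Adelie versus the rest, Gentoo versus the rest, and Chinstrap versus the rest.

    import seaborn as sns
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import SVC

    penguins = sns.load_dataset("penguins").dropna()   # keep all three species this time
    X = penguins[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = penguins["species"].to_numpy()

    # One binary SVM per species: that species versus all the others.
    ovr = OneVsRestClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)
    print("classes:", ovr.classes_)
    print("training accuracy:", ovr.score(X, y))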

SVM multi class classification
13. Kernel trick

If the data cannot be separated with a straight line, SVM can use a kernel. The kernel trick implicitly treats the data as if it lived in a richer feature representation, so that a separation becomes possible there, without ever computing that representation explicitly.

Mathematical form

Dual form:
L = Σᵢ αᵢ − 1/2 ΣᵢΣⱼ αᵢαⱼyᵢyⱼ(xᵢᵀxⱼ)

Kernel replacement:
xᵢᵀxⱼ  →  K(xᵢ, xⱼ)

Decision with kernel:
f(u) = Σᵢ αᵢyᵢK(xᵢ, u) + b

Common kernels:
Linear:      K(x, z) = xᵀz
Polynomial:  K(x, z) = (xᵀz + c)ᵈ
RBF:         K(x, z) = exp(−γ||x − z||²)

The kernel trick works because the dual formulation depends on dot products. Replacing the dot product with a kernel function allows SVM to handle non linear separation.
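
In practice, replacing the dot product with a kernel is a one argument change in scikit-learn. A sketch under the same data assumptions, comparing the linear, polynomial, and RBF kernels listed above:

    import seaborn as sns
    from sklearn.svm import SVC

    penguins = sns.load_dataset("penguins").dropna()
    X = penguins[["bill_length_mm", "bill_depth_mm"]].to_numpy()
    y = penguins["species"].to_numpy()

    for kernel in ("linear", "poly", "rbf"):
        model = SVC(kernel=kernel, C=1.0, gamma="scale").fit(X, y)
        print(f"{kernel:6s} kernel: training accuracy {model.score(X, y):.3f}")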

SVM kernel trick
14. Application example in remote sensing

The presentation also discussed remote sensing image classification. In that study, SVM was compared with other methods for classifying land information from image based features.

This shows that SVM is not limited to classroom examples. It can also be applied to image related classification tasks and remote sensing applications.

15. Short history

The story of Support Vector Machines is strongly connected to Vladimir Vapnik and statistical learning theory. The basic idea did not appear suddenly as a finished modern tool. It developed over many years. First came the theoretical foundation. Later, stronger computers and real benchmark problems made the method practically testable. In the early 1990s, kernel based SVMs became especially important because they allowed non linear classification. This helped SVM become visible in problems such as handwriting recognition, image classification, and other pattern recognition tasks.

This is why SVM feels both mathematical and practical. It is a theoretical idea that became powerful when the computational environment finally caught up with it.

Strengths

  • Good option for small and medium sized datasets.
  • Works well with high dimensional feature spaces.
  • Strong margin based classification logic.
  • Kernel functions make it flexible for non linear data.

Limitations

  • Can become computationally expensive on large datasets.
  • Kernel choice affects the result strongly.
  • Hyperparameter tuning may require extra testing.
  • Results are harder to visualize in four or more dimensions.

References and Materials

These sources were used while preparing the SVM explanation, especially for the kernel trick, kernel types, remote sensing example, video material, and penguin dataset.

© 2026 Atam Oguz Erkara