How to cite this WIREs title:
WIREs Data Mining Knowl Discov
Impact Factor: 7.250

Active learning with support vector machines


In machine learning, active learning refers to algorithms that autonomously select the data points from which they will learn. There are many data mining applications in which large amounts of unlabeled data are readily available, but labels (e.g., human annotations or results coming from complex experiments) are costly to obtain. In such scenarios, an active learning algorithm aims at identifying data points that, if labeled and used for training, would most improve the learned model. Labels are then obtained only for the most promising data points. This speeds up learning and reduces labeling costs. Support vector machine (SVM) classifiers are particularly well‐suited for active learning due to their convenient mathematical properties: they perform linear classification, typically in a kernel‐induced feature space, which makes it straightforward to express the distance of a data point from the decision boundary. Furthermore, heuristics can efficiently estimate how strongly learning from a data point would influence the current model. This information can be used to actively select training samples. After a brief introduction to the active learning problem, we discuss different query strategies for selecting informative data points and review how these strategies give rise to different variants of active learning with SVMs.

This article is categorized under: Technologies > Machine Learning
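The margin‐based selection described above can be sketched in a few lines. The following is a minimal, illustrative example of pool‐based uncertainty sampling with a linear SVM; the toy dataset, loop structure, and variable names are assumptions for illustration, not code from the article.

```python
# Sketch of pool-based active learning with an SVM ("Simple Margin" style):
# repeatedly query the unlabeled sample closest to the decision boundary.
# Toy data and setup are hypothetical.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy pool: two Gaussian blobs in 2D (assumed data).
X_pool = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_pool = np.array([0] * 50 + [1] * 50)  # oracle labels, hidden from the learner

labeled = [0, 50]  # start with one labeled sample per class
unlabeled = [i for i in range(100) if i not in labeled]

for _ in range(5):  # five active learning queries
    clf = SVC(kernel="linear").fit(X_pool[labeled], y_pool[labeled])
    # |decision_function(x)| is proportional to the distance of x
    # from the separating hyperplane; query the least certain sample.
    dists = np.abs(clf.decision_function(X_pool[unlabeled]))
    query = unlabeled[int(np.argmin(dists))]
    labeled.append(query)      # "ask the oracle" for this sample's label
    unlabeled.remove(query)

print(len(labeled))  # → 7 labeled samples after 5 queries
```

Only seven labels are requested in total; a passive learner would need labels for the whole pool to train on comparable information near the boundary.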
The three rectangles depict unlabeled samples, while the blue circles and orange triangles represent positively and negatively labeled samples, respectively. Intuitively, the label of the sample xa might tell us the most about the underlying distribution of labeled samples, since in the feature space, ϕ(xa) is closer to the decision boundary than ϕ(xb) or ϕ(xc).
Single version space for the multi‐class problem with N one‐vs‐all SVMs. One area corresponds to the version space if the label of the picked sample is y = i; the other area corresponds to the case y ≠ i. In the multi‐class case, we want to measure both quantities to approximate the version space area.
Binary classification with active learning on six samples and passive learning on the full dataset. (a) Uncertainty sampling. (b) Selecting representative samples. (c) Combining informativeness and representativeness. (d) Optimal hyperplane, obtained by training on the whole dataset.
The white rectangles depict unlabeled samples. The blue circle and the orange triangle are labeled as positive and negative, respectively. In feature space, ϕ(xa) lies closer to the separating hyperplane than ϕ(xb), but is located in a region that is not densely populated. Using pure uncertainty sampling, e.g., Simple Margin, we would query the label of sample xa.
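The failure mode illustrated here, querying an uncertain but isolated sample, motivates combining uncertainty with representativeness. Below is a hedged sketch of density‐weighted uncertainty sampling: each candidate's closeness to the boundary is weighted by its average similarity to the rest of the pool. The dataset, the RBF similarity, and the multiplicative weighting are illustrative assumptions.

```python
# Sketch of density-weighted uncertainty sampling: prefer samples that are
# both close to the boundary AND lie in densely populated regions.
# Data, kernel choice, and weighting scheme are hypothetical.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Two dense clusters plus a few isolated outliers (assumed data).
X_pool = np.vstack([rng.normal(-2, 0.5, (40, 2)),
                    rng.normal(2, 0.5, (40, 2)),
                    rng.normal(0, 4.0, (5, 2))])
y_pool = (X_pool[:, 0] > 0).astype(int)  # oracle labels

labeled = [0, 40]  # one labeled sample per cluster
unlabeled = [i for i in range(len(X_pool)) if i not in labeled]

clf = SVC(kernel="linear").fit(X_pool[labeled], y_pool[labeled])
# Uncertainty: inverse distance to the separating hyperplane.
uncertainty = 1.0 / (1e-9 + np.abs(clf.decision_function(X_pool[unlabeled])))
# Density: average RBF similarity of each candidate to the other candidates.
density = rbf_kernel(X_pool[unlabeled], X_pool[unlabeled]).mean(axis=1)
# Query the sample with the best combined score.
query = unlabeled[int(np.argmax(uncertainty * density))]
print(query)
```

With pure uncertainty sampling (density weight removed), an outlier near the boundary could be queried even though its label says little about the bulk of the data.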
In this case, the MaxMin Margin strategy would query sample xc. Each of the two orange circles corresponds to an SVM trained with either a positive or a negative labeling of xc.
The version space area is shown in white; the solid lines depict the hyperplanes induced by the support vectors, and the center of the orange circle is the weight vector w of the current SVM. The dotted lines show the hyperplanes that are induced by unlabeled samples. This visualization is inspired by Tong. (a) Simple Margin will query the sample that induces a hyperplane lying closest to the SVM solution. In this case, it would query sample xa. (b) Here the SVM does not provide a good approximation of the version space area. Simple Margin would query sample xa, while xc might have been a more suitable choice.
Geometric representation of the version space in 3D following Tong. (a) The sphere depicts the hypothesis space. The two hyperplanes are induced by two labeled samples. The version space is the part of the sphere surface (in white) that is on one side of each hyperplane. The respective side is defined by its label. (b) The center (in black) of the orange sphere depicts the SVM solution within the version space. It has the maximum distance to the hyperplanes delimiting the version space. The normals of these hyperplanes, which are touched by the orange sphere, correspond to the support vectors.
