What is semi-supervised learning?

In semi-supervised learning, an algorithm is trained with both labelled and unlabelled data. Because labelling is usually the expensive part of preparing a data set, this makes learning more time- and cost-efficient. In the field of artificial intelligence, a learning process is needed that allows a system to learn relationships in data. Semi-supervised learning sits between the two classic approaches: like supervised learning, it uses labels to classify data, and like unsupervised learning, it also exploits the structure of data that has no labels.

There are many scenarios in which labelled data is not readily available. Semi-supervised learning can achieve good results with only a small fraction of labelled data, for example a few hundred labelled training examples. It is suited to data sets that would otherwise force a choice between supervised and unsupervised learning, so no compromise between the two is needed.

When is semi-supervised learning used?

Semi-supervised learning involves feature estimation from a suitable combination of labelled and unlabelled data. With this approach, less labelled data is needed, which is often relatively expensive to create. Unlabelled data is much cheaper and can also be used for learning. The challenge is to compile the training data so that the ratio of labelled to unlabelled data is meaningful for the algorithm.

The aim is to assign a correct label to the unlabelled data. This can be achieved with so-called label propagation. The method has similarities to cluster analysis: the data is divided into clusters, and within each cluster the unlabelled data points are assigned the same label as the labelled ones.
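The cluster-based idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation of label propagation: the function name cluster_then_label, the radius parameter and the toy data are hypothetical, clusters are formed by simply linking points that lie within radius of each other, and each unlabelled point takes the majority label of its cluster.

```python
from collections import Counter
from math import dist


def cluster_then_label(points, labels, radius):
    """Assign labels to unlabelled points (None) via simple clustering.

    Assumption: two points belong to the same cluster if they are
    connected by a chain of points that are each within `radius`.
    """
    n = len(points)
    cluster_of = [None] * n
    n_clusters = 0

    # Step 1: build clusters by flood-filling over nearby points.
    for start in range(n):
        if cluster_of[start] is not None:
            continue
        cluster_of[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if cluster_of[j] is None and dist(points[i], points[j]) <= radius:
                    cluster_of[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    # Step 2: inside each cluster, give unlabelled points the majority label.
    completed = list(labels)
    for c in range(n_clusters):
        members = [i for i in range(n) if cluster_of[i] == c]
        known = [labels[i] for i in members if labels[i] is not None]
        if not known:
            continue  # a cluster without any labelled point stays unlabelled
        majority = Counter(known).most_common(1)[0][0]
        for i in members:
            if completed[i] is None:
                completed[i] = majority
    return completed


# Toy data: two well-separated groups, one labelled point in each.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = ["a", None, None, "b", None, None]
completed = cluster_then_label(points, labels, radius=1.0)
print(completed)
```

With only two labelled points out of six, every point ends up with a label. On real data the choice of radius, or of a proper clustering method, would strongly affect the result.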

What is label spreading?

Label spreading is a semi-supervised learning algorithm. It was introduced by Dengyong Zhou et al. in their 2003 article "Learning with Local and Global Consistency". The underlying intuition is that nearby points in the input space should have the same label, and that points on the same structure or manifold in the input space should also have the same label.

Label spreading is inspired by a technique from experimental psychology called spreading activation networks. The points in the data set are connected in a graph based on their relative distances in the input space. The weight matrix of this graph is symmetrically normalised, similar to spectral clustering. Information is then passed through the graph, so that the labelling adapts to the structure of the input space. Finally, the label of each unlabelled point is set to the class from which it received the most information during the iteration process. Because only a few points need to be labelled by hand, label spreading helps to save costs.
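The steps described above (graph construction, symmetric normalisation, iterative spreading, final class assignment) can be sketched in plain Python as follows. This is a minimal illustration, assuming a Gaussian weight function and the update rule F <- alpha * S * F + (1 - alpha) * Y; the function name label_spreading, the parameter values for sigma, alpha and n_iter, the use of -1 to mark unlabelled points, and the toy data are all assumptions made for this example.

```python
from math import exp, sqrt


def label_spreading(points, labels, n_classes, sigma=1.0, alpha=0.9, n_iter=50):
    """Sketch of label spreading; unlabelled points are marked with -1."""
    n = len(points)

    # Weight matrix: Gaussian of the squared distance, zero on the diagonal.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = exp(-sq / (2 * sigma ** 2))

    # Symmetric normalisation: S = D^(-1/2) W D^(-1/2), D = diagonal of row sums.
    degree = [sum(row) for row in W]
    S = [[W[i][j] / sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]

    # Initial label matrix Y: one-hot rows for labelled points, zeros otherwise.
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(n)]

    # Spread information through the graph while retaining the initial labels.
    F = [row[:] for row in Y]
    for _ in range(n_iter):
        F = [
            [
                alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
                for c in range(n_classes)
            ]
            for i in range(n)
        ]

    # Each point takes the class from which it received the most information.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy data: two groups of points, one labelled point per group.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
labels = [0, -1, -1, 1, -1, -1]
predicted = label_spreading(points, labels, n_classes=2)
print(predicted)
```

The parameter alpha controls how much a point listens to its neighbours compared with its own initial label, and sigma controls how quickly the connection weight decays with distance. For practical work, a library implementation such as LabelSpreading in scikit-learn's semi_supervised module is likely the better choice than this sketch.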