
Unsupervised Learning: Simply explained

Published: 12.05.2023
Category: Basics


Unsupervised machine learning is a powerful tool for extracting valuable insights from data. Unlike supervised machine learning, it does not require labeled data, but instead aims to automatically discover patterns, structures or groupings in the data. Through techniques such as clustering, dimensionality reduction or association analysis, companies can uncover hidden information, gain new insights and make better decisions.

What is unsupervised machine learning?

Unsupervised machine learning is a type of machine learning in which an algorithm discovers patterns and structures in data without being provided with a target variable or human supervision. In contrast to supervised learning, in which the algorithm is trained to make a prediction or classification based on labeled data, no labeled data is needed in unsupervised learning. Instead, the algorithm searches for structures in the data by identifying the similarities between different features or instances and grouping them into clusters.

Typical applications of unsupervised learning include segmenting customers in marketing research, detecting anomalies in cybersecurity, and pattern recognition in image and text processing.

How does unsupervised learning work?

In unsupervised machine learning, the algorithm must independently find correlations and patterns in the data and use them to structure the data, group it, or gain new insights. There are generally three types of unsupervised learning methods for this: clustering, association, and dimensionality reduction.

What is clustering?

In clustering, the algorithm groups data points into clusters based on similarities in their features. The goal of clustering is to find patterns in the data and group them to reveal the intrinsic structure of the data set. As an example, the purchasing behavior of customers in a supermarket can be analyzed to identify similar groups of buyers who purchase similar products.

Examples of semantic mapping after a clustering procedure (top) and different forms of clusters (bottom).

Clustering can be used in various applications, such as market segmentation, identifying patterns in biological research, or detecting anomalies in cybersecurity.

Clustering algorithms

k-means algorithm

The k-means algorithm is one of the most commonly used clustering algorithms. It divides the data into k clusters, where the number k is chosen in advance. The algorithm starts with k randomly selected centers and assigns each data point to the nearest center. Then each center is recalculated as the mean of the points assigned to it, and the data points are reassigned. This process is repeated until the assignments no longer change.
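The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name kmeans, the optional init argument and the fixed seed are assumptions made for this example, and squared Euclidean distance is assumed as the similarity measure.

```python
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Random initial centers drawn from the data, unless the caller supplies some.
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))
            for point in points
        ]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, labels

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Because the starting centers are random, different seeds can lead to different results on less clearly separated data. In practice the algorithm is often run several times and the best result is kept.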

Hierarchical Clustering

Hierarchical clustering gradually groups the data points to create a hierarchy of clusters. There are two types of hierarchical clustering: agglomerative and divisive. In agglomerative clustering, each data point starts out as its own cluster. These clusters are gradually combined to form larger clusters. In divisive clustering, on the other hand, the algorithm starts with a large cluster and divides it into smaller clusters.
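The agglomerative variant can be sketched as follows. The example assumes single linkage (the distance between two clusters is the distance between their two closest points) and Euclidean distance. Both are choices made for this illustration, and the function name agglomerative is hypothetical.

```python
def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering; returns clusters as lists of point indices."""
    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    clusters = [[i] for i in range(len(points))]  # every point starts as its own cluster
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge cluster b into cluster a
        del clusters[b]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
print(agglomerative(points, n_clusters=3))
```

Recording the order of the merges instead of stopping at a fixed number of clusters would give the full hierarchy.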

DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method in which the algorithm attempts to find clusters of densely populated regions. Data points in densely populated areas are considered to be part of the same cluster, while data points in sparsely populated areas are considered outliers or noise.
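A compact sketch of the idea is shown below. The parameter names eps (neighborhood radius) and min_pts (minimum number of neighbors, counting the point itself, for a region to count as dense) follow common usage but are not taken from this article, and the brute-force neighbor search is only suitable for small data sets.

```python
def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbors(i):
        return [j for j in range(len(points))
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= eps ** 2]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # sparse region: mark as noise for now
            labels[i] = NOISE
            continue
        cluster += 1                  # i is a core point and starts a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:    # former noise point on the border of this cluster
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # j is also a core point, so keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

Unlike k-means, the number of clusters is not specified in advance. It follows from the density parameters.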

Mean Shift

Mean Shift aims to identify potential cluster centers in a data set by estimating the density of the data. For each starting point, a window of fixed size (the bandwidth) is placed on the data and iteratively shifted to the mean of the points inside it, which moves it towards regions of higher density until a stable point of convergence is reached. Data points whose windows converge to the same location are then assigned to the same cluster. Mean Shift does not require the number of clusters to be specified in advance and is particularly useful for identifying clusters in data sets with varying shapes and sizes and in which the density of data points is not homogeneously distributed.
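The following sketch illustrates the procedure with a flat kernel, meaning that every point inside the window counts equally. The function name mean_shift, the tolerance and the rule for merging nearby convergence points (closer than half the bandwidth) are assumptions made for this example.

```python
def mean_shift(points, bandwidth, tol=1e-6, max_iter=200):
    """Flat-kernel mean shift; returns (modes, labels)."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    def shift(x):
        for _ in range(max_iter):
            # All data points inside the window around the current position.
            window = [p for p in points if sq_dist(p, x) <= bandwidth ** 2]
            new_x = tuple(sum(dim) / len(window) for dim in zip(*window))
            if sq_dist(new_x, x) <= tol ** 2:  # the window has stopped moving
                return new_x
            x = new_x
        return x

    modes, labels = [], []
    for p in points:
        m = shift(p)
        # Reuse an existing mode if this point converged to (almost) the same place.
        for idx, existing in enumerate(modes):
            if sq_dist(m, existing) <= (bandwidth / 2) ** 2:
                labels.append(idx)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels

points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.1,), (4.9,)]
print(mean_shift(points, bandwidth=1.0))
```

The bandwidth is the main tuning parameter: a larger window merges more points into fewer clusters.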

What is association analysis?

Association in unsupervised learning refers to discovering common patterns and relationships between different attributes in a dataset. The focus here is on which attributes or characteristics of the dataset often occur together and which do not. The goal is to identify rules or associations that indicate which combinations of characteristics or attributes occur most frequently.

A common example of association analysis is the analysis of purchasing behavior in a supermarket. The goal is to find out which products are often purchased together, for example, to make recommendations for future customer purchases. By identifying patterns and associations, the company can also optimize the placement of products in the store or better target advertising for specific products.

Association algorithms

Apriori Algorithm

The Apriori algorithm is used primarily in market research and e-commerce. It identifies frequently occurring combinations of attributes and can be modified in various ways to meet different association analysis requirements.

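The algorithm works level by level. In the first step, all the individual items in the data set are analyzed and the frequency of each item is determined. In the following steps, frequent item sets are combined into larger candidate item sets, and the frequency of these combinations is determined in turn. The algorithm uses a support threshold to decide which item sets count as frequent. Because every subset of a frequent item set must itself be frequent, candidates that contain an infrequent subset can be discarded without counting them.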
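A minimal sketch of this procedure is shown below. It assumes that support is measured as an absolute number of transactions and that an item set is frequent if its count is at least min_support. The function name and the shopping baskets are made up for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): count} for all item sets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # First step: frequent individual items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    # Following steps: grow the item sets one item at a time.
    while frequent:
        prev = set(frequent)
        # Join frequent (k-1)-item sets into k-item candidates ...
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # ... and drop every candidate that has an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"},
           {"milk", "butter"}, {"bread", "milk", "butter"}]
for itemset, n in apriori(baskets, min_support=3).items():
    print(sorted(itemset), n)
```

Association rules such as "customers who buy bread also buy milk" are derived from the frequent item sets in a further step that is not shown here.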

Eclat

Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is similar to the Apriori algorithm and uses a support threshold to identify frequent item sets. However, unlike the Apriori algorithm, Eclat does not scan the whole dataset again for every level of candidates. Instead, it stores for each item the set of transactions that contain it (a vertical data layout) and determines the frequency of an item set by intersecting these transaction sets, typically in a depth-first search. Eclat is often used in market research and retail to identify relationships between different products or services.
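The core idea, computing support by intersecting sets of transaction IDs, can be sketched as follows. As before, the function name and the example baskets are assumptions, and support is counted as an absolute number of transactions.

```python
def eclat(transactions, min_support):
    """Return {frozenset(itemset): count} using intersections of transaction-ID sets."""
    # Vertical layout: item -> set of IDs of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that may extend the current prefix
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_support:
                continue
            itemset = prefix | {item}
            result[frozenset(itemset)] = len(tids)
            # The support of a longer item set is the size of the intersected ID sets.
            rest = [(other, tids & other_tids) for other, other_tids in candidates[i + 1:]]
            extend(itemset, rest)  # depth-first search

    extend(frozenset(), sorted(tidsets.items()))
    return result

baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"},
           {"milk", "butter"}, {"bread", "milk", "butter"}]
for itemset, n in eclat(baskets, min_support=3).items():
    print(sorted(itemset), n)
```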

FP-Growth Algorithm

The FP-Growth algorithm is another frequently used algorithm for association analysis. Unlike the Apriori algorithm, it does not generate candidate item sets. Instead, the algorithm builds a compact prefix tree (FP-tree) of the transactions in the dataset, and all frequent item sets are then mined from this tree. The FP-Growth algorithm is usually fast and efficient because it scans the dataset only twice, once to count the individual items and once to build the FP-tree, and then works on the tree alone.
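The sketch below covers only the first half of the algorithm, building the FP-tree in two passes over the data. The mining step that extracts the frequent item sets from the tree is left out to keep the example short. The class and function names are hypothetical.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent=None):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    """Build an FP-tree in two passes; returns (root, item_counts)."""
    # Pass 1: count the individual items and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    counts = {i: n for i, n in counts.items() if n >= min_support}
    root = FPNode(None)
    # Pass 2: insert each transaction with its items ordered by descending frequency,
    # so that transactions with the same frequent items share a path in the tree.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in counts), key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root, counts

baskets = [["a", "b"], ["a", "b", "c"], ["a", "c"], ["a", "d"]]
root, counts = build_fp_tree(baskets, min_support=2)
print(counts, {item: child.count for item, child in root.children.items()})
```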

What is dimensionality reduction?

Dimensionality reduction involves reducing the number of features in the data while retaining the most important information. This is an important step in data analysis, especially when processing large datasets with a high number of features, known as “high dimensionality”.

The goal of dimensionality reduction is to transform the dataset into a lower dimension while retaining important information. This can help to simplify the dataset, reduce computational time, reduce storage requirements, and improve the performance of machine learning algorithms.

As an example, dimensionality reduction in a supermarket customer data set helps to reduce the multitude of features, such as age, gender, income and spending. This allows the most important features to be identified and the variation in the data to be best explained. With the extracted principal components, the customers can be visualized in a two-dimensional space. Similar customers are close to each other, which can indicate patterns and groupings. This makes it possible to identify customer segments and develop targeted marketing strategies or offers.

Dimensionality reduction algorithms

There are various methods for dimensionality reduction, which can be divided into two categories: feature selection and feature extraction. Feature selection consists of choosing the most important features or attributes from the dataset and discarding the rest. Feature extraction, on the other hand, aims to generate new features by performing a linear or non-linear transformation of the original feature space.

Principal Component Analysis (PCA)

Principal Component Analysis is a feature extraction technique that is widely used for analyzing and visualizing data. The goal of PCA is to reduce the dimensionality of a data set by creating a new set of variables (called principal components) that explain as much of the variation in the data as possible. It does this by applying a linear transformation to the data to find a new coordinate system in which the greatest variance lies along the first principal component, the second greatest variance along the second principal component, and so on. Keeping only the first few components can help remove redundant information in the data, reduce noise, and simplify the data structure.
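To make this concrete, the following sketch computes the first principal component of a small data set with power iteration on the covariance matrix. Power iteration is a choice made for this illustration because it needs no external libraries, and real implementations usually use a full eigenvalue or singular value decomposition instead. The sketch also assumes that the starting vector of ones is not orthogonal to the direction being sought.

```python
def first_principal_component(rows, n_iter=200):
    """Return (unit direction, variance along it) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # which is the direction of greatest variance.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance

rows = [(1, 2), (2, 4), (3, 6), (4, 8)]
direction, variance = first_principal_component(rows)
print(direction, variance)
```

Projecting each centered data point onto this direction gives its coordinate along the first principal component.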

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis is a feature extraction technique that is mainly used for classification. Unlike the other methods described here, it is a supervised method, because it needs to know the class of every data point. The goal of LDA is to find linear combinations of features that separate the classes as well as possible while minimizing the variation within the classes. In contrast to PCA, which looks for the directions of greatest variance regardless of class, LDA projects the data into a low-dimensional space in which the classes are easily distinguishable. The new features created in this way can be used to improve classification performance.

t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a nonlinear feature extraction method that is used mainly for visualizing high-dimensional data in a low-dimensional space, usually with two or three dimensions. It does not select any of the original features. t-SNE is particularly good at capturing complex local relationships and patterns in the data. It uses probability distributions to describe the similarity between data points in the original dimensions and in the low-dimensional space. The low-dimensional points are arranged so that the two distributions match as closely as possible, which means that points that are close to each other in the original dimensions are also close to each other in the low-dimensional representation. t-SNE can help to identify clusters or groupings in the data and to visualize complex data structures.
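For readers who want to see the probability distributions mentioned above, the usual formulation of t-SNE can be written as follows. The symbols are not taken from this article: x_i are the original data points, y_i their low-dimensional counterparts, n is the number of points, and sigma_i is a per-point width that is set indirectly through the perplexity parameter.

```latex
% Similarity of x_j to x_i in the original space (Gaussian neighborhood)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarity of y_i and y_j in the low-dimensional space (Student t with one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost that is minimized with respect to the y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The positions y_i are found by gradient descent on C, so different runs can produce different pictures of the same data.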


More examples of algorithms

Unsupervised learning algorithms can also be used for other tasks such as anomaly detection or data generation. In this case, the algorithm learns what is normal and can thus recognize unusual or abnormal events or generate new data that is similar to the existing data.

In general, the result of unsupervised learning is less precise than that of supervised learning because the algorithm examines the data without prior knowledge of its structure. Nevertheless, it can be useful for gaining new insights into data sets or for recognizing complex relationships in large amounts of data.

Anomaly detection

Anomaly detection algorithms attempt to identify unusual or abnormal data points in a dataset. They are particularly useful for fraud detection, which aims to identify and prevent fraudulent activities or transactions in a system. Some of the most common algorithms include:

Isolation Forest
Local Outlier Factor (LOF)
One-Class SVM

Generative models

Generative models are used to generate new data that is similar to the underlying dataset. Examples include:

Generative Adversarial Networks (GANs)
Variational Autoencoders (VAEs)

Neural-network-based unsupervised learning

Neural-network-based unsupervised learning uses neural networks to automatically recognize and extract features in a dataset. Frequently used methods are:

Self-Organizing Maps (SOMs)
Restricted Boltzmann Machines (RBMs)

Natural Language Processing

Natural language processing algorithms are used specifically for processing text data and can, for example, identify topics in texts or calculate semantic similarities between words. Common algorithms are:

Latent Dirichlet Allocation (also abbreviated LDA, not to be confused with Linear Discriminant Analysis)
Word2Vec


Unsupervised vs. supervised learning

The main difference between supervised and unsupervised learning lies in the training data. In supervised learning, a set of training data with labeled answers (e.g. classifications) is available, while in unsupervised learning, no labeled answers are provided and the system itself must recognize patterns and relationships in the data.

The objectives of the two approaches also differ. In supervised learning, the aim is to train a model that is able to correctly classify or predict new, unlabeled data. In contrast, the goal of unsupervised learning is to recognize and understand hidden structures or patterns in the data.

Another important distinction is that supervised learning tends to be used for narrow and specialized tasks such as image recognition, language processing, and predicting customer preferences, while unsupervised learning is more commonly used for tasks such as clustering, anomaly detection, and dimensionality reduction.

Despite these differences, there are also similarities between supervised and unsupervised learning. Both techniques use machine learning algorithms to recognize patterns and correlations in data and make predictions. Both techniques can be used in many different application areas to generate insights and benefits. Finally, both techniques can be used together in hybrid approaches, such as semi-supervised learning, to achieve even better results.


What are the advantages of unsupervised learning?

No labeled data required: In contrast to supervised learning, no labeled training data is required. This can be very useful when it is difficult or expensive to obtain labeled data.
Discovery of hidden patterns: Unsupervised learning can discover hidden patterns and structures in data that are not obvious at first glance. This can help to uncover new insights and discoveries that would not have been discovered otherwise.
Detecting anomalies: Unsupervised learning can detect anomalies or outliers in data that may indicate problems or deviations. This can be useful in many applications such as fraud detection, security or health monitoring.
Flexibility: Unsupervised learning is flexible and can be applied in many different ways. This makes it a versatile tool for data analysis and machine learning.
Scalability: Unsupervised learning can be applied to large data sets and is usually scalable, which means that it can also be used for complex problems and large amounts of data.

Overall, unsupervised learning offers an effective way to analyze data and detect patterns that can help solve complex problems.

What are the disadvantages of unsupervised learning?

Although unsupervised learning offers many advantages, there are also some disadvantages that should be considered:

Difficulty in evaluation: Since there are no target variables in unsupervised learning, it is more difficult to assess how well the algorithm works. There is no clear way to evaluate the model's predictions, making the results more difficult to interpret and validate.
Misinterpretation of results: It is possible that the model detects or interprets false patterns that have no meaning. If these patterns are then used as a basis for decisions, they may be inaccurate or misleading.
Overfitting: Unsupervised learning models can tend to overfit, especially when the number of features in the data is high. The model can then detect patterns that are only present in the training data and are not generalizable.
Requires expert knowledge: Unsupervised learning models usually require a certain amount of expert knowledge to be configured and interpreted correctly.
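Although there are no labels to compare against, clustering results can still be checked with internal measures. One common example is the silhouette coefficient, which compares for each point the average distance to its own cluster (a) with the average distance to the nearest other cluster (b) and computes (b - a) / max(a, b). The sketch below is a minimal illustration that assumes Euclidean distance, at least two clusters and no exact duplicates of all points. The function name is chosen for this example.

```python
def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # convention: a point alone in its cluster scores 0
            scores.append(0.0)
            continue
        # a: average distance to the other points of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: average distance to the points of the nearest other cluster
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0,), (1,), (10,), (11,)]
print(silhouette_score(points, [0, 0, 1, 1]))  # sensible grouping
print(silhouette_score(points, [0, 1, 0, 1]))  # poor grouping
```

Values close to 1 indicate compact, well-separated clusters, while values around 0 or below suggest that points sit between clusters or in the wrong one. Such measures only describe the geometry of the result and do not replace a check by someone who knows the data.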

It is important to keep these drawbacks in mind when applying unsupervised learning techniques and to take appropriate measures to validate and interpret the results.

Conclusion: Why use unsupervised machine learning?

Unsupervised machine learning is an important tool that enables companies to gain valuable insights from their data. Through techniques such as clustering, dimensionality reduction and association analysis, unsupervised learning can reveal hidden patterns, structures and relationships in the data. This makes it possible to understand customer behavior, detect fraud, identify product segments and much more. Unsupervised learning has its advantages in terms of flexibility in data analysis and the possibility of gaining new insights. However, it is important to understand the limitations and challenges of unsupervised learning and to select the right algorithms and techniques for specific use cases. By combining unsupervised learning with other machine learning methods, companies can expand their knowledge and make informed decisions to increase their business success.
