Unsupervised Learning: Clearly Explained

12 May 2023 | Basics

Unsupervised Machine Learning is a powerful tool for gaining valuable insights from data. Unlike supervised machine learning, it does not require labelled data, but aims to automatically discover patterns, structures or groupings in the data. Using techniques such as clustering, dimensionality reduction or association analysis, companies can uncover hidden information, gain new insights and make better decisions.

Unsupervised learning can help understand customer behaviour, detect fraud, identify product segments and much more. An understanding of Unsupervised Machine Learning is therefore important for companies to realise the full potential of their data and gain competitive advantage.

What is Unsupervised Machine Learning?

Unsupervised machine learning is a type of machine learning in which an algorithm detects patterns and structures in data without being provided with a target variable or human supervision. In contrast to supervised learning, where the algorithm is trained to make a prediction or a classification based on labelled data, unsupervised learning needs no labelled data. Instead, the algorithm searches for structures in the data by identifying similarities between different characteristics or instances and grouping or clustering them together.

Typical applications of Unsupervised Learning are the segmentation of customers in marketing research, the detection of anomalies in cybersecurity or pattern recognition in image and text processing.

How does Unsupervised Learning work?

In Unsupervised Machine Learning, the algorithm must independently find correlations and patterns in the data and use these to structure or group the data or to gain new insights. There are generally three types of unsupervised learning methods: clustering, association and dimensionality reduction.

What is clustering?

When clustering, the algorithm groups data points based on similarities in their characteristics into clusters. The aim of clustering is to find patterns in the data and group them to identify the intrinsic structure of the data set. As an example, the buying behaviour of customers in a supermarket can be analysed to identify similar groups of shoppers who buy similar products.

Figure (Semantic mapping in machine learning): Examples of a semantic assignment according to a clustering procedure (top) and different forms of clusters (bottom).

Clustering can be used in various applications, such as market segmentation, identifying patterns in biological research or detecting anomalies in cyber security.

Clustering algorithms

k-Means algorithm

The k-means algorithm is one of the most commonly used clustering algorithms. It divides the data into k groups or clusters, where the number k is specified in advance. The algorithm starts with k randomly selected centres and assigns each data point to its nearest centre. Each centre is then recalculated as the mean of the points assigned to it, and the data points are reassigned. This process is repeated until the assignments no longer change.
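The loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, its parameters and the toy data are assumptions made for this example, and the starting centres are simply drawn at random from the data points.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)            # k randomly chosen starting centres
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centre for every point.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_assignment == assignment:        # assignments are stable: stop
            break
        assignment = new_assignment
        # Update step: each centre moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                         # keep the old centre if a cluster is empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, assignment

# Hypothetical toy data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 1.5), (2.0, 2.0),
          (10.0, 10.0), (10.5, 10.5), (11.0, 11.0)]
centres, labels = kmeans(points, k=2)
print(centres, labels)
```

On real data the result depends on the starting centres, which is why k-means is usually run several times with different random seeds.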

Hierarchical Clustering 

Hierarchical Clustering is the process of gradually grouping data points together to create a hierarchy of clusters. There are two types of Hierarchical Clustering: agglomerative and divisive. In agglomerative clustering, each data point starts as a separate cluster. These clusters are gradually combined to form larger clusters. In divisive clustering, on the other hand, the algorithm starts with a large cluster and divides it into smaller clusters.
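As a rough sketch of the agglomerative variant, the following example repeatedly merges the two closest clusters until a requested number of clusters remains. The linkage criterion (single linkage, i.e. the distance between the closest pair of points) is an assumption chosen for brevity, as are the function name `agglomerative` and the sample points.

```python
def agglomerative(points, n_clusters):
    """Bottom-up clustering with single linkage (illustrative sketch)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    clusters = [[i] for i in range(len(points))]    # every point starts as its own cluster
    merges = []                                      # records the hierarchy step by step
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]      # merge the closest pair
        del clusters[j]
    return clusters, merges

# Hypothetical toy data; clusters are reported as lists of point indices.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, n_clusters=3)
print(clusters)
```

The `merges` list holds the order in which clusters were combined, which is the information a dendrogram would display.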


DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method in which the algorithm attempts to find clusters of densely populated regions. Data points in densely populated areas are considered part of the same cluster, while data points in sparsely populated areas are considered outliers or noise.
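A compact sketch of this idea follows. It assumes the two usual DBSCAN parameters, here called `eps` (neighbourhood radius) and `min_pts` (minimum number of neighbours, counting the point itself, for a region to count as dense). The function name and the data are made up for illustration.

```python
def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise (sketch)."""
    def neighbours(i):
        # All points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)           # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:            # sparse region: provisionally noise
            labels[i] = -1
            continue
        cluster += 1                        # dense region: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:             # former noise becomes a border point
                labels[j] = cluster
            if labels[j] is not None:       # already assigned: do not expand again
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:        # j is dense as well: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
          (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
          (20, 20)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)
```

Unlike k-means, the number of clusters is not specified up front; it follows from the density parameters.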

Mean Shift 

Mean Shift aims to identify potential cluster centres in a dataset by estimating the density of the data. Starting from each data point, a window whose size is set by the bandwidth is shifted iteratively to the mean of the points it contains, and therefore towards the region of highest density, until a stable convergence point is reached. Data points whose windows converge to the same point are then assigned to the same cluster. Mean Shift is particularly useful for identifying clusters in data sets with different shapes and sizes and where the density of data points is not homogeneously distributed.
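The following sketch shows one simple way to implement this. It assumes a flat kernel, meaning every point inside the window counts equally. The function name `mean_shift`, the rule for merging nearby convergence points (closer than half a bandwidth) and the sample data are assumptions for this example.

```python
def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Mean shift with a flat kernel (illustrative sketch)."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    modes = []
    for p in points:
        centre = p
        for _ in range(max_iter):
            # All points inside the window around the current centre.
            window = [q for q in points if dist(centre, q) <= bandwidth]
            new_centre = tuple(sum(col) / len(window) for col in zip(*window))
            shift = dist(centre, new_centre)
            centre = new_centre             # move the window to the local mean
            if shift < tol:                 # converged: the window no longer moves
                break
        modes.append(centre)

    # Points whose windows converged to (almost) the same place share a cluster.
    centres, labels = [], []
    for m in modes:
        for idx, c in enumerate(centres):
            if dist(m, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return centres, labels

# Hypothetical toy data: two groups of three points.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centres, labels = mean_shift(points, bandwidth=2.0)
print(centres, labels)
```

The bandwidth is the main tuning parameter: a larger value merges more points into one cluster, a smaller value produces more, smaller clusters.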

What is association analysis?

Association in Unsupervised Learning refers to discovering common patterns and relationships between different attributes in a dataset. The focus is on which attributes or features of the dataset often occur together and which do not. The aim is to identify rules or associations that indicate which combinations of features or attributes occur most often.

A common example of association analysis is the analysis of buying behaviour in a supermarket. The objective is to find out which products are often bought together, for example to make recommendations for future customer purchases. By identifying patterns and associations, the company can also optimise the placement of products in the shop or target advertising for specific products.

Association algorithms

Apriori algorithm

The Apriori algorithm is mainly used in market research and e-commerce. It identifies frequently occurring combinations of attributes and can be modified in various ways to meet different association analysis requirements.

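The algorithm works level by level. In the first step, all individual items in the data set are analysed and the frequency of each item is determined. In the following steps, combinations of items (itemsets) are built from the frequent itemsets of the previous step and their frequency is determined. The algorithm uses a support threshold and keeps only the itemsets whose frequency reaches it. Because every subset of a frequent itemset must itself be frequent, combinations that contain an infrequent subset can be skipped without counting them.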
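A minimal sketch of these steps in plain Python is shown below. The function name `apriori`, the convention of expressing support as a fraction of all transactions and the shopping baskets are assumptions for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets (illustrative sketch)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    # Following steps: grow the itemsets by one item per level.
    k = 2
    while current:
        # Candidates: unions of two frequent (k-1)-itemsets with exactly k items ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... all of whose (k-1)-subsets are frequent (the pruning rule).
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical shopping baskets.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(transactions, min_support=0.5)
for itemset, s in frequent.items():
    print(sorted(itemset), s)
```

Here the pair milk and butter appears in only one of four baskets, so it falls below the threshold and no three-item combination is counted.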


Eclat algorithm

Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is similar to the Apriori algorithm and also uses a support threshold to identify frequent itemsets. However, unlike the Apriori algorithm, Eclat does not scan the whole data set again to count each new level of combinations. Instead, it stores for every item the set of transactions in which it occurs and determines the frequency of an itemset by intersecting these sets, working through groups of itemsets that share a common prefix (equivalence classes). Eclat is often used in market research and retail to identify correlations between different products or services.
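The intersection idea behind Eclat can be sketched as follows. The function name `eclat`, the use of an absolute count (`min_count`) as the support threshold and the example baskets are assumptions for this illustration.

```python
def eclat(transactions, min_count):
    """Return {itemset: count} via transaction-id intersections (sketch)."""
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that can extend the current prefix
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            # Build the next equivalence class by intersecting with later items.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]
                      if len(tids & other_tids) >= min_count]
            extend(itemset, suffix)

    start = sorted(((item, tids) for item, tids in tidsets.items()
                    if len(tids) >= min_count),
                   key=lambda pair: pair[0])
    extend((), start)
    return frequent

# Hypothetical shopping baskets.
transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
frequent = eclat(transactions, min_count=2)
print(frequent)
```

Each itemset is reported as a tuple of items in alphabetical order together with the number of baskets that contain it.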

FP-Growth algorithm

The FP-Growth algorithm is another frequently used algorithm for association analysis. Unlike the Apriori algorithm, it does not use candidate generation. Instead, the algorithm builds a compact tree (FP-tree) from the transactions in the dataset, and this tree is then used to identify all frequent itemsets. The FP-Growth algorithm is fast and efficient because it scans the dataset only twice, once to count the individual items and once to build the FP-tree, and afterwards works on the tree alone.

What is dimensionality reduction?

With dimensionality reduction, the number of features in the data is reduced while the most important information is retained. This is an important step in data analysis, especially when processing large data sets with a high number of features, referred to as "high dimensionality".

The goal of Dimensionality Reduction is to transform the dataset into a lower dimension while retaining important information. This can help simplify the dataset, reduce computation time, reduce memory requirements and improve the performance of machine learning algorithms.

As an example, dimensionality reduction in a supermarket customer dataset helps to reduce the variety of characteristics, such as age, gender, income and expenditure. This helps to identify the most important characteristics and best explain the variation in the data. With the principal components extracted, customers can be visualised in a two-dimensional space. Similar customers are located close to each other, which can indicate patterns and groupings. This makes it possible to identify customer segments and develop targeted marketing strategies or offers. 

Dimensionality reduction algorithms

There are various methods for dimensionality reduction, which can be divided into two categories: feature selection and feature extraction. Feature selection consists of selecting the most important features or attributes from the dataset and discarding the remaining features. Feature extraction, on the other hand, aims to generate new features by performing a linear or non-linear transformation of the original feature space.

Principal Component Analysis (PCA)

Principal Component Analysis belongs to feature extraction and is often used to analyse and visualise data. The goal of PCA is to reduce the dimensionality of a data set by creating a new set of variables (called principal components) that explain the variation in the data as well as possible. This involves applying a linear transformation to the data to find a new coordinate system in which the greatest variance lies along the first principal component, the second greatest variance along the second principal component, and so on. PCA can help remove redundant information in the data, reduce noisy variables and simplify the data structure.
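One common way to compute principal components is via the eigenvectors of the covariance matrix. The sketch below follows that route with NumPy. The function name `pca` and the synthetic data, in which the second column is roughly twice the first, are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (illustrative sketch)."""
    X_centred = X - X.mean(axis=0)                 # centre every feature at zero
    cov = np.cov(X_centred, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort by variance, largest first
    components = eigvecs[:, order[:n_components]]  # new coordinate axes as columns
    explained = eigvals[order] / eigvals.sum()     # share of variance per component
    return X_centred @ components, components, explained[:n_components]

# Synthetic data: three features, two of which are strongly correlated.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t,
                     2 * t + 0.1 * rng.normal(size=200),
                     0.1 * rng.normal(size=200)])
Z, components, explained = pca(X, n_components=2)
print(Z.shape, explained)
```

Because two of the three columns carry almost the same information, the first principal component captures nearly all of the variance, which is the kind of redundancy PCA is meant to remove.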

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis also belongs to feature extraction and is used for classification tasks. Strictly speaking, it is a supervised method, since it requires class labels, but it is often discussed alongside PCA as a dimensionality reduction technique. The goal of LDA is to find a linear combination of features that separates the classes as far as possible while minimising the variation within the classes. In contrast to PCA, LDA aims to project the data into a low-dimensional space in which the classes are readily distinguishable. To do so, it takes into account the class membership of the data points and optimises the projection accordingly. LDA can therefore be used to create new features that improve classification performance.

t-Distributed Stochastic Neighbor Embedding (t-SNE)

t-SNE is a non-linear feature extraction method for visualising high-dimensional data in a low-dimensional space, typically two or three dimensions. It is particularly well suited for capturing complex relationships and patterns in the data. To do this, it uses probability distributions to model the similarity between data points both in the original dimensions and in the low-dimensional space, and arranges the low-dimensional points so that the two sets of similarities match as closely as possible. As a result, points that are close to each other in the original dimensions tend to be close to each other in the low-dimensional representation as well. t-SNE can help identify clusters or groupings in the data and visualise complex data structures.


Further examples of algorithms

Unsupervised learning algorithms can also be used for other tasks such as anomaly detection or data generation. In this case, the algorithm learns what is normal and can thus detect unusual or abnormal events or generate new data that is similar to the existing data.

In general, the result of unsupervised learning is less precise than that of supervised learning because the algorithm examines the data without prior knowledge of its structure. Nevertheless, it can be useful for gaining new insights into datasets or for identifying complex relationships in large datasets.

Anomaly detection

Anomaly detection algorithms attempt to identify unusual or abnormal data points in a data set. They are particularly suitable for fraud detection, where fraudulent activities or transactions in a system are to be detected and prevented. The most common algorithms include:

  • Isolation Forest
  • Local Outlier Factor (LOF)
  • One-Class SVM

Generative models

Generative models are used to generate new data that is similar to the underlying data set. Examples are:

Neural-Network-based Unsupervised Learning

Neural-Network-based Unsupervised Learning uses neural networks to automatically detect and extract features in a data set. Commonly used:

Natural Language Processing

Natural language processing algorithms are used specifically for processing text data and can, for example, identify topics in texts or calculate semantic similarities between words. Common algorithms are:

  • Latent Dirichlet Allocation (LDA)
  • Word2Vec

Unsupervised vs. supervised learning

The essential difference between supervised and unsupervised learning lies in the training data. In supervised learning, a set of training data with labelled answers (e.g. classifications) is available, whereas in unsupervised learning no labelled answers are given and the system itself has to recognise patterns and correlations in the data.

The objective of the two approaches is also different. Supervised learning is intended to produce a model that is able to correctly classify or predict new, unlabelled data. In contrast, the goal of unsupervised learning is to recognise and understand hidden structures or patterns in the data.

Another important difference is that supervised learning tends to be used for narrowly defined and specialised tasks such as image recognition, while unsupervised learning is more common for tasks such as clustering, anomaly detection and dimensionality reduction.

Despite these differences, there are also commonalities between supervised and unsupervised learning. Both techniques use machine learning algorithms to identify patterns and relationships in data. Both techniques can be used in many different application areas to generate insights and benefits. Finally, both techniques can be combined in hybrid approaches, such as Semi-Supervised Learning, to achieve even better results.


For more information on supervised machine learning, read our basic article for beginners and experts:

Supervised Learning: Clearly Explained

What are the advantages of Unsupervised Learning?

  • No labelled data required: Unlike supervised learning, no labelled training data is required. This can be very useful when it is difficult or expensive to obtain labelled data.
  • Detection of hidden patterns: Unsupervised Learning can detect hidden patterns and structures in data that are not obvious at first glance. This can help to gain new knowledge and insights that would otherwise not have been discovered.
  • Anomaly detection: Unsupervised Learning can detect anomalies or outliers in data that may indicate problems or deviations. This can be useful in many applications such as fraud detection, security or health monitoring.
  • Flexibility: Unsupervised Learning is flexible and can be applied in many different ways. This makes it a versatile tool for data analysis and machine learning.
  • Scalability: Many unsupervised learning methods scale well, so they can be applied to complex problems with large data sets.

Overall, Unsupervised Learning provides an effective way to analyse data and identify patterns that can help solve complex problems.

What are the disadvantages of unsupervised learning?

Although Unsupervised Learning offers many advantages, there are also some disadvantages that should be considered:

  • Difficulty in the evaluation: Since there are no target variables in Unsupervised Learning, it is more difficult to assess how well the algorithm is working. There is no ground truth against which to check the model's output, making the results harder to interpret and validate.
  • Misinterpretation of results: It is possible that the model detects patterns that have no real meaning, or that such patterns are interpreted incorrectly. If these patterns are then used as the basis for decisions, those decisions may be inaccurate or misleading.
  • Overfitting: Unsupervised learning models can tend towards overfitting, especially if the number of features in the data is high. The model can then detect patterns that are only present in the training data and cannot be generalised.
  • Requires expert knowledge: Unsupervised learning models usually require a certain amount of expert knowledge to be configured and interpreted correctly.

It is important to take these disadvantages into account when using unsupervised learning techniques and to take appropriate measures to validate and interpret the results.

Conclusion: Why is Unsupervised Machine Learning used? 

Unsupervised Machine Learning is an important tool that enables companies to gain valuable insights from their data. Using techniques such as clustering, dimensionality reduction and association analysis, Unsupervised Learning can uncover hidden patterns, structures and relationships in the data. This makes it possible to understand customer behaviour, detect fraud, identify product segments and much more. Unsupervised learning has its advantages in terms of flexibility in data analysis and the ability to discover new insights. However, it is important to understand the limitations and challenges of Unsupervised Learning and to select the right algorithms and techniques for specific use cases. By combining Unsupervised Learning with other machine learning methods, companies can expand their knowledge and make informed decisions to increase their business success.



Pat has been responsible for Web Analysis & Web Publishing at Alexander Thamm GmbH since the end of 2021 and oversees a large part of our online presence. In doing so, he works his way through every Google or WordPress update and is happy to give the team tips on how to make articles or websites even more comprehensible for readers as well as search engines.
