Data mining - methods and examples from practice

15 June 2020 | Basics

Data mining is one of the basic terms in the context of digitalisation and data science. It comes up particularly in Big Data projects and data analytics. The term generally refers to the systematic, mathematical-statistical analysis of data. The goal is always to find patterns, relationships and correlations in large amounts of data. This article gives an overview of the underlying theory and illustrates the topic with three practical examples. Data mining is not a universally applicable tool, however: it is a group of algorithms that promise very effective solutions in certain cases.

What is data mining?

The term data mining is used in the context of Big Data. It subsumes the exploratory methods by which insights are gained from large amounts of data, some fully automated and some only semi-automated. The goal is to identify dependencies, regularities and patterns in otherwise disjointed or unstructured raw data. In keeping with the English term "mining", a metaphor borrowed from the mining industry, the term "prospecting" is sometimes also used in this context. Data mining methods are statistical procedures that allow the data to be analysed according to certain criteria. These can be roughly divided into four categories:
• Segmentation or clustering
• Association
• Classification
• Prediction
Depending on the use case, these methods can or must be combined with each other. Data mining thus subsumes a whole range of methods that allow data to be handled sensibly and profitably. In industry, large amounts of data are generated especially in the context of monitoring or networked production.

Types of data mining

Data mining is the generic term for the systematic attempt to identify correlations, patterns and trends in data sets. Data mining uses a range of computer-assisted methods that work with statistical algorithms. Data mining is becoming increasingly important, especially due to the ever-growing amounts of data (big data).

Segmentation

Segmentation or clustering is a method in which objects with similar common characteristics are grouped together. The objects within the resulting group are therefore homogeneous.
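To make the idea of grouping similar objects more concrete, here is a minimal sketch of k-means, one common clustering algorithm. The article does not name a specific algorithm, so the choice of k-means, the function name `k_means` and the customer figures are all assumptions made for illustration.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Group points around the nearest centroid (plain k-means)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # nothing moved, so the grouping is stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (purchases per month, average basket value in EUR)
customers = [(1, 20), (2, 22), (1, 25), (9, 80), (10, 85), (8, 78)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
```

In this toy data set the occasional buyers end up in `clusters[0]` and the frequent buyers in `clusters[1]`. In practice the result depends on the starting centroids and on how the features are scaled.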

Association

Association stands for the discovery of dependencies. Association includes association analysis and sequence analysis. Association analyses support users in deriving certain rules from data sets without having to specify a target variable. One area of application is shopping basket analysis. With the help of association, the purchase of an item B can be derived from the purchase of an item A. Sequence analyses extend association analyses by certain rules or statistics.
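A rule such as "whoever buys A also buys B" is usually judged by two figures: support (how often A and B occur together across all baskets) and confidence (how often B occurs in the baskets that contain A). The following sketch computes both for a handful of made-up shopping baskets. The function name `rule_metrics` and the data are hypothetical.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Baskets that contain every item of the antecedent.
    n_antecedent = sum(1 for b in baskets if antecedent <= b)
    # Baskets that contain antecedent and consequent together.
    n_both = sum(1 for b in baskets if (antecedent | consequent) <= b)
    support = n_both / len(baskets)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "butter"},
]
support, confidence = rule_metrics(baskets, {"bread"}, {"butter"})
```

Here the rule "bread -> butter" holds in two of four baskets (support 0.5) and in two of the three baskets that contain bread (confidence of about 0.67).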

Classification

In classification, individual data objects are assigned to specific classes. The class must be defined in advance and objects are placed in this class based on characteristics that are also defined in advance. The basis is formed by data sets with various independent characteristics and a dependent target variable.
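As an illustration, the sketch below uses k-nearest neighbours, one simple classification method among many. The classes ("ok" and "faulty") and the labelled training data are defined in advance, as described above. The sensor readings and the function name `knn_classify` are invented for the example.

```python
from collections import Counter
from math import dist

def knn_classify(training, sample, k=3):
    """Assign `sample` the majority class among its k nearest training points."""
    nearest = sorted(training, key=lambda row: dist(row[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical sensor readings: (temperature in deg C, vibration in mm/s) -> class
training = [
    ((60, 1.0), "ok"), ((62, 1.2), "ok"), ((58, 0.9), "ok"),
    ((85, 4.0), "faulty"), ((90, 4.5), "faulty"), ((88, 3.8), "faulty"),
]
label = knn_classify(training, (87, 4.1))
```

The new reading lands among the "faulty" examples and is classified accordingly. With real data the features would normally be scaled first, so that the feature with the largest numeric range does not dominate the distance.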

Prediction

In data mining, prediction is a forecast of previously unknown features based on previously gained knowledge. The basis is a training data set. This can be used to train models that make predictions about the development of certain dependent variables.
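A very small example of this principle is a straight line fitted to training data by ordinary least squares and then used to predict an unseen value. Linear regression is only one of many possible models. The operating-hours data and the function name `fit_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data: machine operating hours -> measured wear in mm
hours = [100, 200, 300, 400]
wear = [1.0, 2.0, 3.0, 4.0]
slope, intercept = fit_line(hours, wear)

# Predict the dependent variable for a value that was not in the training set.
predicted = slope * 500 + intercept
```

The model learns a wear of roughly 0.01 mm per operating hour from the training set and extrapolates it to 500 hours.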

Data mining specialisations

The majority of all data mining approaches can be applied universally to different types of data. In addition, there are specialisations in data mining that are used for specific data.

Text mining

Text mining is a data mining method used specifically to tap into text datasets. Text data pose a special challenge: because of their high-dimensional and unstructured character, they first require special preparation before further processing. In this step, the text data must be structured and their dimensionality reduced. Complex statistical and computational-linguistic procedures can then be used to extract information and patterns from text documents. Natural-language sources are also within the scope of text mining. A typical application is the computer-assisted detection of textual plagiarism.
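The preparation step and the plagiarism use case can be sketched in a few lines: each text is reduced to a bag of words (a structured word-count vector), and two texts are then compared by cosine similarity. This is a deliberately simple stand-in for real plagiarism detection. The function names and example sentences are hypothetical.

```python
import re
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical documents
original = "Data mining finds patterns in large amounts of data."
suspect = "Data mining finds patterns in big amounts of data."
unrelated = "The weather was sunny all week."

sim_suspect = cosine_similarity(bag_of_words(original), bag_of_words(suspect))
sim_unrelated = cosine_similarity(bag_of_words(original), bag_of_words(unrelated))
```

The near-copy scores close to 1, the unrelated sentence scores 0. Production systems typically add steps such as stop-word removal, stemming and term weighting, which are omitted here.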

Web mining

Web mining is used to tap into various kinds of internet data. The object of analysis is not only the web pages themselves but also the relations between them (for example in the form of hyperlinks). Web mining identifies both clusters and outliers among the web data. Web data change constantly, which poses a particular challenge.
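One well-known way to analyse the relations between pages is PageRank, which rates a page by the links pointing to it. The article does not mention PageRank, so it is used here purely as an example of link analysis. The tiny site structure below is invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank for a dict mapping page -> list of linked pages."""
    pages = list(links)
    rank = {p: 1 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / len(pages)
        rank = new_rank
    return rank

# Hypothetical site: every link target must also appear as a key.
links = {
    "home": ["products", "blog"],
    "products": ["home"],
    "blog": ["home", "products"],
    "imprint": ["home"],
}
rank = pagerank(links)
top_page = max(rank, key=rank.get)
```

The page that receives the most link weight ("home") comes out on top, while "imprint", which no other page links to, gets only the base share.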

Time series analysis

Time series analysis is one of the data mining specialisations whose goal is a forecast. Future values of a time series are estimated in order to derive predictions about future trends, for example.
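A minimal forecasting sketch is simple exponential smoothing, where the forecast is a weighted average of past values that gives more weight to recent observations. The method, the smoothing factor `alpha` and the sales figures are assumptions chosen for illustration.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast: a weighted average that favours recent values."""
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the level carried over so far.
        level = alpha * value + (1 - alpha) * level
    return level

# Hypothetical monthly sales figures
monthly_sales = [100, 110, 120, 130]
forecast = exponential_smoothing_forecast(monthly_sales)
```

Note that this simple variant lags behind a steady upward trend like the one in the sample data. Methods with an explicit trend or seasonal component are better suited to such series.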

In the course of these data evaluations, new business fields and models can be created or developed. In the automotive sector, for example, fleet analyses can be carried out that make it possible to offer customers a completely new service model (aftersales). If conspicuous patterns in the data indicate a possible defect in a component, it can be replaced even before it causes damage (predictive maintenance). Further characteristic tasks of data mining are:
• Outlier detection: Identification of unusual records such as outliers, errors and changes
• Cluster analysis: Grouping of objects based on similarities
• Classification: Assignment of previously unassigned elements to existing classes
• Association analysis: Identification of correlations and dependencies in the data in the form of rules such as "From A and B usually follows C"
• Regression analysis: Identification of relationships between (several) dependent and independent variables
• Summary: Reduction of the data set into a more compact description without significant loss of information
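Outlier detection, the first task in the list above, can be sketched with the modified z-score, which is based on the median and is therefore not thrown off by the very outliers it is looking for. The constant 0.6745 and the threshold 3.5 are the conventional values for this method. The cycle times and the function name `mad_outliers` are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Flag values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a robust measure of spread.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # at least half the values equal the median; score undefined
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical cycle times of a production step in seconds
cycle_times = [30.1, 29.8, 30.4, 30.0, 29.9, 45.2, 30.2]
outliers = mad_outliers(cycle_times)
```

Only the reading of 45.2 seconds is flagged. Whether such a value is an error, a genuine change or an early sign of a defect still has to be decided with domain knowledge.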

This is what it looks like in practice

We have already used data mining in numerous customer projects. Three use cases are presented here as examples of typical operational scenarios for data mining.

1. Reduction of repair times
For one of our customers from the automotive industry, the aim was to reduce repair times. The solution was to process suitable warranty data and use association analysis to identify conspicuous combinations of work steps that were associated with unwanted idle time. This enabled us to identify potential for optimisation in the workshop process.

2. Fault detection in painting robots
In another use case, also from the automotive sector, the aim was to improve fault detection for painting robots. The goal was to develop an early detection system that avoids costly rework entirely. Based on the analysis of the log data, we derived error patterns, which could subsequently be recognised by a classification procedure.

3. Customer lifetime value
The third example comes from the banking sector. A German bank approached us with the wish to better assess the customer lifetime value of its customers. Instead of only taking a certain monetary value as a basis, customer activities should also be evaluated in future. After creating a suitable data basis by merging various data sources, we were able to identify customer types and, with the help of a clustering procedure, divide them into five categories.

These three use cases for data mining methods illustrate one thing above all: the concrete question is at the centre of data science projects in which data mining is used as a solution approach. If there is both an appropriate challenge and a suitable data basis (Big Data), data mining can be an effective tool for gaining profitable insights.

Data mining problems and limitations

When the diverse analysis and evaluation techniques of data mining are applied thoughtfully, they offer valuable insights and competitive advantages. However, all of these methods come with specific challenges. One of the most important data mining problems is that each methodology must first be defined manually: it is up to humans to define the dependent and independent variables, the classes and the analysis techniques to be used. This means that the results of data mining are inherently biased by certain presuppositions, ideas and goals. For this reason, companies often commission external data and AI specialists such as Alexander Thamm GmbH with data mining tasks.

Michaela Tiedemann

Michaela Tiedemann has been part of the Alexander Thamm GmbH team since the early start-up days. She has actively shaped the development from a fast-moving, spontaneous start-up to a successful company. When she started her own family, a whole new chapter began for Michaela Tiedemann. Giving up her job, however, was out of the question for the new mother. Instead, she developed a strategy to reconcile her job as Chief Marketing Officer with her role as a mother.