Text mining is a real boon given the flood of texts and text data created every day. Reading terabytes of text, tagging it with keywords or sorting it by other criteria would exceed any human capacity. In this article, we look at the basics of this method and show possible applications of text mining.
Application of Text Mining in Recent History - Analysis of the Panama Papers
An example from recent history illustrates the purpose of text mining: the Panama Papers. The largest data leak known at the time consisted of 2.6 terabytes, or 11.5 million files. Which persons or organisations are involved, and in which context are they named? Answering these questions by hand would have been extremely time-consuming. With the help of graph databases, the connections within the leak could be found and presented in a reasonable amount of time.
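The idea of finding connections between named persons and organisations can be illustrated with a toy sketch. It is not the method used in the actual Panama Papers analysis. The entity list, the documents and the function names below are invented for illustration, and simple substring matching stands in for real named-entity recognition. Each pair of names that appears in the same document becomes a weighted edge, which is the kind of relationship a graph database would store.

```python
from collections import Counter
from itertools import combinations

# Hypothetical entity list and documents, invented for illustration only.
KNOWN_ENTITIES = ["Alice Example", "Bob Sample", "Acme Holdings"]

DOCUMENTS = [
    "Alice Example signed the transfer to Acme Holdings.",
    "Acme Holdings is managed by Bob Sample and Alice Example.",
    "Bob Sample requested a new account.",
]


def entities_in(text, entities):
    """Return the known entities found in the text (case-insensitive substring match), sorted."""
    lowered = text.lower()
    return sorted(e for e in entities if e.lower() in lowered)


def cooccurrence_edges(documents, entities):
    """Count, for each pair of entities, how many documents name both of them."""
    edges = Counter()
    for doc in documents:
        for pair in combinations(entities_in(doc, entities), 2):
            edges[pair] += 1
    return edges


edges = cooccurrence_edges(DOCUMENTS, KNOWN_ENTITIES)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight} shared document(s)")
```

In this made-up data, "Acme Holdings" and "Alice Example" share two documents, so their edge carries the highest weight. At real scale, the edges would be loaded into a graph database rather than held in a `Counter`.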
Dark data - unused data holds great potential for companies
Unused data stored in large quantities in companies is also referred to as "dark data". The market research company Gartner estimates that by 2021, 80 per cent of companies will not have succeeded in establishing rules for the secure and productive handling of this growing amount of data. Text data mining is one of the most promising approaches for extracting meaningful and significant information from unused, unstructured data.
Definition: What is Text Mining?
Text mining, like data mining, is a collective term. Accordingly, it covers a whole range of different algorithm-based analysis methods. Text data mining can be understood as a special form of data mining in that here, too, the aim is to generate added value from data.
Extraction procedures that search a large volume of text data (big data) for specific information can also be understood as a special form of text mining. In addition, there are procedures similar to those used in big data analyses. Here, a large amount of text data is searched not for specific information but for structural similarities or patterns. In this way, texts can be sorted according to criteria such as tone, theme, function or other features and anomalies. The goals of text mining are:
- The extraction of implicit knowledge from large amounts of text data
- Making visible the patterns and relationships in the information represented in texts
- The evaluation of text data that humans cannot read because of its sheer volume
What text mining methods are there?
In principle, text mining methods can be divided into two categories: linguistic methods (natural language processing) on the one hand and statistical methods on the other. Since texts form the basis of text data mining, linguistic methods dominate among the analysis methods. Data analysed using text mining methods fall into the category of unstructured or semi-structured data. Linguistic analyses, for example for categorising data according to certain semantic characteristics, provide an important starting point for structuring this unstructured or semi-structured data (clustering).
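The statistical side can be illustrated with a minimal sketch, assuming a made-up mini corpus and hypothetical helper names. Each text is tokenised, weighted with TF-IDF (term frequency times the logarithm of the inverse document frequency) and compared with the other texts by cosine similarity. Texts with a high similarity are candidates for the same cluster. A production system would normally use an established library and proper linguistic preprocessing. This version uses only the Python standard library.

```python
import math
import re
from collections import Counter

# Made-up mini corpus, for illustration only.
TEXTS = [
    "The invoice lists the delivery costs and the invoice total.",
    "Please pay the invoice total within thirty days.",
    "The court cited an earlier precedent in its ruling.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts):
    """Build one sparse TF-IDF vector (dict: term -> weight) per text."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(TEXTS)
for i in range(len(TEXTS)):
    for j in range(i + 1, len(TEXTS)):
        print(f"text {i} vs text {j}: {cosine(vectors[i], vectors[j]):.3f}")
```

The two invoice texts share the terms "invoice" and "total" and therefore receive a positive similarity. The legal text shares only the word "the" with the others. Because "the" occurs in every text, its weight is zero and the similarity is 0.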
Application examples for text mining
Text mining can be found everywhere in the working world where texts serve as a central basis for work. In areas such as
- tax advice,
- the insurance industry or
- the legal environment
texts in the form of receipts, delivery notes, invoices, emails, contracts or legal texts play a central role. Text mining solutions can provide the answer to very specific questions. For example, the machine verification of documents for accuracy and consistency can be used in audits or as part of stocktaking.
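What a machine consistency check might look like can be sketched with a deliberately simple example. The invoice layout, the field labels "Item:" and "Total:" and the function name are assumptions made for illustration. Real documents would need far more robust extraction. The sketch pulls the line-item amounts and the stated total out of the text and checks whether they agree.

```python
import re
from decimal import Decimal

# Hypothetical invoice text in an assumed, very regular layout.
INVOICE_TEXT = """\
Invoice 2024-001
Item: Consulting 1200.00
Item: Travel 150.50
Total: 1350.50
"""


def invoice_is_consistent(text):
    """Return True if the line-item amounts add up to the stated total."""
    # Capture the amount at the end of every line that starts with "Item: ".
    amounts = re.findall(r"^Item: .*?(\d+\.\d{2})$", text, flags=re.MULTILINE)
    total_match = re.search(r"^Total: (\d+\.\d{2})$", text, flags=re.MULTILINE)
    if total_match is None:
        return False  # No stated total found, so nothing can be verified.
    # Decimal avoids the rounding errors of binary floating point.
    item_sum = sum((Decimal(a) for a in amounts), Decimal("0"))
    return item_sum == Decimal(total_match.group(1))


print(invoice_is_consistent(INVOICE_TEXT))
print(invoice_is_consistent(INVOICE_TEXT.replace("1350.50", "1400.00")))
```

The first call checks the invoice as given, the second a copy whose total has been altered so that it no longer matches the line items.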
However, text mining can also be part of a broader solution approach. For example, virtual assistants such as the AI bot ROSS take over a series of research tasks. Among other things, ROSS is able to search text archives for specific legal texts or precedents.
Figure: Text mining can be part of an overall solution, as in the case of the intelligent assistant "ROSS".
Further application possibilities are also emerging in connection with the increasing number of networked products (Internet of Things). These offer automation options or can show recommendations for action on displays. For one of our customers, we analysed cooking recipes to identify recurring operations that a kitchen appliance can recognise and execute automatically. To do this, after preparing the recipe text data and defining the search criteria, we used text data mining to search the data for repetitive patterns. For complex questions like this, several text mining methods can be used:
- Classification procedures
- Segmentation procedures
- Dependency analyses
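How the three kinds of methods listed above might work together on recipe texts can be sketched as follows. This is a toy illustration and not the procedure used in the customer project. The recipes, the keyword rules and all function names are invented. Segmentation splits a recipe into steps, a keyword-based classification labels each step with an operation, and a count of which operation follows which serves as a very rough stand-in for a dependency analysis.

```python
import re
from collections import Counter

# Invented recipe texts and keyword rules, for illustration only.
RECIPES = [
    "Chop the onions. Fry the onions in oil. Stir in the rice. Simmer for 20 minutes.",
    "Chop the carrots. Fry the carrots briefly. Add water and simmer until soft.",
]

OPERATION_KEYWORDS = {
    "chop": ["chop", "dice", "slice"],
    "fry": ["fry", "saute"],
    "stir": ["stir", "mix"],
    "simmer": ["simmer", "boil"],
}


def segment(recipe):
    """Segmentation: split a recipe text into individual steps (sentences)."""
    return [s.strip() for s in re.split(r"[.!?]", recipe) if s.strip()]


def classify(step):
    """Classification: label a step with the first operation whose keyword it contains."""
    words = set(re.findall(r"[a-z]+", step.lower()))
    for operation, keywords in OPERATION_KEYWORDS.items():
        if words & set(keywords):
            return operation
    return "other"


def operation_sequences(recipes):
    """Turn each recipe into its sequence of operation labels."""
    return [[classify(step) for step in segment(recipe)] for recipe in recipes]


def transition_counts(sequences):
    """Count how often one operation is directly followed by another."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts


sequences = operation_sequences(RECIPES)
transitions = transition_counts(sequences)
for (first, second), n in transitions.most_common():
    print(f"{first} -> {second}: {n}")
```

In the two invented recipes, "chop" is followed by "fry" both times. A recurring pair like this is a candidate for an operation sequence that an appliance could recognise and automate.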
Language and texts form the basis of our communication
Text mining is such an important field, among other things, because texts are one of the most important foundations of our communication and thus of our economy as a whole. Journalism, scientific fields of application such as opinion research, and questions in the social and economic sciences also benefit enormously from it.
Thanks to text data mining, texts and the information they contain can be analysed to an extent that could never be achieved with conventional means and skills alone. Accordingly, the strengths of text mining are particularly noticeable wherever the amount of text to be handled exceeds the limits of human performance.