Today, data plays a crucial role everywhere and every day - in the daily life of every individual, in social and societal contexts, in individual companies and in the economy as a whole. Yet we rarely ask ourselves what exactly data is and what it is worth. This article therefore asks the simple question: What is data? Simple as the question may seem at first glance, the answers are varied and complex.
What data means to each of us personally can be illustrated by a simple thought experiment. Imagine you are playing a game in which you are not playing for money, but for personal information. If you lose, you have to disclose the data you staked. Browser history, chat histories, bank data or private pictures - what would you really be willing to stake in a game? What would you have to stand to win to make it worthwhile to put this data at stake?
Definition: What is data?
Data is the (digital) representation of real phenomena. Before the digital age, people spoke of data when they meant numerical information or values obtained through measurement. In computer science, data is coded information. The basic digital form of data is binary code, i.e. a more or less extensive sequence of the digits 0 and 1. Since data is encoded, there are standards that define how it is decoded. Such standards include file formats such as JPEG or PDF.
Just as elements in chemistry can occur in three states of aggregation, data can occur in three forms. While elements assume different states depending on their energy density, data is divided according to its degree of structure into:
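The idea of encoding and decoding can be made concrete with a minimal Python sketch. It assumes UTF-8 as the agreed standard; the helper names `to_bits` and `from_bits` are purely illustrative and do not come from any library.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text into bytes and render each byte as eight binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Reverse the process: read groups of eight digits as bytes, then decode."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)


bits = to_bits("Data")
print(bits)             # 01000100 01100001 01110100 01100001
print(from_bits(bits))  # Data
```

The same sequence of zeros and ones only becomes readable again because both sides apply the same standard. Decoding it with a different standard would generally yield different or invalid content.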
- Structured data
- Semi-structured data
- Unstructured data
Data can thus be distinguished by its degree of structure.
The following sections present the three forms in turn.
Structured data is data that exists in a predefined, unambiguous format. In a relational database it carries clear labels, so structured data can be found and edited very easily and very quickly.
Data is given a structure by being arranged in table form, for example. Wherever data is processed automatically, structured data brings great advantages. Content such as measured values that is structured, for example with additional information in a table, is easy to identify and can therefore be processed quickly. Search engines also make use of this advantage.
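As a small illustration of why labelled, tabular data is easy to find and process, the following sketch uses Python's built-in `sqlite3` module with an in-memory database. The table name, column names and measured values are invented for this example.

```python
import sqlite3

# In-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (sensor TEXT, unit TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("oven", "celsius", 180.0),
        ("oven", "celsius", 182.0),
        ("freezer", "celsius", -18.0),
        ("pipeline", "bar", 2.5),
    ],
)

# Because every value has a label (column), a precise question is one query.
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM measurements "
    "WHERE unit = ? GROUP BY sensor ORDER BY sensor",
    ("celsius",),
).fetchall()

for sensor, average in rows:
    print(sensor, average)
```

The query can filter, group and aggregate only because the format was fixed in advance: each value sits in a known column with a known type.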
Semi-structured data is data that carries a "hidden" structure - this is why we speak of an implicit, irregular or partial structure. Semi-structured data sets arise, for example, when different objects are combined in a software program. They occupy an intermediate position between structured and unstructured data: although they have a certain degree of structure, their content is largely unknown. XML data is one example of semi-structured data.
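The following sketch shows what an irregular structure can look like in practice. It parses a small, made-up XML snippet with Python's standard `xml.etree.ElementTree` module; the element names and values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Both records are <customer> elements, but they do not share the same fields.
XML = """<customers>
  <customer id="1"><name>Ada</name><email>ada@example.com</email></customer>
  <customer id="2"><name>Ben</name><phone>555-0100</phone><note>prefers calls</note></customer>
</customers>"""

root = ET.fromstring(XML)

# Turn each customer into a dictionary of whatever fields it happens to have.
records = [
    {"id": customer.get("id"), **{child.tag: child.text for child in customer}}
    for customer in root.findall("customer")
]

# The full set of fields is only known after reading the data.
all_fields = sorted({key for record in records for key in record})

for record in records:
    print(record)
print(all_fields)
```

The tags provide some structure, but which fields a record contains only becomes clear once the data has been read. This is the partial structure described above.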
Unstructured data is data that has no formal structure. Unlike structured data, it therefore cannot simply be stored in a relational database such as an SQL database. Unstructured data must first be prepared or structured before it can be analysed. Its exact content is not known before a data analysis.
Unstructured data makes up a large part of all data generated in companies. Examples include text data, which is found in e-mails, customer reviews, forum posts etc., but also image and video data, which may be generated during production for quality assurance purposes.
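How unstructured text might be prepared for analysis can be sketched in a few lines of Python. The two reviews are invented, and the chosen features (word count, a keyword flag) are only one of many possible ways to impose a structure.

```python
import re
from collections import Counter

# Hypothetical customer reviews: free text without any predefined fields.
reviews = [
    "Great product, fast delivery!",
    "Delivery was slow and the product arrived damaged.",
]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


# Derive one structured row per review.
rows = []
for review_id, text in enumerate(reviews, start=1):
    tokens = tokenize(text)
    rows.append(
        {
            "review_id": review_id,
            "n_words": len(tokens),
            "mentions_delivery": "delivery" in tokens,
        }
    )

# A simple aggregate over all reviews.
word_counts = Counter(token for text in reviews for token in tokenize(text))

print(rows)
print(word_counts.most_common(2))
```

Only after this preparation step do the reviews have labelled columns that could be stored in a table and queried like structured data.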
The Data Lake and the Data Warehouse
Closely linked to the organisation and the different formats of data is the question of where data is actually stored. Different concepts and technologies are available for this. The two most prominent are the data warehouse and the data lake, which represent very different approaches to the data landscape. Companies that work extensively with data analyses should always strive to avoid data silos. A data lake can help achieve this goal.
In contrast to a data warehouse and relational databases, in which data is prepared before it is stored, all the data that accumulates in a company flows into a data lake in its raw form. Accordingly, a data lake is an ideal form of storage for unstructured data.
A data catalogue creates order
In companies and organisations, data is created in a wide variety of contexts and formats. In the past, this usually happened without any awareness of the value the data holds or of how it might still be used in the future. This makes it all the more important to bring data skills into one's own company and to develop a systematic approach to data (data governance). One approach to a solution in this context is a data catalogue. Data must be documented uniformly and systematically. This makes it much easier to find.
A data catalogue, sometimes also referred to as a data dictionary, is a central information register for the entire spectrum of data. This directory brings together all the important information on the existing data and data sources. In other words, a data catalogue is one of the most important tools for managing, reviewing and locating data for further processing. A data catalogue ensures that all staff are informed about the existence, physical location, access rights and usage history, as well as the quality and content, of all data sources.
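To make the idea of a central register more tangible, here is a minimal Python sketch of a data catalogue. It is not a real product or library: the class names, fields and example entries are hypothetical and simply mirror the attributes listed above (location, access rights, usage history, quality and content).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    name: str
    location: str              # where the data physically lives
    description: str           # what the data source contains
    allowed_roles: set[str]    # who may access it
    quality_note: str = ""     # known quality issues, if any
    usage_history: list[str] = field(default_factory=list)


class DataCatalogue:
    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[str]:
        """Return the names of all entries whose name or description matches."""
        kw = keyword.lower()
        return [
            e.name
            for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        ]

    def record_use(self, name: str, role: str, purpose: str) -> str:
        """Check access rights, log the use and return the data's location."""
        entry = self._entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"{role} may not access {name}")
        entry.usage_history.append(f"{role}: {purpose}")
        return entry.location


catalogue = DataCatalogue()
catalogue.register(CatalogueEntry(
    name="customer_reviews",
    location="lake/raw/reviews/",
    description="Free-text customer reviews from the web shop",
    allowed_roles={"analyst"},
    quality_note="contains duplicates",
))
catalogue.register(CatalogueEntry(
    name="sensor_measurements",
    location="warehouse.measurements",
    description="Temperature readings from production",
    allowed_roles={"analyst", "engineer"},
))

print(catalogue.search("review"))
print(catalogue.record_use("sensor_measurements", "engineer", "quality report"))
```

Even in this toy form, the register answers the questions raised above: whether a data source exists, where it is located, who may use it and what it has been used for.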
Link tip: We advise and support companies and their employees in creating a data catalogue and provide training on its administration and use.
What value does data have?
An exciting question in this context is: What value does data actually have? So far, there is no clear, universally valid answer. Even though data is often called the new oil, there is still no price for it comparable to the price of crude oil. This is problematic in one respect in particular: today, data is an important company asset. But as long as data has no price, or it is not clear what it is really worth, it is difficult to gauge how much financial effort is justified to secure, manage and evaluate data in a company.
One clue to the value of data is provided by the prices paid for it on the so-called "dark web". According to research by Intel and McAfee, online banking data, for example, is traded at an average of $190, with the price depending on the account balance. A data set that breaks down the shopping behaviour of customers can be had for considerably less, at $3 to $20. These simple examples show that the value of data is closely tied to the profit expected from its use.
How can added value be created from data?
The effort required to collect, store and analyse data must therefore be justified. For data to become a value-adding component of a company, several aspects need to be considered:
- The availability of the data must be guaranteed
- The quality of the data must be good (data quality)
- The responsibilities in the company must be regulated (data roles)
- The data must be legally compliant (GDPR)
- Data know-how must be present
Once these conditions for data science projects are met, our experience shows that it is important to gain experience with the first use cases. For a company to become data-driven in the long term, however, it is important not just to carry out one use case after another, but to embed them in a comprehensive data strategy.