What is annotation?

In artificial intelligence and machine learning, annotation refers to the categorisation and labelling of data sets so that they can be processed by a machine learning model or a neural network. There are numerous variants, depending on the type of data being annotated.

Why is annotation important?

For a neural network or other machine learning model to make valid decisions, it must first be trained on large data sets. Before that, it must be ensured that these data sets a) contain valid information for the domain being trained and b) are in a form that machines can process at all.

For example, for a neural network to support the detection and differentiation of tumours in the human body, it must learn from hundreds or thousands of real X-ray, MRI or CT images. To do this, however, the system needs a reference that tells it when a classification was correct and when it was not.
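Annotated labels are exactly the reference that tells a system whether it was correct. As a minimal sketch (the function name `accuracy` and the labels `tumour` and `no_tumour` are hypothetical), the snippet below compares a model's predictions with human-assigned labels and reports the share it got right:

```python
def accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    """Return the fraction of predictions that match the human-annotated labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and labels must have the same length")
    if not ground_truth:
        raise ValueError("no labelled examples to compare against")
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Hypothetical labels for four scans; the human annotations act as ground truth.
human_labels = ["tumour", "no_tumour", "tumour", "no_tumour"]
model_output = ["tumour", "tumour", "tumour", "no_tumour"]
print(accuracy(model_output, human_labels))  # 0.75
```

Without the human labels there would be nothing to compare the model output against, which is why annotation has to come before training.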

Therefore, these data sets must be checked independently in advance. This is often where people come in, sorting the data by hand into categories and tagging it with keywords.
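One common way to make such checks independent, assumed here rather than described above, is to let several annotators label the same item and keep a label only when a clear majority agrees. The sketch below uses a hypothetical `majority_label` helper and sends unclear items back for review:

```python
from collections import Counter
from typing import Optional


def majority_label(votes: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the label chosen by more than `min_share` of annotators, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_share else None


# Hypothetical votes from several annotators per scan.
items = {
    "scan_001": ["tumour", "tumour", "no_tumour"],
    "scan_002": ["tumour", "no_tumour"],
}
for item_id, votes in items.items():
    result = majority_label(votes)
    # Items without a clear majority are flagged instead of guessed.
    print(item_id, result if result is not None else "needs review")
```

Here `scan_001` keeps the label `tumour` (two of three votes), while the tied `scan_002` is flagged for review.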

What forms of annotation are there?

Text data

The most common form is text annotation, which has several sub-categories. One is sentiment annotation, which categorises text according to the attitudes, opinions and feelings it expresses directly or indirectly. Machine learning systems trained on such data could, for instance, filter out profane language or content that is harmful to minors.
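Sentiment annotation usually produces simple records that pair a text with one label from an agreed set. The sketch below assumes a hypothetical three-label scheme (`positive`, `negative`, `neutral`) and a hypothetical `TextAnnotation` record; it rejects any label outside the scheme so that typos do not end up in the training data:

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label set; real projects define their own annotation guidelines.
SENTIMENT_LABELS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class TextAnnotation:
    text: str
    sentiment: str
    annotator: str

    def __post_init__(self) -> None:
        # Reject labels outside the agreed set before they reach the training data.
        if self.sentiment not in SENTIMENT_LABELS:
            raise ValueError(f"unknown sentiment label: {self.sentiment!r}")


samples = [
    TextAnnotation("Great service, thank you!", "positive", "annotator_a"),
    TextAnnotation("The parcel arrived damaged.", "negative", "annotator_b"),
    TextAnnotation("The shop opens at nine.", "neutral", "annotator_a"),
]
# A quick look at the label distribution helps spot unbalanced data sets.
print(Counter(s.sentiment for s in samples))
```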

There is also intent classification, which concerns the goal a communicator is pursuing with a message. In human-machine communication the subtext matters: it is easy for humans to recognise but often difficult for computer systems to pick up, which is why creating training data with human help is usually indispensable.

Semantic annotation, by contrast, describes the meaning of text content in more detail, so that neural networks learn to distinguish content more reliably. One application is search in online shops: customers who do not know a product's exact name can still receive relevant suggestions based on a description.
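Semantic annotations are often stored as labelled character spans within the text. The following sketch assumes hypothetical category names such as `COLOUR` and `PRODUCT_TYPE` for a shop search query; real projects define their own categories:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # hypothetical semantic category


def spans_to_text(text: str, spans: list[Span]) -> list[tuple[str, str]]:
    """Return (surface text, label) pairs, checking that every span fits the text."""
    result = []
    for s in spans:
        if not 0 <= s.start < s.end <= len(text):
            raise ValueError(f"span out of range: {s}")
        result.append((text[s.start:s.end], s.label))
    return result


query = "red waterproof hiking jacket"
annotations = [
    Span(0, 3, "COLOUR"),
    Span(4, 14, "PROPERTY"),
    Span(15, 28, "PRODUCT_TYPE"),
]
print(spans_to_text(query, annotations))
```

A model trained on many such spans can learn that "hiking jacket" names a product type even when a customer never types an exact product name.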

Audio data

Besides text, the categorisation of audio data also plays a role. Timestamps, transcriptions, intonation labels and the identification of language, dialect and demographic features turn raw recordings into training data that would otherwise be inaccessible to machine learning.
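Time-stamped transcription can be represented as a list of segments, each with a start and end time plus its labels. This sketch assumes a single-speaker recording in which segments must not overlap; the `Segment` fields, including the language code, are hypothetical choices:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start_s: float   # start time in seconds
    end_s: float     # end time in seconds
    transcript: str
    language: str    # e.g. an ISO 639-1 code such as "en"


def check_segments(segments: list[Segment]) -> None:
    """Raise if a segment has no duration or if two segments overlap in time."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    for seg in ordered:
        if seg.end_s <= seg.start_s:
            raise ValueError(f"segment has no duration: {seg}")
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start_s < prev.end_s:
            raise ValueError(f"overlapping segments: {prev} / {cur}")


segments = [
    Segment(0.0, 2.4, "Good morning.", "en"),
    Segment(2.4, 5.1, "How can I help you?", "en"),
]
check_segments(segments)  # passes: segments touch but do not overlap
total = sum(s.end_s - s.start_s for s in segments)
print(f"{total:.1f} s of annotated speech")
```

For recordings with several speakers, overlap would be allowed across speakers, so the check would need to run per speaker instead.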

Image data

Another area is image classification. Recognising people, road signs or obstacles is indispensable for driver-assistance systems, and face recognition and threat assessment also matter in robotics. Image annotation helps create useful training data sets by supplying keywords, image descriptions and similar labels.
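Besides keywords and descriptions, objects in images are frequently marked with rectangular bounding boxes. To check whether two annotators marked the same object consistently, a widely used measure is intersection over union (IoU). The sketch below assumes pixel coordinates with the origin at the top left; the `Box` type is hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge (x2 > x1)
    y2: float  # bottom edge (y2 > y1)
    label: str

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (0.0 disjoint, 1.0 identical)."""
    ix = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    iy = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = ix * iy
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


# Two hypothetical annotators marking the same road sign.
box_a = Box(10, 10, 50, 50, "road_sign")
box_b = Box(20, 20, 60, 60, "road_sign")
print(round(iou(box_a, box_b), 3))  # 0.391
```

A low value like this suggests the two boxes disagree noticeably, so the item might be sent back for a second look.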

Video data

Video annotation can be seen as a subcategory of image annotation that adds a temporal dimension: moving objects must be followed over time, pixel regions must be recognised as objects, and those objects must be categorised. This requires human-assisted processing of the video material, sometimes frame by frame, marking areas of interest and tracking them over time.
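Marking every frame by hand is expensive, so one common shortcut, assumed here and not described above, is to annotate only keyframes and fill in the frames between them by linear interpolation. The helper names below are hypothetical:

```python
BoxCoords = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def interpolate_boxes(keyframes: dict[int, BoxCoords]) -> dict[int, BoxCoords]:
    """Fill in a box for every frame between annotated keyframes."""
    frames = sorted(keyframes)
    result: dict[int, BoxCoords] = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = keyframes[f0], keyframes[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)
            result[f] = tuple(lerp(c0, c1, t) for c0, c1 in zip(b0, b1))
    if frames:
        # The last keyframe is kept exactly as the annotator drew it.
        result[frames[-1]] = keyframes[frames[-1]]
    return result


# Hypothetical keyframes for one tracked pedestrian: frame -> (x1, y1, x2, y2).
track = {0: (100, 50, 140, 150), 4: (120, 50, 160, 150)}
for frame, box in sorted(interpolate_boxes(track).items()):
    print(frame, box)
```

Linear interpolation only approximates real motion, so annotators typically still review and correct the generated frames.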