What is annotation?

In artificial intelligence and machine learning, annotation is the categorization and labeling of data sets so that they can be processed by a machine learning model or neural network. It takes different forms depending on the type of data involved.

Why is annotation important?

For a neural network or machine learning model to make valid decisions, it must first be trained on large data sets. Those data sets, in turn, must a) contain valid information for the domain being trained and b) be in a form that a machine can learn from at all.

For example, before a neural network can be used as a decision aid in the detection and differentiation of tumors in the human body, it must learn to recognize them from hundreds or thousands of real X-ray, MRI or CT images. To do this, the system must be told during training whether each of its classifications was correct. The data sets must therefore have been independently verified in advance.

This is often where humans come in: they sort the data by hand into categories and annotate it with keywords.
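One simple way to combine labels from several human annotators is a majority vote, sketched below. The item names, the labels, and the function `aggregate_labels` are hypothetical and chosen only for illustration; real annotation projects may use other verification schemes.

```python
from collections import Counter

def aggregate_labels(labels_per_item):
    """Return {item_id: (label, agreement)} using a simple majority vote.

    `agreement` is the share of annotators who chose the winning label.
    Hypothetical helper for illustration only.
    """
    result = {}
    for item_id, labels in labels_per_item.items():
        counts = Counter(labels)
        label, votes = counts.most_common(1)[0]
        result[item_id] = (label, votes / len(labels))
    return result

# Three annotators label each scan independently (made-up example data).
raw = {
    "scan_001": ["tumor", "tumor", "tumor"],
    "scan_002": ["tumor", "no_tumor", "tumor"],
}
consensus = aggregate_labels(raw)

for item_id, (label, agreement) in consensus.items():
    print(item_id, label, round(agreement, 2))
```

Items with low agreement could be sent back for another round of review instead of going straight into the training set.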


The most common kind of annotation is text annotation, which can be divided into several subcategories. One is emotion-focused categorization, in which text is categorized by the attitudes, opinions, and feelings it contains or indirectly communicates. Machine learning systems trained on such data could, for instance, filter profane language or content that is harmful to minors.

There are also classification processes that address the intention of a text, that is, the goal a communicator is pursuing with a message. In human-machine communication, the subtext of a message becomes especially important. Humans identify this subtext easily, but computer systems often find it difficult to pick out, which is why creating training data with human help is usually indispensable.

Semantic annotation, on the other hand, describes the meaning of textual content in more detail, so that neural networks learn to differentiate content better. One application is computer-aided search in online stores, where customers receive suggestions based on descriptions of items whose exact names they do not know.
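The three kinds of text annotation described above (emotion, intention, and semantics) could be stored together in one record, as in the following minimal sketch. The class names, field names, and label values are assumptions made for this example and do not follow any particular annotation tool.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    start: int   # index of the first character of the span
    end: int     # index one past the last character
    label: str   # semantic label, e.g. "product_attribute"

@dataclass
class TextAnnotation:
    text: str
    sentiment: str                 # emotion-focused label
    intent: str                    # intention label
    spans: list = field(default_factory=list)  # semantic labels

    def add_span(self, phrase, label):
        """Label the first occurrence of `phrase` in the text."""
        start = self.text.find(phrase)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in text")
        self.spans.append(Span(start, start + len(phrase), label))

    def span_text(self, span):
        return self.text[span.start:span.end]

# Made-up example of a customer message in an online store.
example = TextAnnotation(
    text="The last jacket I ordered was awful, I want a waterproof one.",
    sentiment="negative",
    intent="product_search",
)
example.add_span("waterproof", "product_attribute")
print(example.span_text(example.spans[0]))
```

Storing character offsets rather than copies of the phrase keeps each semantic label tied to an exact position in the original text.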

Audio Data

In addition to text-based content, the categorization of audio data also plays a role. Time stamps, transcriptions, intonation categories, and the identification of language, dialect and demographic characteristics turn recordings into training data that would otherwise be inaccessible to machine learning.
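A time-stamped audio annotation might look like the sketch below. The `AudioSegment` class, its fields, and the sample values are hypothetical. The helper only illustrates one plausible consistency check, namely flagging segments whose time ranges overlap.

```python
from dataclasses import dataclass

@dataclass
class AudioSegment:
    start_s: float     # segment start, in seconds
    end_s: float       # segment end, in seconds
    transcript: str
    language: str
    intonation: str

def find_overlaps(segments):
    """Return pairs of neighboring segments (sorted by start time) that overlap."""
    ordered = sorted(segments, key=lambda s: s.start_s)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b.start_s < a.end_s]

# Made-up annotations for a short recording.
segments = [
    AudioSegment(0.0, 2.4, "Good morning.", "en", "neutral"),
    AudioSegment(2.4, 5.1, "Is the store open today?", "en", "rising"),
    AudioSegment(4.8, 6.0, "Yes, until six.", "en", "falling"),
]
overlaps = find_overlaps(segments)

for first, second in overlaps:
    print("Overlap:", first.transcript, "/", second.transcript)
```

An overlap is not necessarily an error, since two speakers may talk at once, but flagging it gives a human annotator the chance to confirm the time stamps.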


Another area is the classification of images. Recognizing people, road signs or obstacles is indispensable for computer-aided driving assistants, and face recognition and hazard assessment are also important in robotics. Image annotation helps here by attaching keywords, image descriptions and similar information to images, turning them into useful training data sets.
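As a rough sketch, an annotated image could combine a description and keywords with labeled regions. The rectangular `BoundingBox` used here is an assumption for illustration, as are all names and values. Annotation tools may instead use polygons, pixel masks, or other shapes.

```python
from dataclasses import dataclass, field

@dataclass
class BoundingBox:
    label: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def area(self):
        """Area of the box in pixels."""
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

@dataclass
class ImageAnnotation:
    image_file: str
    description: str
    keywords: list
    boxes: list = field(default_factory=list)

    def labels(self):
        """Sorted list of distinct object labels in this image."""
        return sorted({box.label for box in self.boxes})

# Made-up annotation of a street scene.
street = ImageAnnotation(
    image_file="street_0001.jpg",
    description="Pedestrian crossing in front of a road sign",
    keywords=["street", "daylight"],
    boxes=[
        BoundingBox("pedestrian", 120, 200, 180, 360),
        BoundingBox("road_sign", 400, 80, 440, 120),
    ],
)
print(street.labels())
```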


Video annotation is a kind of subcategory of image annotation that adds a temporal aspect: moving objects must be kept in view, pixel areas must be recognized as objects, and those objects must be categorized. Human-assisted processing of the video material is indispensable for this. It sometimes proceeds frame by frame, marking areas of interest and tracking them over time.
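To reduce frame-by-frame work, a human can mark an object on a few key frames and have the positions in between estimated. The sketch below uses linear interpolation between two key frames. This is an assumed simplification for illustration, not a method described above, and the function name and box layout `(x_min, y_min, x_max, y_max)` are hypothetical.

```python
def interpolate_box(frame, key_a, key_b):
    """Estimate a box at `frame` between two human-annotated key frames.

    Each key frame is a (frame_number, box) pair, where box is a tuple
    (x_min, y_min, x_max, y_max). Linear motion is assumed.
    """
    (frame_a, box_a), (frame_b, box_b) = key_a, key_b
    if not frame_a < frame_b:
        raise ValueError("key_a must come before key_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two key frames")
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# An annotator marks the same object on frames 0 and 10 (made-up values).
key_start = (0, (10, 20, 50, 60))
key_end = (10, (30, 40, 70, 80))

midpoint = interpolate_box(5, key_start, key_end)
print(midpoint)
```

Estimated boxes still need human review, because real objects rarely move in a perfectly straight line between key frames.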

Data Navigator Newsletter

