What is BERT?

BERT stands for "Bidirectional Encoder Representations from Transformers" and describes a language model that Google uses to interpret search queries. With its so-called core updates, Google continuously develops this search algorithm in order to deliver ever better results for users' queries.

BERT was rolled out in Google Search at the end of 2019 with the purpose of better understanding the context of a search query. Particular attention is paid to prepositions and filler words, which Google often ignored in queries in the past. Alongside the ranking algorithm itself, BERT is also used for so-called "featured snippets": highlighted search results that are intended to give the user a brief answer to the query.

Since BERT is based on the recognition of speech and text (Natural Language Understanding) as well as its processing, the algorithm belongs to the field of Natural Language Processing (NLP) within neural networks. The aim of NLP is to make natural human language processable by computers so that they can understand the meaning of the language.

BERT uses a special field of machine learning known as transfer learning. Classical machine learning concepts assume that training and test data originate from the same feature space and the same distribution. This has the limitation that if the distribution changes, the original training data can no longer be used. With transfer learning, however, training data from an "out-of-domain" data set can be drawn upon and used to find solutions. This reduces the amount of training data required and, in some cases, also the training time. While transfer learning has its origins in image recognition, BERT applies this methodology to text processing, since search queries are highly individual and specific training data is not always available.
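
To illustrate the transfer-learning idea, the following sketch fine-tunes a pretrained BERT model on a small, made-up classification task. It assumes the Hugging Face transformers library and PyTorch; the model name, labels and example sentences are purely illustrative and not part of the original text.

    # Minimal sketch: transfer learning with a pretrained BERT model.
    # Assumes the Hugging Face "transformers" library and PyTorch are installed;
    # the task (binary sentiment) and the example data are purely illustrative.
    import torch
    from transformers import BertTokenizer, BertForSequenceClassification

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2  # reuse pretrained weights, add a new task head
    )

    # A tiny, hypothetical "out-of-domain" training set.
    texts = ["great product, works perfectly", "arrived broken and late"]
    labels = torch.tensor([1, 0])
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):  # a few fine-tuning steps suffice to show the principle
        optimizer.zero_grad()
        outputs = model(**inputs, labels=labels)
        outputs.loss.backward()  # only the small task-specific data drives the update
        optimizer.step()

Because the model starts from weights already trained on large general-purpose text, only this small task-specific data set is needed, which is exactly the reduction in training data described above.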

How is the language model structured and what functions does it include?

The BERT language model is based on calculation models, so-called transformers, which place each word in relation to all other words in a sentence and thus try to better capture its meaning. Transformers work in such a way that input signals are converted via so-called encoders into a processable form of vectors on which mathematical operations can be carried out. In the so-called self-attention layer, each word of the input is weighted on a value scale that rates it in relation to the other words in the input. The values are then normalised with the so-called softmax function so that they sum to 1, and are passed on to the next layer.
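
The following sketch illustrates this weighting step with a scaled dot-product self-attention calculation in plain NumPy. The tiny word vectors are invented for the example; only the mechanism (a score for every word pair, softmax normalisation to a sum of 1, a weighted combination passed onward) reflects the description above.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    # Three toy word vectors for a three-word input (values are made up).
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0]])

    # For simplicity, use the word vectors directly as queries, keys and values.
    Q, K, V = X, X, X

    scores = Q @ K.T / np.sqrt(X.shape[1])  # each word rated against every other word
    weights = softmax(scores, axis=-1)      # each row of weights now sums to 1
    output = weights @ V                    # weighted mix handed to the next layer

    print(weights.sum(axis=-1))             # -> [1. 1. 1.]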

Both the encoders and the decoders are constructed as feed-forward neural networks. This means that, unlike in recurrent networks, there is no feedback to previous layers within the network. In the decoder, a self-attention layer is applied first and the values are normalised; the processed input data are then merged in the so-called encoder-decoder attention layer. Afterwards, a feed-forward neural network is applied, the values are linearised, and the softmax function is used to finally output the most probable solution.
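
As a rough illustration, the feed-forward sub-layer described here can be sketched as two linear transformations with an activation in between; the signal only flows forward, never back to earlier layers. The layer sizes below are chosen for illustration and are not taken from the text.

    import torch
    import torch.nn as nn

    class FeedForward(nn.Module):
        """Position-wise feed-forward sub-layer: no recurrence, no feedback loops."""
        def __init__(self, d_model=768, d_hidden=3072):  # illustrative sizes
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, x):
            # The signal passes straight through the layers, front to back.
            return self.net(x)

    ffn = FeedForward()
    tokens = torch.randn(1, 5, 768)  # one sentence with five token vectors
    print(ffn(tokens).shape)         # torch.Size([1, 5, 768])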

Like most such algorithms, BERT also works on the basis of probabilities, which are used as the basis for finding a solution.
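
For example, when predicting a masked word, BERT assigns a probability to every candidate word and the most probable one is chosen. A short sketch, assuming the Hugging Face transformers "fill-mask" pipeline (not mentioned in the original text):

    from transformers import pipeline

    # The fill-mask pipeline returns candidate words with their probabilities ("score").
    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    for candidate in fill_mask("The capital of France is [MASK]."):
        print(candidate["token_str"], round(candidate["score"], 3))
    # The candidate with the highest probability is the "most probable solution".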