What is word embedding?

Word embedding establishes a connection between one or more input words and the vocabulary of a neural network. The term has its origin in mathematics: put simply, an embedding refers to a subset mapped into a larger set. As a subfield of Natural Language Processing (NLP) and of machine learning, this form of mapping is used to process natural language.

A typical use case for NLP, and thus also for word embedding, is machine translation, meaning translators such as Google Translate or DeepL. How strongly the quality of the output, i.e. the translated text, is influenced depends on the procedures used within the algorithm.

What forms of word embedding are there?

The simplest form of word embedding is the bag-of-words approach. In this method, a set of words is defined, each word is represented as a natural number, and word order does not matter. Words can be repeated, and repetitions are counted. Combined with a statistical classification procedure, this makes it possible, for example, to determine whether an email is spam by analysing the frequency of particular words.
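As an illustration, the following sketch builds such a bag-of-words representation with scikit-learn's CountVectorizer. The library choice and the two example emails are assumptions made purely for demonstration; any word-counting routine would serve the same purpose.

```python
# Minimal bag-of-words sketch (library and example texts are illustrative choices).
from sklearn.feature_extraction.text import CountVectorizer

emails = [
    "win money now win money",       # hypothetical spam-like message
    "meeting agenda for monday",     # hypothetical legitimate message
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(emails)

# The learned vocabulary: each word becomes one column index.
print(vectorizer.get_feature_names_out())
# Each row counts word occurrences; word order is ignored, repetitions are counted.
print(counts.toarray())
```

Each row of the resulting matrix is the word-count vector for one email; a statistical classifier could then be trained on these vectors to separate spam from legitimate mail, as described above.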

Word2Vec, an extension of the bag-of-words approach, represents each word as a multi-dimensional vector. This representation makes it possible to visualise a word's proximity to other words, for example in three-dimensional space. In this way, connections between words can be recognised and taught to the artificial intelligence. With the help of Word2Vec, a gap in a sentence can be filled, for example, by determining the best-matching word, i.e. the word with the highest probability of being the missing one. The multi-dimensional vector representation can also be used to teach the neural network new words. For this purpose, the missing word in a sentence is not searched for; instead, two alternatives are offered to fill the gap. The neural network then learns the new words by means of so-called features, which becomes visible in the embedding in three-dimensional space.
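The following sketch shows how such vectors might be trained and queried with the gensim implementation of Word2Vec. The library, the tiny toy corpus and the hyperparameters are assumptions for illustration; real models are trained on far larger corpora.

```python
# Sketch of training Word2Vec vectors with gensim (toy corpus, illustrative settings).
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
]

# Each word is mapped to a multi-dimensional vector (here 50 dimensions).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100)

# Words that appear in similar contexts end up close together in vector space.
print(model.wv.most_similar("king", topn=3))
```

The learned vectors typically have dozens or hundreds of dimensions; for the three-dimensional visualisation mentioned above, they are usually projected down, for example with a dimensionality-reduction technique such as PCA.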

Furthermore, contextual word embedding is an essential part of word embedding. The aim is to recognise and correctly represent the different meanings of homonyms. In practice, this goal was first realised with the Long Short-Term Memory (LSTM), a module originally intended to advance the development of artificial intelligence, which it did. However, even the LSTM quickly reached its limits. Neural networks that deal with similar linguistic forms require so-called attention mechanisms. To be able to run these in parallel, and consequently fast enough, the Transformer-based technology BERT was released in 2018. BERT was trained with tasks such as Next Sentence Prediction (NSP) and, thanks to this training, can also learn contextual embeddings.
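To make the idea of contextual embeddings concrete, the following sketch extracts vectors for the homonym "bank" from a pretrained BERT model via the Hugging Face transformers library. The library, the checkpoint name bert-base-uncased and the example sentences are assumptions for illustration only.

```python
# Sketch: the same word receives different vectors depending on its context.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["He sat on the river bank.", "She opened a bank account."]

with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
        bank_index = tokens.index("bank")
        # Contextual embedding of "bank" in this particular sentence.
        bank_vector = outputs.last_hidden_state[0, bank_index]
        print(text, bank_vector[:5])
```

Comparing the two printed vectors shows that "bank" is represented differently in each sentence, which is exactly the property that distinguishes contextual embeddings from static Word2Vec vectors.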

Which technologies are used to implement word embedding?

For the much-discussed topic of machine translation, and machine learning in general, there are numerous libraries; most of them are based on Python, the preferred programming language when it comes to artificial intelligence. There are several reasons for this: firstly, as a high-level language, Python is relatively easy to understand and has a low entry barrier; secondly, there are many other useful libraries in addition to the machine learning libraries; and thirdly, Python is very flexible, which can be seen, for example, in the fact that the code can be executed on any platform. When it comes to concrete implementation, two libraries deserve particular attention: Keras serves as a low-threshold API through which applications can be made available in an uncomplicated manner, while TensorFlow provides the backend behind it, a library with a complex architecture. Keras is therefore essentially a wrapper around backend libraries such as TensorFlow, Theano, PlaidML or MXNet.
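As a small sketch of what this looks like in practice, the following model uses the Keras Embedding layer on top of a TensorFlow backend. The vocabulary size, vector dimension and random dummy data are assumptions chosen purely for demonstration.

```python
# Minimal Keras model with a trainable word-embedding layer (illustrative sizes).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size = 1000      # number of distinct words in the vocabulary
embedding_dim = 32     # length of each word vector
sequence_length = 10   # words per input sequence

model = keras.Sequential([
    # Maps each integer word index to a 32-dimensional vector.
    layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim),
    # Averages the word vectors of a sequence into a single vector.
    layers.GlobalAveragePooling1D(),
    # Example downstream task: a binary decision such as spam / not spam.
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Dummy data: sequences of word indices in, binary labels out.
x = np.random.randint(0, vocab_size, size=(8, sequence_length))
y = np.random.randint(0, 2, size=(8, 1))
model.fit(x, y, epochs=1, verbose=0)
```

The weights of the Embedding layer are the word vectors themselves; after training they can be read out and inspected like any other embedding.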