Bagging

What is bagging?

Bagging is short for "bootstrap aggregating" and denotes a procedure for reducing variance when combining multiple classification and regression trees in machine learning.

Besides improving accuracy on classification and regression problems, bagging is also used to counter the well-known problem of overfitting. The results are particularly good when the individual learners (the classification and regression trees) are unstable and have a high variance.

As the name suggests, bootstrap aggregating involves two process steps. Bootstrapping is a statistical procedure in which random samples are repeatedly drawn from a given data set in order to estimate an unknown distribution function of the data. It is therefore a resampling method, since sub-samples are repeatedly drawn from an existing sample (the data set). A prediction model or weak classifier is then trained on each of these samples, and the individual predictions are aggregated into a single predicted value.

This is where the name bootstrap aggregating comes from: data is first drawn through repeated sampling (the bootstrapping procedure), and the prediction models are then combined (aggregated). Fusing the information from several models in this way can increase classification or regression performance.

How does the ensemble method work?

An ensemble method, or ensemble learning, combines several (weak) learners or classifiers into a so-called ensemble. Ensemble methods are therefore also referred to as a meta-approach to machine learning, since several models are combined to form one prediction value.

As described above, bagging (bootstrap aggregating) takes multiple samples of a data set and then trains and tests the same algorithm on each sample in parallel. The samples are usually drawn at random, but it would also be possible to partition the entire data set and derive the distribution of the data from the parts. Random sampling here corresponds to drawing with replacement. This means that certain data points can enter a sample several times (through repeated random selection), while others may not be included at all.
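Drawing with replacement can be illustrated with a minimal sketch. The function name bootstrap_sample and the toy data are assumptions made for illustration; only the Python standard library is used.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) items from data uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = list(range(10))
sample = bootstrap_sample(data, rng)

# Points that were never drawn; they are missing from this sample entirely.
out_of_bag = sorted(set(data) - set(sample))
print(sample, out_of_bag)
```

Because each draw is independent, some values typically appear more than once in the sample while others end up in out_of_bag.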

After the samples have been generated, the learning algorithm is applied to each ensemble member. The members are trained in parallel, independently of one another. Finally, the individual predictive models are aggregated, resulting in a final ensemble classifier. The individual models or algorithms can either flow into the classifier with equal weights or have different weights.
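The steps above can be sketched as follows. This is a simplified illustration, not a reference implementation: the weak learner is a one-threshold "decision stump" on one-dimensional toy data, the members are trained in a plain loop rather than truly in parallel, and all function names are hypothetical. Aggregation uses equal weights (a majority vote).

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a one-threshold classifier on (x, label) pairs with labels 0/1.

    Tries every observed x as a threshold and keeps the one with the
    fewest training errors for the rule "predict 1 if x > threshold".
    """
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(points, n_models, rng):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Aggregate the individual predictions by unweighted majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]

rng = random.Random(42)
train = [(x, int(x >= 5)) for x in range(10)]  # label is 1 for x >= 5
ensemble = bagging_fit(train, n_models=25, rng=rng)
print(bagging_predict(ensemble, 0), bagging_predict(ensemble, 9))
```

Each stump sees a slightly different sample and may pick a different threshold; the vote smooths out these differences, which is the variance reduction bagging aims for.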

What is the difference between bagging and boosting?

Besides bagging, boosting is another ensemble method in machine learning.

In contrast to bagging, the (weak) classifiers are not run in parallel but sequentially. In both methods, a base sample is drawn at the beginning. Because boosting proceeds iteratively and sequentially, the findings from previous steps can be applied to subsequent steps. This is achieved by weighting incorrectly classified observations differently from correctly classified ones.
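The reweighting idea can be made concrete with one update step in the style of AdaBoost. AdaBoost is one well-known boosting variant and is used here as an assumption; the text does not name a specific algorithm, and the function name reweight is hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style weight update.

    weights: current observation weights, summing to 1
    correct: booleans, True where the current weak classifier was right
    Returns (alpha, new_weights), where alpha is the classifier's vote weight.
    """
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    alpha = 0.5 * math.log((1 - error) / error)  # assumes 0 < error < 1
    raw = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

weights = [0.25, 0.25, 0.25, 0.25]
alpha, new_weights = reweight(weights, [True, True, True, False])
print(alpha, new_weights)
```

After the update, the misclassified observation carries more weight than the correctly classified ones, so the next classifier in the sequence concentrates on it.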

The aim of boosting is to create a strong classifier from a large number of weak classifiers. While weights can also be used in principle in bagging, they differ in boosting in that their size depends on the previous sequential progress, whereas the weights in bagging are already defined in advance, as the process runs in parallel.

Another difference between the two methods is the objective. The aim of bagging is to reduce the variance of the individual classifiers by combining them, while boosting aims to reduce the systematic error, or bias, of the model. In this sense, bagging can help with the overfitting problem, whereas boosting is not primarily aimed at it.

Both methods can be implemented in Python. The scikit-learn library provides implementations of ensemble methods, which makes them relatively easy to use.

BERT

What is BERT?

BERT stands for "Bidirectional Encoder Representations from Transformers" and describes an algorithm that Google uses for search queries. In their so-called core updates, Google continues to develop the algorithm for search queries in order to achieve ever better search results for users' search queries.

BERT was introduced into Google Search at the end of 2019, after the language model itself had been published in 2018, and has the purpose of better understanding the context of the search query. Special attention was paid to prepositions and filler words in the search query, which Google had often ignored in the past. In addition to its use for ranking, BERT was also applied to so-called "featured snippets". These are highlighted search results that are intended to provide the user with a brief answer to the search query.

Since BERT deals with understanding speech and text (Natural Language Understanding) and with processing them, the algorithm belongs to Natural Language Processing (NLP) in the area of neural networks. The aim of NLP is to make natural human language processable by computers so that they understand the meaning of the language.

BERT uses a special field of machine learning known as transfer learning. In principle, machine learning concepts assume that training and test data originate from the same feature space and the same distribution. This has the limitation that if the distribution changes, the original training data can no longer be used. In transfer learning, however, training data from a "non-subject" data set can be drawn upon and used to find solutions. This reduces the amount of training data required and, in some cases, also the training time. While transfer learning has its origins in image recognition, BERT uses this methodology for text processing, since search queries are very individual and specific training data is not always available.

How is the language model structured and what functions does it include?

The BERT language model is based on calculation models, so-called transformers, which place each word in relation to all other words in a sentence and thus try to better understand its meaning. Transformers work by converting input signals via so-called encoders into vectors, a form on which mathematical operations can be carried out. In the so-called "self-attention layer", each word of the input is weighted according to a value scale. This value scale evaluates each word in relation to the other words in the input. The values are then normalised and weighted using the so-called softmax function in such a way that they add up to 1. They are then passed on to the next layer.
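The weighting in a self-attention layer can be sketched in a few lines. This is a deliberately reduced illustration: real transformers additionally apply learned query, key and value projections and use several attention heads, all of which are omitted here. The function names and the toy vectors are assumptions.

```python
import math

def softmax(scores):
    """Normalise scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Each word vector is scored against every vector in the input, the
    scores are passed through softmax, and the output for that word is
    the weighted average of all vectors.
    """
    scale = math.sqrt(len(vectors[0]))
    outputs, all_weights = [], []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Three toy "word" vectors; the numbers are made up for illustration.
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(words)
print(weights)
```

Each row of weights describes how strongly one word attends to every word of the input, and each row sums to 1 because of the softmax step.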

Both the encoders and the decoders of a transformer are constructed as feed-forward neural networks. This means that there is no feedback to previous layers within the neural networks, unlike in recurrent networks. In the decoder of the full transformer architecture (BERT itself uses only the encoder part), a self-attention layer is applied, the values are normalised and the processed input data are merged in the so-called encoder-decoder-attention layer. Afterwards, a feed-forward network is applied, followed by a linear layer and the softmax function, in order to finally output the most probable solution.

Like most algorithms of this kind, BERT works on the basis of probabilities, which are used to find a solution.

Black Box

What is a black box?

A black box denotes any artificial intelligence system whose inputs and operations are not visible to the user. In general, a black box is an impenetrable system.

In deep learning, development usually follows a black-box approach: the algorithm takes millions of data points, processes that input and correlates certain data features so that it can produce an output. In data mining, on the other hand, a black box is an algorithm or even a technology that cannot give any explanation of how it works.

A black-box model for developing software with artificial intelligence is an adequate development model for testing software components. White-box processes are the opposite case: search algorithms, decision trees and knowledge-based systems that have been developed by AI experts are transparent and offer comprehensible solution paths.

A black box in machine learning is a model of a purely statistical nature. White-box models, on the other hand, denote analytical and physical descriptions for which modelling is often very elaborate. Finally, grey-box models combine both approaches and can unite the respective advantages.

What are typical methods?

Black-box testing is always used when there is no knowledge of the inner workings and implementation of the software. Only the outwardly visible behaviour is included in the test.

A successful test is not a sufficient indication of a successful and error-free system. Functionality that was never requested or a massive security gap may remain undetected. One test procedure alone is therefore usually not sufficient, since structural tests cannot detect missing functionality and functional tests take the existing implementation into account only insufficiently. The best approach is a combined procedure: functional testing with boundary value analysis or random testing, structural testing of the sections that were not covered, and regression testing after error correction.

Test methods include functional tests (black-box tests), in which test cases are selected on the basis of a specification. Equivalence class tests are carried out, boundary values are determined and the test is narrowed down via special values. State-based tests can also be derived from the specification.
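A small sketch shows how equivalence classes and boundary values translate into test cases. The function shipping_fee and its specification (4 for weights above 0 and up to 5 kg, 9 for weights above 5 and up to 20 kg, an error otherwise) are invented for illustration; the test cases are derived from that specification alone, not from the function body.

```python
def shipping_fee(weight_kg):
    """Hypothetical function under test; the tests below ignore its body."""
    if weight_kg <= 0 or weight_kg > 20:
        raise ValueError("weight out of range")
    return 4 if weight_kg <= 5 else 9

def raises_value_error(weight_kg):
    """Return True if the function rejects the input as the spec demands."""
    try:
        shipping_fee(weight_kg)
    except ValueError:
        return True
    return False

# One representative per valid equivalence class of the specification.
equivalence_cases = {2: 4, 12: 9}
# Values directly on and next to each boundary.
boundary_cases = {0.01: 4, 5: 4, 5.01: 9, 20: 9}
# Representatives of the invalid classes.
invalid_inputs = [0, -1, 20.01]

expected = {**equivalence_cases, **boundary_cases}
results = {weight: shipping_fee(weight) for weight in expected}
print(results == expected, all(raises_value_error(w) for w in invalid_inputs))
```

The choice of one representative per class keeps the number of test cases small, while the boundary values target the places where off-by-one mistakes are most likely.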

Bag-of-words model

What is a Bag-of-Words model?

A bag-of-words model is a simplifying representation used in natural language processing and information retrieval. In this model, a text is represented as a bag of its words, without taking into account grammar and even word order, but maintaining multiplicity.

One application of this model is email filtering. The number of occurrences of each word is stored. The words with the highest number of occurrences are not necessarily the most important ones, because articles such as "the", "a" and "an" occur frequently without having much relevance. For the purpose of classification, supervised variants have been developed that yield a class label for a document.

There is also a bigram model, in which the text is parsed into units of two consecutive words. Hashing can also be used to save memory. A further application is the Bayes spam filter, in which an email message is modelled as an unordered collection of words drawn from one of two probability distributions. One represents spam and the other represents legitimate emails, so-called "ham". Thus, there are two bags of words. One bag is filled with words present in spam messages and the other with words present in legitimate emails.
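The two-bag idea can be sketched as a tiny naive Bayes classifier. The training messages are made up, the function names are hypothetical, and two simplifications are assumed: equal prior probabilities for spam and ham, and add-one smoothing for unseen words.

```python
import math
from collections import Counter

def train_bag(messages):
    """Pool the word counts of all messages of one class into one bag."""
    bag = Counter()
    for message in messages:
        bag.update(message.lower().split())
    return bag

def log_likelihood(words, bag, vocabulary_size):
    """Log-probability of the words under one bag, with add-one smoothing."""
    total = sum(bag.values())
    return sum(math.log((bag[w] + 1) / (total + vocabulary_size)) for w in words)

def is_spam(message, spam_bag, ham_bag):
    """Compare both bags; equal class priors are assumed for simplicity."""
    words = message.lower().split()
    vocabulary_size = len(set(spam_bag) | set(ham_bag))
    spam_score = log_likelihood(words, spam_bag, vocabulary_size)
    ham_score = log_likelihood(words, ham_bag, vocabulary_size)
    return spam_score > ham_score

# Made-up training messages for illustration only.
spam_bag = train_bag(["win money now", "cheap money offer"])
ham_bag = train_bag(["meeting at noon", "lunch offer at noon"])
print(is_spam("win cheap money", spam_bag, ham_bag))
```

A message is assigned to whichever bag makes its words more probable; word order plays no role in the decision.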

What is Bag-of-Words?

Bag-of-words is a certain way to extract features from a text that are used to model this text with machine learning algorithms. The approach is very simple and flexible. It can be used in many ways to extract features from a document.

A bag-of-words is a representation of text that describes the frequency of words within a document. It consists of two parts: a vocabulary of known words and a measure of the presence of those known words. This model is called a bag because the order or structure of the words is discarded. It only looks at whether and how often a word occurs, but not where it is in the document.
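That word order is discarded while counts are kept can be shown with a short sketch. Splitting on whitespace and lower-casing are simplifying assumptions; real tokenisation is usually more involved.

```python
from collections import Counter

def bag_of_words(text):
    """Count how often each lower-cased, whitespace-separated word occurs."""
    return Counter(text.lower().split())

bag_a = bag_of_words("the cat chased the dog")
bag_b = bag_of_words("the dog chased the cat")
print(bag_a == bag_b, bag_a["the"])
```

The two sentences mean different things, yet their bags are identical, which is exactly the information the model gives up.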

How is text converted to vectors?

Language modelling and document classification can easily be done using bag-of-words models. Machine learning algorithms cannot work directly with plain text, so the text is first converted to numbers. By counting word occurrences and hashing, sentences can be converted into vectors. Bag-of-words is one of the best-known methods used to construct feature spaces. Feature vectors are generated in the course of this procedure.
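Both conversions mentioned above, counting against a vocabulary and hashing, can be sketched as follows. The function names are hypothetical, and zlib.crc32 stands in as a simple deterministic hash; other hash functions would work equally well.

```python
import zlib

def build_vocabulary(documents):
    """Map every distinct word to a fixed vector position."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}

def count_vector(text, vocabulary):
    """Vector of word counts; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for word in text.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

def hashed_vector(text, n_buckets):
    """Hashing variant: no vocabulary is stored, but words may collide."""
    vector = [0] * n_buckets
    for word in text.lower().split():
        vector[zlib.crc32(word.encode("utf-8")) % n_buckets] += 1
    return vector

docs = ["the cat sat", "the dog sat on the mat"]
vocabulary = build_vocabulary(docs)
vectors = [count_vector(doc, vocabulary) for doc in docs]
print(vectors)
```

The vocabulary variant gives every word its own position but must be stored; the hashing variant has a fixed size and needs no stored vocabulary, at the cost of occasional collisions between different words.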

Binary tree

What is a binary tree?

A binary tree is a special graph in the form of a branching tree. Binary trees have the special feature that their nodes always have at most two descendants. These divide systematically into a left and a right subtree.

With this method, records are stored in and retrieved from a database. The algorithm finds data by repeatedly halving the number of records still under consideration until only one remains.
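The halving idea is easiest to see on a sorted list, where it is known as binary search. The sketch below is an illustration on a plain list rather than on a database; the function name and the step counter are assumptions added to make the halving visible.

```python
def binary_search(sorted_keys, target):
    """Return (index, steps); index is None when target is absent."""
    low, high = 0, len(sorted_keys) - 1
    steps = 0
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if sorted_keys[middle] == target:
            return middle, steps
        if sorted_keys[middle] < target:
            low = middle + 1   # discard the lower half
        else:
            high = middle - 1  # discard the upper half
    return None, steps

keys = list(range(0, 2000, 2))  # 1000 sorted even numbers
index, steps = binary_search(keys, 1234)
print(index, steps)
```

With 1000 keys, at most 10 comparisons are needed, because each comparison rules out half of the remaining candidates.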

How does a binary tree work?

In a tree, records are stored at locations called leaves. This name derives from the fact that records are always present at the end points; there is nothing beyond. The branching points are called nodes. The order of a tree results from the number of branches (called children) per node. In a binary tree there are at most two children per node. If every branching node has exactly two children and all leaves lie on the same level, the number of leaves is a power of 2. The number of access operations required to reach the desired record is called the depth of the tree.

In a practical tree, there can be thousands, millions or billions of records. Not all leaves necessarily contain a record, but more than half do. A leaf that does not contain a record is called a null.

Java is one of the best-known programming languages. In Java, a binary search tree is a node-based binary tree data structure with the following properties: the left subtree of a node contains only nodes whose key is smaller than the key of the parent node, and the right subtree contains only nodes with a larger key.
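The ordering property can be sketched in a few lines. The example is written in Python rather than Java to stay compact; the class and function names are hypothetical, and duplicate keys are simply ignored, which is one of several possible conventions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None   # subtree with smaller keys
        self.right = None  # subtree with larger keys

def insert(root, key):
    """Insert key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def contains(root, key):
    """Walk down from the root, going left or right at each node."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None

def in_order(root):
    """Left subtree, node, right subtree: returns keys in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)

root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)
print(in_order(root), contains(root, 60), contains(root, 65))
```

Because smaller keys always go left and larger keys right, a lookup only has to follow one path from the root instead of visiting every node.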

What are the applications of binary trees?

There are different types of binary trees, each with its own characteristics:

  • Binary search trees are the most important representatives of binary trees. The nodes in these trees are ordered by their key. With their help, more efficient searches can be implemented in practice.
  • Full binary trees are binary trees in which every node has either no children or two children. In other words, every node except the external (leaf) nodes has exactly two children. As soon as a single node has only one child, the tree is not a full binary tree. Here, the number of leaf nodes is equal to the number of internal nodes plus one: L = I + 1, where L is the number of leaf nodes and I is the number of internal nodes.
  • A complete binary tree has all levels of the tree completely filled with nodes, with the possible exception of the lowest level. In the lowest level, the nodes should be as far to the left as possible.
  • Partially ordered binary trees are characterised by the fact that the root of every subtree holds the minimum of that subtree. In the direction of the leaves, the values of the nodes increase or at least remain the same.
  • A binary tree is called "perfect" when all internal nodes have exactly two children and all external (leaf) nodes are at the same level or depth within the tree. A perfect binary tree with height h, counted as the number of levels, has 2^h - 1 nodes.
  • A binary tree is called balanced when its height is O(log N), where N is the number of nodes. In such trees, the heights of the left and right subtrees of each node should differ by at most one. AVL trees and red-black trees are common examples of data structures that produce a balanced binary search tree.
  • In degenerate or pathological binary trees, each internal node has only one child. Such trees resemble a linked list in their performance.
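Some of the properties listed above can be checked with a short sketch. The tree representation, nested (left, right) tuples with None for a missing child, is an assumption chosen for brevity, as are the function names. Height is counted as the number of levels, so a single leaf has height 1.

```python
def count_nodes(tree):
    """Return (internal_nodes, leaves) for a tree of nested (left, right) tuples."""
    if tree is None:
        return 0, 0
    left, right = tree
    if left is None and right is None:
        return 0, 1
    left_internal, left_leaves = count_nodes(left)
    right_internal, right_leaves = count_nodes(right)
    return left_internal + right_internal + 1, left_leaves + right_leaves

def height(tree):
    """Number of levels; an empty tree has height 0, a single leaf height 1."""
    if tree is None:
        return 0
    return 1 + max(height(tree[0]), height(tree[1]))

def is_full(tree):
    """Every node has either zero or two children."""
    if tree is None:
        return True
    left, right = tree
    if (left is None) != (right is None):
        return False
    return is_full(left) and is_full(right)

def is_perfect(tree):
    """A tree with h levels is perfect exactly when it has 2**h - 1 nodes."""
    internal, leaves = count_nodes(tree)
    return internal + leaves == 2 ** height(tree) - 1

LEAF = (None, None)
perfect = ((LEAF, LEAF), (LEAF, LEAF))  # 3 levels, 7 nodes
full_only = ((LEAF, LEAF), LEAF)        # full but not perfect
degenerate = ((LEAF, None), None)       # each internal node has one child

print(count_nodes(full_only), is_perfect(perfect), is_full(degenerate))
```

For full_only, count_nodes returns 2 internal nodes and 3 leaves, which matches L = I + 1; the degenerate tree has as many levels as nodes, which is why it behaves like a linked list.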