Decision tree

What is a decision tree?

A decision tree represents a Procedure for Classification From objects on the basis of which decisions can be made. Decision trees are mainly used in the field of stochastics for the representation of conditional probabilities, but also in the area of machine learning and in decision theory.

The graphical illustration of decision trees can be implemented with so-called tree diagrams, which, starting from the root node up to the leaf, can clearly display all decision options.

The clear interpretability and the transparency of the presentation options are among the advantages of decision trees. In addition, they can be expanded relatively flexibly. The good clarity of small trees can, however, turn into the opposite for human observation in the case of large and complex trees. Large trees, especially with a great depth, can become confusing relatively quickly and increase the computational effort for analysis. Furthermore, it is often difficult to represent and evaluate all attributes or decision options of each individual node.

Methodology

Every decision tree always has its origin in the root node mentioned above. Starting from the root node, the tree branches out to further nodes, which finally end in a leaf or the final classification. A distinction is made between decision nodes and chance nodes. Decision nodes are shown as rectangles in the graphical illustration and represent a query about an attribute that decides on the further course along the tree and the subsequent node.
Chance nodes are represented as a circle and describe the further course of the tree with a certain probability. The end of the decision tree is called a leaf, often represented as a triangle, and represents the result of the respective classification task. If each node of the tree has exactly two attributes or two possible branches, it is called a binary decision tree. Each decision tree can also be represented as a binary tree.

In the context of the analysis or classification of data objects, the process always starts at the root node. Subsequently, an input vector is queried for each decision node and compared with the attributes of the decision node. If the vector matches an attribute, this represents the path to the next subsequent node in the decision tree.
In the case of random nodes, the further path and thus the next successor node is determined on the basis of the existing or assumed probabilities. According to this methodology, the process continues until the end of the decision tree or the last leaf is reached. This immediately represents the result of the tree or the respective classification of the data object. This can be a monetary, numerical, cardinal or nominal value, which can be recoded beforehand for better mathematical processing if necessary.

Example of a decision tree in machine learning

A concrete example of the application of decision trees in the field of machine learning is the methodology of the Random Forest represents. In this algorithm, founded by the statistician Leo Breiman, decision trees are used to solve Classification and regression tasks applied.

As the name Random Forest suggests, not just one decision tree is used, but a large number. This creates a "forest" of decision trees. With Random Forest, the trees are generated randomly and uncorrelated. During the creation of the trees, so-called Bagging care was taken to ensure that the individual trees do not correlate with each other and thus negatively influence the quality of the result.

The input data is applied to several decision trees when applying the algorithm for the prediction by the so-called ensemble method or ensemble learning in order to achieve a high-quality result. This procedure has the advantage over considering only a single decision tree that outliers in the individual trees are counteracted and the result thus corresponds to the average of the individual trees.

Efficiency of an algorithm

What is the efficiency of an algorithm?

An algorithm is then particularly efficient if it requires as few resources as possible, i.e. hardly any computing time and memory, to solve a specified problem. The efficiency of an algorithm can be measured in various ways: as a rule, the runtime efficiency and the storage efficiency are used as the most important key figures. This can be measured particularly well in absolute and relative terms and can therefore also be interpreted and evaluated.

In a concrete implementation of a particular programming language, efficiency can be determined as a function of hardware and the input data. However, it is also possible to create an efficiency evaluation independent of the evaluation variable and the implementation. The Landau notation can be used to determine the efficiency of an algorithm.

On what concept is this efficiency based?

For precise mathematical efficiency analyses, the underlying data and a calculation model are needed. It is clarified which data types exist, how the data is stored and which operations are permitted. Then it is determined what time a certain operation on a defined data type requires. A random access machine with simple but "unlimited" memory is used for this purpose. Efficiency is represented as storage efficiency (storage effort), subjective efficiency (acquisition effort) and runtime efficiency (time effort).

Algorithms can be fully analysed in terms of correctness and efficiency. Sorting, searching and optimising can also be considered. Data structures have different efficiencies in storage and access. Decisive for the quality of an algorithm is above all the consideration of large inputs. For practical use, it must always be considered whether the input is large or rather small. In theory, however, asymptotic behaviour is important.

Efficiency is specified independently of the machine in the number of calculation steps. One can break this down according to the number of comparisons and additions or other arithmetic operations. The number of memory locations can also be specified. Thus, the concept of the efficiency of an algorithm can be considered machine-independent. In theory, a Turing machine is often used that has a tape on which input and output can be recorded. The tape is infinitely long and the individual calculation steps are considered.

Edge Case

What is an edge case?

An edge case is also called a boundary case or edge case. It describes a scenario in which a parameter is in an extreme situation. These extremes can be either in the minimum or maximum range.

In most cases, this results in problems and malfunctions, so an edge case should be found, tested and fixed. Unit tests are usually used for this in software development. These test the boundary conditions of a method, a function or algorithms.

By controlling the correct operation within the extremes (the edges), it can be assumed that the states in between will also work. This means that not every case that might occur needs to be controlled. If, for example, a function is to add 2 numbers, it is sufficient to test it with numbers that are as small and as large as possible. If everything works without errors, it can be assumed that the function will also work without problems for all numbers in between. This approach keeps the process of testing as efficient as possible.

What is the difference between an Edge Case and a Corner Case?

In contrast to the Edge Case, where an extreme situation only occurs with one parameter, the Corner Case defines a situation where several conditions are in extreme ranges (minimum or maximum).

The difference therefore lies in the number of operating parameters, which are at the extremes and thus cause problems.

An example of an edge case: A printer has a low ink level and no longer prints the text cleanly. The ink level property is at the minimum and the printer cannot do its job properly.

In the corner case, on the other hand, several conditions are in the extreme: the printer has a low ink level, the brightness is set to maximum and the printer is standing on a shaky surface. It is only the interaction of these 3 parameters that causes the printer not to function properly. If each condition occurred individually, the printer would have no problem. The initial situations in both cases are therefore different, but they usually lead to the same result, namely a malfunction. Since it is often not clear whether one parameter alone or several in combination cause a problem, edge case and corner case are often used as synonyms for an error that occurs.

Entropy (information theory)

What is entropy?

In information theory, an entropy is a measure that indicates an average information content of the output messages for a certain message source. The information-theoretical understanding of the term entropy goes back to Claude Shannon. In general, the more characters received from a given source, the more information is collected. Entropy, according to Shannon's original intention, should be used as the measure of a required bandwidth of a transmission channel. However, he generalised the findings and devised a state of entropy that is generally accepted as the measure of information content. If this is small, then the information text contains many redundancies or even statistical regularities.

The main branches of Shannon's information theory include the encoding of information, the quantitative measure that applies to the redundancy of a text, data compression and cryptography. Information theory is a theory that aims to quantify and qualify the information content of a data set.

What aspects arise in computer science?

Shannon understood the Entropy in computer science as a measure of information and thus he could combine thermodynamics with information theory. This resulted in new aspects and methods:

  • The cross entropy is a measure of model quality. It calculates the total entropy between the distributions. The cross entropy is usually used in the Machine learning used as a loss function. The cross entropy can be understood as a measure that originates from the field of information theory and is based on entropy states.
  • The Kullback-Leibler divergence is a certain distance measure between two different models. The intrinsic measure of difficulty and quality is applied in machine learning.
  • The entropy change (information gain) is used as a criterion in feature engineering.
  • Cross-entropy minimisation is used as a method of model optimisation.
  • Another role is played by conditional entropy, which clarifies how much entropy remains in one random variable given knowledge of another random variable.
  • The joint entropy makes statements about how many bits are needed to correctly encode both random variables.
  • Ultimately, there is a whole entropy algebra where one can calculate back and forth between marginal, conditional and joint entropy states.

How is entropy used in machine learning?

Entropy in machine learning is the most commonly used measure of impurity in all of computer science. The entropy state is maximum when two classes reach 1.00 and these classes occur within a superset with identical frequency. If one class gains quantitative dominance, the probability of such a class increases equally and the entropy state decreases. In machine learning, entropy tells us how difficult it is to predict an event.

Evolutionary algorithm

What is an Evolutionary Algorithm?

An Evolutionary Algorithm is based on the ideas from biological evolution and its developmental pattern. It is an optimisation procedure that can find new approaches and solutions to specific problems. Evolutionary algorithms are also called genetic algorithms.

Such a method is ideally suited for more complex problems. In this way, whole individuals can be created in a simulation and these receive a so-called fitness function with fitness values. This fitness value provides information about how well such an individual can solve the corresponding problem. Afterwards, selection is carried out, whereby the best adapted individuals "survive".

In the whole process, two important elements are considered prerequisites for improvement, namely recombination and mutation. The strongest properties are secured by the process and the AI approaches the optimal solution. Changes become smaller and smaller and may even occur as soon as the maximum is reached. This process can be manually interrupted beforehand to avoid the problem of over-adaptation.

What is the difference between Evolutionary Algorithms and Backpropagation?

Evolutionary algorithms are an alternative to Backpropagation. In backpropagation, a gradient method is used to continuously improve the results. Error analysis and root cause analysis are used. Afterwards, further and further adjustments are made. Evolutionary algorithms behave much more "randomly" when the weighting parameters are changed.

Backpropagation is used in the context of supervised learning. Evolutionary algorithms, on the other hand, can also be used in unsupervised learning.The results are used in pattern recognition and reinforcement learning.

Where do evolutionary algorithms come into play?

Evolutionary algorithms can be used beyond optimisation and search. They are used in art, modelling and simulation. Other fields of application are the development of sensor networks, stock market analysis and the prediction of RNA structures.

In combinatorics, an optimal solution cannot always be derived due to the high computational effort. Heuristics are often used for such optimisations. Important for the use of evolutionary algorithms are the fitness function, mutation and variation, with which generations of new elements can be calculated.