What is entropy?

In information theory, entropy is a measure of the average information content of the messages produced by a given message source. The information-theoretic understanding of the term goes back to Claude Shannon. In general, the more characters are received from a source, the more information is collected. In Shannon's original intention, entropy was meant to measure the required bandwidth of a transmission channel. However, he generalised this finding and established entropy as a widely accepted measure of information content. If the entropy is small, the text contains many redundancies or even statistical regularities.
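
As a rough illustration of this definition, here is a minimal Python sketch, not taken from the original text, that estimates the entropy of a source from the character frequencies of a sample message using Shannon's formula H(X) = -Σ p(x) · log2 p(x):

```python
# Minimal illustrative sketch: Shannon entropy of a message source,
# H(X) = -sum_x p(x) * log2 p(x), measured in bits per character.
from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """Estimate the source entropy from the character frequencies of a sample message."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# A highly redundant text has low entropy, a varied text has higher entropy.
print(shannon_entropy("aaaaaaaa"))  # 0.0 bits per character
print(shannon_entropy("abcdabcd"))  # 2.0 bits per character
```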

The main branches of Shannon's information theory include the encoding of information, the quantitative measurement of the redundancy of a text, data compression and cryptography. Information theory is a theory that aims to quantify and qualify the information content of a data set.

What aspects arise in computer science?

Shannon understood entropy in computer science as a measure of information and was thus able to connect thermodynamics with information theory. This resulted in new aspects and methods:

  • Cross entropy is a measure of model quality. It calculates the total entropy between two distributions and is commonly used in machine learning as a loss function. Cross entropy originates from information theory and builds on the concept of entropy; see the first sketch after this list.
  • The Kullback-Leibler divergence is a distance-like measure between two distributions or models. In machine learning it is applied as an intrinsic measure of difficulty and model quality.
  • The entropy change (information gain) is used as a criterion in feature engineering.
  • Cross-entropy minimisation is used as a method of model optimisation.
  • Another role is played by conditional entropy, which quantifies how much entropy remains in one random variable once another random variable is known.
  • The joint entropy indicates how many bits are needed to encode both random variables together.
  • Finally, there is a whole entropy algebra that allows one to convert between marginal, conditional and joint entropies; see the second sketch after this list.
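
To make the first two points concrete, here is a minimal Python sketch, not taken from the article, that computes cross entropy and Kullback-Leibler divergence for two small discrete distributions (the distributions p and q are illustrative assumptions):

```python
# Illustrative sketch: cross entropy and Kullback-Leibler divergence between
# a true distribution p and a model distribution q.
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log2 q_i; used as a loss: lower means q fits p better.
    return -sum(pi * log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = H(p, q) - H(p): the "extra bits" incurred by using q instead of p.
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.25, 0.25]   # assumed true class distribution
q = [0.4, 0.4, 0.2]     # assumed model prediction
print(cross_entropy(p, q), kl_divergence(p, q))
```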
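
A second minimal sketch, also illustrative, shows the entropy algebra mentioned in the last point: from an assumed joint distribution p(x, y) it derives the joint entropy H(X, Y), the marginal entropy H(X) and the conditional entropy H(Y | X) via the chain rule H(X, Y) = H(X) + H(Y | X):

```python
# Illustrative sketch: joint, marginal and conditional entropy of two random
# variables X and Y, given their joint probability table.
from math import log2

# Assumed joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

joint = H(p_xy.values())                                              # H(X, Y)
p_x = {x: sum(p for (xi, _), p in p_xy.items() if xi == x) for x in (0, 1)}
marginal_x = H(p_x.values())                                          # H(X)
conditional = joint - marginal_x                                      # H(Y | X) via the chain rule
print(joint, marginal_x, conditional)
```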

How is entropy used in machine learning?

In machine learning, entropy is one of the most commonly used measures of impurity. For two classes, the entropy reaches its maximum of 1.0 bit when both classes occur within the set with identical frequency. If one class becomes dominant, its probability increases and the entropy decreases accordingly. In machine learning, entropy thus tells us how difficult it is to predict an event.
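
The following minimal Python sketch, an illustrative assumption rather than part of the original text, shows entropy as an impurity measure for a two-class set and the information gain of a candidate split:

```python
# Illustrative sketch: entropy as an impurity measure and information gain of a split.
from math import log2

def impurity(labels):
    """Entropy of a list of class labels; 1.0 bit is the maximum for two classes."""
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

print(impurity(["A", "A", "B", "B"]))   # 1.0: both classes equally frequent
print(impurity(["A", "A", "A", "B"]))   # ~0.81: class A dominates, entropy drops

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting the parent set into two subsets."""
    weight = lambda part: len(part) / len(parent)
    return impurity(parent) - weight(left) * impurity(left) - weight(right) * impurity(right)

print(information_gain(["A", "A", "B", "B"], ["A", "A"], ["B", "B"]))  # 1.0: a perfect split
```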