Redundancy (information theory)

What is redundancy?

The term redundancy comes from the Latin word "redundare" ("to overflow") and means "present in excess". In computer science, redundancy refers to excess data whose absence would not cause any loss of information. A basic distinction is made between intended and unintended redundancy.

What are examples of redundancies in computer science?

Information transmission

In the transmission of information and messages, redundancy is used to detect errors. The part of a message that carries no relevant information is called redundant. It consists of additional bits that can, for example, serve check functions within the message. Higher redundancy also allows errors to be corrected, so information lost in transmission can be restored under certain circumstances. How much redundancy is needed depends on the fault tolerance of the application: IP telephony, for example, is more fault-tolerant than transactions at a bank. The ability of a code to detect and correct errors is characterised by the Hamming distance, which counts the positions at which two character strings of equal length differ. For binary-coded numbers, the two values are combined with an XOR operation and the deviating digits (the 1 bits of the result) are counted.
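The XOR-and-count procedure described above can be sketched in a few lines of Python. The function names are chosen here for illustration and do not come from any particular library.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two non-negative integers differ."""
    # XOR leaves a 1 exactly where the two numbers have different bits.
    return bin(a ^ b).count("1")


def hamming_distance_str(s: str, t: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(s, t))


print(hamming_distance(0b1011, 0b1001))          # one differing bit
print(hamming_distance_str("karolin", "kathrin"))  # three differing characters
```

A code whose valid words are all at least distance 2 apart can detect any single-bit error, because one flipped bit never turns a valid word into another valid word.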

The redundancy of the code is the difference between the average code word length L(C) and the entropy H(X) of the information, that is, L(C) - H(X).

The redundancy of the source is the difference between the maximum entropy H_max(X) and the entropy H(X), that is, H_max(X) - H(X).
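As a worked illustration of the two definitions, the following sketch computes both quantities for a small made-up source. It assumes entropy is measured in bits (logarithm to base 2) and that H_max(X) is the entropy of a uniform distribution over the alphabet, i.e. log2 of the number of symbols. The probabilities and code word lengths are invented for the example.

```python
import math


def entropy(probs):
    """Shannon entropy H(X) in bits for a list of symbol probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def source_redundancy(probs):
    """H_max(X) - H(X), assuming H_max(X) = log2(alphabet size)."""
    return math.log2(len(probs)) - entropy(probs)


def code_redundancy(probs, lengths):
    """L(C) - H(X), where L(C) is the average code word length in bits."""
    average_length = sum(p * n for p, n in zip(probs, lengths))
    return average_length - entropy(probs)


# Hypothetical source with four symbols.
probs = [0.5, 0.25, 0.125, 0.125]

print(entropy(probs))                        # 1.75 bits per symbol
print(source_redundancy(probs))              # 2.0 - 1.75 = 0.25
print(code_redundancy(probs, [2, 2, 2, 2]))  # fixed-length code: 0.25
print(code_redundancy(probs, [1, 2, 3, 3]))  # matched prefix code: 0.0
```

In this example the fixed-length code spends 2 bits on every symbol, while a code whose word lengths match the probabilities reaches the entropy and has no code redundancy.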

Coding

Coding theory distinguishes between distribution redundancy and binding redundancy. Distribution redundancy arises because the characters of an alphabet occur with different probabilities. Binding redundancy, on the other hand, means that certain characters are more likely to occur after certain other characters. In German text, for example, the letters "c" and "h" occur less often than many other characters, but when they do occur, it is usually as a combination.

The aim of source coding is to eliminate superfluous data in order to make maximum use of the information channel, while preserving the relevant information of a message. An example of a low-redundancy code is Huffman coding. Here, characters that occur more frequently in a source are represented by fewer bits than rarer ones. A code tree assigns each character its code word. Decoding proceeds bit by bit, starting at the root of the tree. This enables lossless compression and transmission.
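The idea can be sketched as follows. This is a minimal illustration, not a production encoder: it builds the code words by repeatedly merging the two least frequent subtrees, and it decodes by matching accumulated bits against the code table, which for a prefix-free code is equivalent to walking the tree from the root. All function names are chosen for this example.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Return a dict mapping each character of text to its bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code word.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every code word.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, code):
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    inverse = {word: sym for sym, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # reached a leaf of the code tree
            out.append(inverse[current])
            current = ""
    return "".join(out)


text = "abracadabra"
code = huffman_code(text)
bits = encode(text, code)
print(code)
print(len(bits), "bits instead of", 8 * len(text))
print(decode(bits, code) == text)
```

The most frequent character, "a", receives the shortest code word, and decoding recovers the original text exactly.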

Databases and database structures

In database systems, redundancies are generally undesirable because they lead to data anomalies. If several copies of the same data exist, it may not be clear which one should be accessed. Redundancy also makes it harder to keep the database consistent and to maintain it. In addition, redundant data can consume a lot of storage space.

An example is a person's contact details when buying from an online shop. If the name, address and customer number are stored again with every order, these are redundant data.

Normalising a database schema reduces excess information. Relational database systems represent data in tables, and data sets from different tables can be linked to each other through their attributes. In normalisation, the data is put into atomic form and each table column is constructed to contain values of the same kind. In addition, every non-key attribute must depend on the primary key alone, so that non-key attributes are independent of one another.
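The online shop example can be sketched as a normalised schema using Python's built-in sqlite3 module. Table names, column names and sample values are hypothetical and serve only to illustrate the principle: the customer's contact details are stored once, and each order refers to them by customer number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    address     TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    item        TEXT NOT NULL
);
INSERT INTO customers VALUES (1, 'Ada Example', '1 Sample Street');
INSERT INTO orders VALUES (100, 1, 'keyboard'), (101, 1, 'monitor');
""")

# The address is stored in one place, so one UPDATE is enough
# and no order can be left holding an outdated copy.
conn.execute("UPDATE customers SET address = '2 New Road' WHERE customer_id = 1")

# The join links the two tables through the shared customer_id attribute.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.address, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

The only value repeated across the tables is customer_id, which is needed to link each order to its customer.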

However, some redundant data in a database is necessary, such as key redundancies. Keys are identifiers that uniquely identify data sets, and their values have to be repeated in other tables in order to link the data. Redundant information is also deliberately preserved when the effort of normalisation would be too great. In such cases, denormalisation serves to improve running time.
