What is a Convolutional Neural Network (CNN)?
A Convolutional Neural Network (CNN) is a neural networkin which the activity of each individual artificial neuron is calculated via the so-called convolution. Convolution is a mathematical operator that calculates a third function from two functions. This result can be seen as the mathematical product of the two functions. The concept or algorithm is mainly used in the field of image recognition, the Speech recognition as well as in so-called reinforcement learning.
While image and speech recognition belong to the area of the Natural Language Processing (NLP)Reinforcement learning is a method of machine learning that describes how the system is supposed to process human language. Here the system is to independently learn a strategy through interaction with the environment, which can contribute to problem solving. Positive/correct behaviour is rewarded, negative/wrong behaviour is punished.
The basic idea of a Convolutional Neural Network was modelled on the processes in the human brain, more precisely the visual cortex. Such CNNs can be created, for example, in the programming language Python with the help of the programme libraries Keras or PyTorch, which provide suitable libraries for the creation of this and an interface to the framework. TensorFlow offer. Also with Matlab in conjunction with Deep Learning Toolbox, CNNs can be created, trained and applied.
How is a CNN structured?
The The basic architectural principle of a Convolutional Neural Network is a convolutional layer followed by a pooling layer.. These two layers can be repeated as often as desired when calculating the output. The model is completed by a fully-connected layer.
At first step, the input is converted into matrix form. (using the example of an image in the form of a three-dimensional matrix with the dimensions width, height and number of colour channels). In the convolutional layer, the activity of the individual neurons is calculated via the described convolution. The input variables determined in this way are converted into an output via an activation function, as in a conventional neural network. Normally, a rectifier (ReLu function (Rectified Linear Unit)) or the sigmoid function is used for this. Both are non-linear functions and defined as the positive part of their argument. This means that the output is always greater than or equal to zero. Both activation functions are implemented in the Keras or PyTorch libraries, for example, and can be called via them. In the field of image recognition, the convolutional layer has the task of recognising edges, shapes or other features.
In the The next layer of the Convolutional Neural Network, the pooling or subsampling layer, discards superfluous information to reduce data and improve performance.. For example, for image recognition, an approximate localisation of objects is completely sufficient, whereas the exact position is not of much further use. Applicable methods for this are, for example, max-pooling or mean-pooling. In this discretisation process, only the most active/maximum neurons in defined areas of neurons of the convolutional layer (e.g. 2×2 matrix) are used for further processing, or the "most average neurons" in the case of mean value pooling. In a biological sense, the two layers can be compared to areas on the retina of the human eye. There, in the receptive field, a convergence of a large number of rods and cones onto a few so-called ganglion cells and a reduction of the flood of information and stimuli (lateral inhibition) for further processing takes place.
At The last step, the fully-connected layer, is used to classify the data.. This layer often does not differ from a multilayer perceptron of a conventional neural network. The classification or transformation into a one-dimensional vector is normally done by a softmax and normalisation function, whereby the data finally flows into a probability distribution.
Where are Convolutional Neural Networks used?
Convolutional neural networks are mainly used for in the field of image recognition, Text recognitionspeech recognition and reinforcement learning. For example, the game uses AlphaGo (a computer version of the board game Go) CNNs to localise game pieces.
Also Convolutional neural networks can be used to detect damage in the event of natural disasters.by detecting anomalies in environments or damage to house roofs, for example. The same applies in the medical field for cancer cell detection.
In text recognition, CNNs are used for handwriting recognition. but also for the detection of relevant features in images such as the postcode or contract number in a document or for vehicle number plate recognition.