How reinforcement learning works

11 April 2019 | Tech Deep Dive

Reinforcement Learning (RL) is an increasingly popular machine learning method that focuses on finding intelligent solutions to complex control problems. In this blog article, we explain how the method works in principle and then show the concrete potential of reinforcement learning in two subsequent articles.

Reinforcement learning can be used for very practical purposes. Google, for example, uses it to control the air conditioning in its data centres and achieved an impressive result: "The adaptive algorithm was able to reduce the energy needed to cool the servers by around 40 percent." But how does reinforcement learning work?

What is Reinforcement Learning?

Taken literally, reinforcement learning means learning by reinforcement: behaviour that leads to a reward is strengthened. In general terms, machine learning is divided into unsupervised machine learning and supervised machine learning. Alongside these two methods, RL is considered one of the three machine learning methods.

In contrast to the other two methods, reinforcement learning does not require any data in advance. Instead, the data is generated and labelled in a simulation environment over many runs, in a trial-and-error process during training.

Reinforcement learning is a promising method on the way to general artificial intelligence

As a result, reinforcement learning makes possible a form of artificial intelligence that can solve complex control problems without prior human knowledge. Compared to conventional engineering, such tasks can be solved many times faster, more efficiently and, in the ideal case, even optimally. Leading AI researchers regard RL as a promising method for achieving Artificial General Intelligence.

In short, Artificial General Intelligence is the ability of a machine to successfully perform any intellectual task. Like a human being, a machine must observe different causalities and learn from them in order to solve unknown problems in the future.

If you are interested in the distinction between artificial intelligence, artificial general intelligence and machine learning methods, read our introductory article on the topic "AI".

One way to replicate this learning process is the method of trial and error. In other words, reinforcement learning replicates the trial-and-error learning behaviour found in nature. The learning process therefore has links to methods in psychology, biology and neuroscience.

Theory: How reinforcement learning works

Reinforcement learning stands for a whole series of individual methods in which a software agent independently learns a strategy. The goal of the learning process is to maximise the rewards collected within a simulation environment. During training, the agent performs an action within this environment at each time step and receives feedback.

The software agent is not shown in advance which action is best in which situation. Rather, it receives a reward at certain points in time. During training, the agent learns to assess the consequences of its actions on situations in the simulation environment. On this basis, it can develop a long-term strategy to maximise the reward.
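Why a long-term strategy can differ from simply grabbing the next reward can be sketched with a discounted return, a common way of summarising future rewards in a single number. The discount factor gamma and the two reward sequences below are illustrative assumptions, not values from this article.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards, weighting a reward received t steps ahead by gamma**t."""
    total = 0.0
    # Walk backwards so that each step needs one multiplication and one addition.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Two hypothetical action sequences starting from the same situation.
greedy_now = [1.0, 0.0, 0.0, 0.0]  # small reward immediately, nothing afterwards
patient = [0.0, 0.0, 0.0, 5.0]     # nothing at first, a larger reward later

print(discounted_return(greedy_now))
print(discounted_return(patient))
```

With these assumed numbers, the patient sequence scores higher even though its reward arrives later, which is the kind of trade-off the agent has to learn to assess.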

Reinforcement Learning Model
The figure shows an iteration loop and illustrates the interaction of the individual components in reinforcement learning
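The loop of observation, action and reward can be sketched in a few lines of Python. The CorridorEnv class, its reset/step methods and the random agent are hypothetical names chosen for illustration; they are not taken from a specific library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the current cell

    def step(self, action):
        # action 0 = move left, action 1 = move right (clamped to the corridor)
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # a reward only at certain points in time
        return self.position, reward, done


def run_episode(env, choose_action, max_steps=100):
    """One run through the loop: observe, act, receive feedback."""
    observation = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = choose_action(observation)
        observation, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, step


rng = random.Random(0)
total, steps = run_episode(CorridorEnv(), lambda obs: rng.choice([0, 1]))
print(f"reward={total}, steps={steps}")
```

Here the agent still acts at random. Learning means replacing the choose_action function with something that improves from the feedback it receives.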

The goal of reinforcement learning: to find the best possible policy

Simply put, a policy is the learned behaviour of a software agent. A policy specifies which action should be executed for any given situation (observation) in the learning environment in order to maximise the reward.

How can such a policy be represented? One option is a so-called Q-table: a table with all possible observations as rows and all possible actions as columns. During training, the cells are filled with the so-called Q-values, which represent the expected future reward.
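As a minimal sketch of how such a table can be filled, the following example uses tabular Q-learning on a hypothetical five-cell corridor in which only the rightmost cell gives a reward. The environment, the constants ALPHA, GAMMA and EPSILON, and the function names are assumptions made for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 2             # observations 0..4; 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Hypothetical corridor: reward 1.0 for reaching the rightmost cell."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per observation, one column per action.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # safety cap on episode length
            # Trial and error: mostly use the table, sometimes explore.
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(q[state])
                ties = [a for a in range(N_ACTIONS) if q[state][a] == best]
                action = rng.choice(ties)
            next_state, reward, done = step(state, action)
            # Move the cell towards reward + discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Read the policy off the table: the best column in every row."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]


q_table = train()
print(greedy_policy(q_table)[:-1])  # actions for the non-goal cells
```

The policy is never stored explicitly. It is read off the table by picking the action with the highest value in the row of the current observation.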

However, the Q-table also has its limitations: it only works if the action space and the observation space remain small, that is, if there are only a few possible actions and only a few distinct situations to observe. If the software agent has to evaluate many features from the environment, or features with continuous values, a neural network is needed to represent the values. A common method for this is Deep Q-learning.
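A rough calculation shows how quickly the table grows. It assumes that every feature is discretised into a fixed number of bins; the function name and the numbers are illustrative only.

```python
def q_table_cells(num_features, bins_per_feature, num_actions):
    """Cells in a Q-table when every feature is discretised into bins."""
    num_observations = bins_per_feature ** num_features  # rows
    return num_observations * num_actions                # rows x columns


for num_features in (2, 4, 8):
    cells = q_table_cells(num_features, bins_per_feature=10, num_actions=4)
    print(f"{num_features} features -> {cells:,} cells")
```

Under these assumptions the number of rows grows exponentially with the number of features, and truly continuous values cannot be listed as rows at all.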

In our blog article on the topic of Deep Learning, we not only explain the method but also show how it is applied in practice.

In detail, the neural network is defined with the features of the observation space as the input layer and the actions as the output layer. The values are then learned during training and stored in the weights of the individual neurons of the network.
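The structure of such a network can be sketched in plain Python. This is an untrained toy model with random weights, so it only shows the shape: one input per feature of the observation and one output value per action. The class name QNetwork, the layer sizes and the example observation are assumptions, and a real project would normally rely on a deep learning library instead.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, layer, activation=None):
    """Weighted sum per neuron, optionally followed by an activation."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


class QNetwork:
    def __init__(self, num_features, num_actions, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden_layer = make_layer(num_features, hidden, rng)  # input side
        self.output_layer = make_layer(hidden, num_actions, rng)   # one output per action

    def q_values(self, observation):
        hidden = dense(observation, self.hidden_layer, activation=math.tanh)
        return dense(hidden, self.output_layer)

    def best_action(self, observation):
        values = self.q_values(observation)
        return max(range(len(values)), key=values.__getitem__)


net = QNetwork(num_features=4, num_actions=2)
observation = [0.1, -0.3, 0.05, 0.7]  # continuous features, no table row needed
print(net.q_values(observation), net.best_action(observation))
```

Training would then adjust the weights so that the outputs approach the expected future reward of each action, which is the role the table cells play in the Q-table.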

Reinforcement Learning in a nutshell and the great potential of the method

Reinforcement learning is essentially about learning through interactions with an environment. The key to solving reinforcement learning tasks is to find optimal policies or value functions. How a policy is represented and which reinforcement learning method is used depend on the specific problem to be solved.

In the next blog article on reinforcement learning, we will look at the current state of research and the challenge of producing artificial general intelligence. As already mentioned, RL plays a key role in this. The enormous potential of the method explains the great attention it is currently receiving.



Christian Lemke specialises in machine learning and artificial intelligence. He is involved in the development of machine learning pipelines and the development, evaluation, scaling and implementation of models. In his academic training, he focused on application-oriented data science, machine learning and big data.

