How Reinforcement Learning Works

by | 25. January 2021 | Tech Deep Dive

Reinforcement Learning (RL) is an increasingly popular Machine Learning method that focuses on finding intelligent solutions to complex control problems. In this blog article, we will explain the basic workings of the method before presenting the concrete potential of Reinforcement Learning in two subsequent articles.

Reinforcement Learning can be used for entirely practical purposes. Google, for example, uses it to control the air conditioning in its data centers with impressive results: “The adaptive algorithm was able to reduce the energy required to cool the servers by around 40 percent.” But how does Reinforcement Learning work?

What is Reinforcement Learning?

Broadly speaking, Machine Learning is usually divided into supervised and unsupervised learning. RL is considered a third approach alongside these two. Unlike them, however, Reinforcement Learning does not require any data in advance. Instead, the data is generated and labelled in a simulation environment over many runs of trial and error during training.

Reinforcement Learning is a promising method on the way to Artificial General Intelligence

One key result is that Reinforcement Learning can enable a form of Artificial Intelligence capable of solving complex control problems without prior human knowledge. Compared to conventional engineering, such tasks can be solved many times faster, more efficiently and, ideally, even optimally. RL is described by leading AI researchers as a promising method for achieving Artificial General Intelligence.

In brief, this term refers to the ability of a machine to successfully perform any intellectual task. Much as a human approaches a problem, a machine needs to observe cause-and-effect relationships and learn from them in order to solve unfamiliar problems in the future.

One way to emulate this learning process is the trial-and-error method. In other words, Reinforcement Learning replicates the learning behavior of trial-and-error found in nature. Thus, the learning process has links to methods in psychology, biology and neuroscience.

Theory: How Reinforcement Learning works

Reinforcement Learning covers a broad array of individual methods in which a software agent independently learns a strategy. The goal of the learning process is to maximize the total reward earned within a simulation environment. During training, the agent performs an action in this environment at each time step and receives feedback on it. The agent is not told in advance which action is best suited to which situation. Rather, it receives a reward at certain points in time. In this way, the agent learns to assess the consequences of its actions in the simulation environment, and on this basis it can develop a long-term strategy to maximize rewards.

Reinforcement Learning Model
The figure shows an iteration loop and illustrates the interaction of the individual components in Reinforcement Learning.
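
The iteration loop can also be written down in a few lines of Python. The following is only a minimal sketch: the corridor environment, its reward scheme and the names CorridorEnv, run_episode, random_policy and always_right are hypothetical and chosen purely for illustration.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with the goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # the observation is simply the agent's cell index

    def step(self, action):
        # Action 0 moves left, action 1 moves right; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # a reward is given only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one pass of the observe-act-reward loop and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                   # the agent chooses an action
        observation, reward, done = env.step(action)   # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(observation):
    """An untrained agent: picks an action at random."""
    return random.choice([0, 1])


def always_right(observation):
    """A hand-written policy that happens to be optimal in this corridor."""
    return 1
```

Calling run_episode(CorridorEnv(), always_right) walks straight to the goal, whereas random_policy only gets there by chance. Training is about moving from the second kind of behavior to the first without being told the rule.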

The goal of Reinforcement Learning: find the best possible policy

Simply put, a policy is the learned behavior of a software agent. A policy specifies which action should be executed for any given situation observed in the learning environment (Observation) in order to maximize the rewards.

How can such a policy be mapped? One way to achieve this is through the use of a so-called Q-table. A table is created with all possible Observations as rows and all possible Actions as columns. During training, the cells are filled in with values representing the expected future reward for each combination of Observation and Action.
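
To make this concrete, here is a minimal sketch of how such a table could be filled in with the classic Q-learning update rule. The corridor task, the hyperparameter values (ALPHA, GAMMA, EPSILON) and the function names are assumptions made for this example only, not a reference implementation.

```python
import random

N_STATES, N_ACTIONS = 5, 2   # Observations: cells 0..4; Actions: 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # assumed learning rate, discount, exploration rate


def step(state, action):
    """Hypothetical corridor: reward 1.0 on reaching the rightmost cell, else 0.0."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # The Q-table: one row per Observation, one column per Action.
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)   # explore: try a random action
            else:
                best = max(q[state])                # exploit: best known action,
                action = rng.choice(                # ties broken at random
                    [a for a in range(N_ACTIONS) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Move the cell towards reward + discounted best value of the next row.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read the policy off the table: the best-valued Action for each Observation."""
    return [max(range(N_ACTIONS), key=lambda a: row[a]) for row in q]
```

After training, greedy_policy(train()) picks "right" in every cell before the goal, and the values in the table shrink by the discount factor with each additional step the agent still has to take.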

However, Q-tables also have their limitations: they only work if the Action and Observation spaces remain small, that is, if the number of possible observations and the number of available actions are both small. When the software agent needs to evaluate a large number of features, or features with continuous values, a neural network is typically used to approximate these values. A common method for this is Deep Q-Learning.
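
A quick back-of-the-envelope calculation shows how fast a table outgrows its usefulness. The sketch below assumes, purely for illustration, that every Observation is a combination of discrete features with the same number of possible values; the function name q_table_cells is hypothetical.

```python
def q_table_cells(values_per_feature, n_features, n_actions):
    """Number of cells in a Q-table whose rows are all combinations of discrete features."""
    n_observations = values_per_feature ** n_features
    return n_observations * n_actions


# 2 features with 10 possible values each and 4 actions: a table that fits on a page.
small = q_table_cells(10, 2, 4)    # 400 cells

# 10 features with 10 possible values each and 4 actions: far too large to fill by trial and error.
large = q_table_cells(10, 10, 4)   # 40,000,000,000 cells
```

With continuous features the situation is worse still, because there is no finite set of rows to enumerate in the first place.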

In our blog article on Deep Learning, we not only explain the method, but also show how it is being put to practical use.

In detail, the neural network is defined with the features of the Observation space as its input layer and the Actions as its output layer. The values are then learned during training and stored in the weights of the network.
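
The structure of such a network can be sketched in plain Python. This is an untrained toy model that only shows the shape of the mapping from Observation features to one value per Action; the class name TinyQNetwork, the layer sizes and the weight initialisation are assumptions for this example. A real Deep Q-Learning setup would use a deep learning library and a training procedure that is omitted here.

```python
import random


class TinyQNetwork:
    """Hypothetical two-layer network: Observation features in, one Q-value per Action out."""

    def __init__(self, n_features, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; training would adjust these values.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w1 = init(n_hidden, n_features)   # input layer -> hidden layer
        self.b1 = [0.0] * n_hidden
        self.w2 = init(n_actions, n_hidden)    # hidden layer -> output layer
        self.b2 = [0.0] * n_actions

    def q_values(self, observation):
        # Hidden layer with ReLU activation.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, observation)) + b)
                  for row, b in zip(self.w1, self.b1)]
        # Output layer: one estimated Q-value per Action.
        return [sum(w * h for w, h in zip(row, hidden)) + b
                for row, b in zip(self.w2, self.b2)]

    def best_action(self, observation):
        q = self.q_values(observation)
        return max(range(len(q)), key=lambda a: q[a])
```

For example, TinyQNetwork(n_features=4, n_hidden=8, n_actions=2).q_values([0.1, -0.2, 0.0, 0.3]) returns two numbers, one per Action. Unlike a table, the same network accepts any combination of continuous feature values.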

Reinforcement Learning in a nutshell and the great potential of the method

Reinforcement Learning is, in essence, about learning by interacting with an environment. The key to solving Reinforcement Learning problems is to find an optimal policy or value function. How a policy is represented and which Reinforcement Learning method is used depend on the specific problem at hand.

In the next blog article on Reinforcement Learning, we will look at the current state of research and the challenge of producing Artificial General Intelligence. As mentioned earlier, RL plays a key role in this endeavour. The enormous potential of the method explains the great attention it is currently receiving.
