Reinforcement learning in action - prerequisites and use cases

24 April 2019 | Basics

Reinforcement learning is one of the most promising machine learning methods in the field of AI. After we looked at the state of research in the second part of the article series, in this third and final part you will learn everything you need to know about the practical use of reinforcement learning.

In recent years, enormous progress has been made in reinforcement learning research. The algorithms have attracted attention mainly through impressive victories in complex games and by mastering simple robot tasks. Reinforcement learning, however, is suitable for a wide range of applications, especially for the control and optimisation of complex systems.

But some challenges still stand in the way of practical use at the moment. After all, no one wants to sit in a moving car while a learning algorithm uses trial and error to find the optimal way of navigating road traffic. The real world is full of unpredictable events, not fully observable and thus difficult to master. The potential for costly complications or even catastrophic accidents is great.

Basic prerequisites for the use of reinforcement learning

When it comes to the practical use of reinforcement learning, the first thing to understand is the problem itself. Reinforcement learning is not the right solution for every task. In fact, there are probably more use cases where other methods are more appropriate than reinforcement learning. Which method fits which use case can be determined, for example, in a use case workshop.

Get an overview of the practical uses of the most important machine learning methods, such as classification and clustering.

To find out whether reinforcement learning is suitable for a particular problem, you should check whether your problem has some of the following characteristics:

  • Can the principle of trial and error be applied?
  • Is your problem an open-loop or closed-loop control problem?
  • Is it a complex optimisation task?
  • Can the complex problem only be solved to a limited extent with traditional engineering methods?
  • Can the task be carried out in a simulated environment?
  • Is a high-performance simulation environment available?
  • Can the simulation environment be influenced and its state queried?
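The last points of the checklist can be made concrete with a small sketch. It assumes a toy room-temperature simulation; the class `ThermostatSim`, its methods and all numbers are hypothetical and only illustrate what "influencing the simulation" and "querying its state" could look like in code.

```python
from dataclasses import dataclass


@dataclass
class ThermostatSim:
    """Toy room-temperature simulation (all names and numbers are illustrative)."""
    temperature: float = 18.0   # current room temperature in degrees Celsius
    outside: float = 10.0       # constant outside temperature
    target: float = 21.0        # temperature the controller should reach

    def get_state(self) -> float:
        """Query the current state of the simulation."""
        return self.temperature

    def apply_action(self, heating_on: bool) -> float:
        """Influence the simulation for one time step and return the new state."""
        heat_gain = 1.5 if heating_on else 0.0
        heat_loss = 0.1 * (self.temperature - self.outside)
        self.temperature += heat_gain - heat_loss
        return self.temperature


sim = ThermostatSim()
for _ in range(20):
    # A hand-written rule stands in here for the policy an agent would learn.
    sim.apply_action(heating_on=sim.get_state() < sim.target)
final_temperature = sim.get_state()
print(round(final_temperature, 2))
```

If a problem cannot be wrapped in an interface of roughly this shape, with cheap and safe repeated trials, reinforcement learning is probably not the right tool for it.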

Reinforcement learning is not a ready-made solution - the solution is approximated

Many iterations are required before an algorithm works. This is partly because rewards can be delayed and must first be found. The learning process can be modelled as a Markov decision process (MDP). For this, a state space, an action space and a reward function must be designed.
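As a minimal sketch of these three building blocks, consider a corridor with five cells in which only the last cell pays a reward. The corridor, the discount factor `gamma = 0.9` and all function names are assumptions made for this illustration, not part of any specific library.

```python
# State space: positions 0..4 on a line; position 4 is the goal.
STATES = range(5)
# Action space: move left (-1) or right (+1).
ACTIONS = (-1, +1)
GOAL = 4


def transition(state: int, action: int) -> int:
    """Deterministic transition: move and stay inside the corridor."""
    return min(max(state + action, 0), GOAL)


def reward(next_state: int) -> float:
    """Delayed reward: only reaching the goal pays off."""
    return 1.0 if next_state == GOAL else 0.0


def discounted_return(rewards: list[float], gamma: float = 0.9) -> float:
    """Sum of rewards, each discounted by how late it arrives."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Roll out the fixed policy "always move right" from state 0.
state, rewards = 0, []
while state != GOAL:
    state = transition(state, +1)
    rewards.append(reward(state))

print(rewards)                     # [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards))  # 0.9 ** 3, roughly 0.729
```

The first three steps earn nothing, which is exactly the delayed-reward situation described above: an agent has to try many action sequences before it stumbles on the one that pays off.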

Such a simulated learning environment must fulfil an important prerequisite: it must be able to reflect the real world in a simplified way. To do this, four points must be taken into account:

  1. A suitable RL algorithm, if necessary with a neural network, must be selected or developed.
  2. "Iteration epochs" and a clear "goal" must be defined.
  3. A set of possible "actions" that the agent can perform must be defined.
  4. "Rewards" must be defined for the agent.
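The points in this list can be sketched together in one small program. The example below uses tabular Q-learning, one common RL algorithm chosen here only for illustration, on a hypothetical six-cell corridor; the hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`), the episode budget and the reward values are assumptions, not recommendations.

```python
import random

N_STATES, GOAL = 6, 5          # corridor cells 0..5; the goal is the right end
ACTIONS = (-1, +1)             # possible actions: step left or step right
EPISODES, MAX_STEPS = 500, 50  # iteration budget for the training run
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Environment: returns next state, reward and whether the goal is reached."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else -0.01), done


def train(seed: int = 0) -> list[list[float]]:
    """Tabular Q-learning with epsilon-greedy action selection."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state
    for _ in range(EPISODES):
        state = 0
        for _ in range(MAX_STEPS):
            # Explore with probability EPSILON, otherwise take the best known action.
            if rng.random() < EPSILON:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, r, done = step(state, ACTIONS[a])
            # Move the estimate towards reward plus discounted best next value.
            target = r + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy policy for every non-goal state: +1 means "step right".
policy = [ACTIONS[row.index(max(row))] for row in q_table[:GOAL]]
print(policy)
```

Nobody tells the agent that "always step right" is the solution; it only receives the reward signal and, after enough episodes, the greedy policy read from the Q-table should point towards the goal in every cell.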

Reinforcement learning is an iterative process where systems can learn rules on their own from such a designed environment.

The advantages of reinforcement learning

Reinforcement learning is ideally used when a particular goal is known but the way to reach it is not. For example: a car should get from A to B on its own along the optimal route without causing an accident. In contrast to traditional engineering methods, however, a human does not dictate the solution. A new solution is found with as few specifications as possible.

One of the great advantages of reinforcement learning is that, unlike supervised and unsupervised machine learning, it requires no special training data. In contrast to supervised machine learning, new and unknown solutions can emerge, rather than just solutions imitated from the data. It is even possible to reach a new optimal solution that is unknown to humans.

Reinforcement learning also faces some challenges such as intensive computing power and defining rewards

If you want to use reinforcement learning, you need to be aware that there are some challenges involved. First and foremost, the learning process itself can be very computationally intensive. Slow simulation environments are often the bottleneck in reinforcement learning projects.

In addition, defining the reward function, also known as reward engineering, is not trivial. It is not always obvious from the outset how the rewards should be defined. Furthermore, optimising the many parameters is very complex. Defining the observation and action spaces is also sometimes not easy.
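A small sketch shows why reward engineering is delicate. It assumes a hypothetical cooling controller that should save energy without letting the temperature exceed a safety limit; the function `cooling_reward`, its parameters and all numbers are invented for this illustration.

```python
def cooling_reward(energy_kwh: float, temperature_c: float,
                   max_safe_temp_c: float = 27.0,
                   violation_penalty: float = 100.0) -> float:
    """Hypothetical reward: less energy is better, overheating is penalised."""
    reward = -energy_kwh
    if temperature_c > max_safe_temp_c:
        # The penalty grows with the size of the safety violation.
        reward -= violation_penalty * (temperature_c - max_safe_temp_c)
    return reward


# Option 1: cool properly (10 kWh, 25 degrees). Option 2: save energy but overheat.
print(cooling_reward(10.0, 25.0))                         # -10.0
print(cooling_reward(2.0, 29.0))                          # -202.0
# With a penalty weight that is too small, overheating suddenly looks attractive.
print(cooling_reward(2.0, 29.0, violation_penalty=1.0))   # -4.0
```

The agent optimises exactly what the reward function says, not what was meant. In the last line, a badly chosen weight makes the unsafe behaviour score better than the safe one.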

Last but not least, the dilemma of "exploration vs. exploitation" also plays a role in reinforcement learning. This means that the question always arises as to whether it is more worthwhile to take new, unknown paths or to improve existing solutions.
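One simple and widely used way to balance the two is an epsilon-greedy strategy: with a small probability the agent tries a random option, otherwise it picks the option it currently believes to be best. The sketch below assumes three options with made-up payout probabilities (`TRUE_MEANS`); the function name and the step count are likewise illustrative.

```python
import random

TRUE_MEANS = [0.2, 0.5, 0.8]   # hidden payout probabilities (made up for the demo)


def run_bandit(epsilon: float, steps: int = 5000, seed: int = 1) -> list[int]:
    """Return how often each option was chosen under an epsilon-greedy strategy."""
    rng = random.Random(seed)
    n = len(TRUE_MEANS)
    estimates = [0.0] * n
    counts = [0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)                           # explore: random option
        else:
            arm = max(range(n), key=estimates.__getitem__)   # exploit: best estimate
        payout = 1.0 if rng.random() < TRUE_MEANS[arm] else 0.0
        counts[arm] += 1
        # Incremental average keeps a running estimate of each option's value.
        estimates[arm] += (payout - estimates[arm]) / counts[arm]
    return counts


print(run_bandit(epsilon=0.1))  # some exploration: the best option should dominate
print(run_bandit(epsilon=0.0))  # no exploration: stuck on the first option tried
```

Without any exploration the agent in this sketch never leaves the first option, because the untried options keep their initial estimate of zero. A little exploration is enough to discover that the third option pays best.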

Reinforcement learning in practice: sectors and concrete use cases

To give you a better feel for the possible applications of reinforcement learning, we have compiled some examples from practice. The following overview first shows the broad spectrum of tasks as a whole. Reinforcement learning can be applied within the three categories "optimisation", "control" and "monitoring".

The diagram gives an overview of the range of tasks of reinforcement learning.

Google controls the air conditioning with reinforcement learning

Google is known for being at the forefront of AI development, and reinforcement learning plays an important role there too. Google uses this method for cooling its data centres. The background: Google operates huge data centres that not only consume an enormous amount of electricity but also generate a great deal of heat. To cool the data centres, a complex system of air conditioning is used.

By using its adaptive algorithm, Google was able to reduce the energy costs for server cooling by 40 per cent.

Reinforcement learning helps to control this complex, dynamic system. There are considerable safety constraints to respect, as well as potential for a significant improvement in energy efficiency.

Traffic light control in an intelligent traffic management system

Equally complex and extremely prone to disruption are our road network and its traffic management systems. Above all, the intelligent control of traffic lights is a great challenge. Reinforcement learning is ideally suited to solving this problem. In the paper "Reinforcement learning-based multi-agent system for network traffic signal control", researchers attempted to develop a traffic light control system as a solution to the congestion problem.

Simulation environment based on the example of a traffic management system
Sketch for a simulation environment with action options for the agent. (Image source:

Reinforcement learning in the logistics industry: inventory management and fleet management

The logistics sector is excellently suited to reinforcement learning due to its complexity. One example of this is inventory management. Reinforcement learning can be used, for example, to reduce the lead time for replenishing stock and to order products in a way that makes optimal use of the available warehouse space.

Reinforcement learning is also used in fleet management. Here, the aim for many years has been to solve one of the main problems, the "Split Delivery Vehicle Routing Problem" (SDVRP). In traditional route planning, a fleet with a certain capacity and a certain number of vehicles is available to serve a certain number of customers with known demand. Each customer must be served by exactly one vehicle. The aim is to minimise the total distance.

In the routing problem with split deliveries (SDVRP), the restriction that each customer must be visited exactly once is removed. In other words, split deliveries are permitted. Reinforcement learning can solve this problem so that as many customers as possible are served with only one vehicle.
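The difference between the two problem variants can be illustrated with a tiny hand-made instance. The sketch below does not use reinforcement learning; it only checks whether a given delivery plan is allowed and how long it is. The coordinates, demands, capacity and both plans are invented for this example.

```python
import math

DEPOT = (0.0, 0.0)
CUSTOMERS = {"A": (10.0, 0.0), "B": (11.0, 0.0), "C": (12.0, 0.0)}
DEMAND = {"A": 5, "B": 5, "C": 5}
CAPACITY = 8   # maximum load per vehicle


def route_length(stops):
    """Distance of one tour: depot -> stops in order -> depot."""
    points = [DEPOT] + [CUSTOMERS[s] for s in stops] + [DEPOT]
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def total_distance(plan):
    """Sum of all tour lengths in a plan."""
    return sum(route_length([c for c, _ in route]) for route in plan)


def is_feasible(plan, allow_split):
    """plan: list of routes, each a list of (customer, delivered quantity)."""
    delivered = {c: 0 for c in DEMAND}
    visits = {c: 0 for c in DEMAND}
    for route in plan:
        if sum(qty for _, qty in route) > CAPACITY:
            return False                      # vehicle overloaded
        for customer, qty in route:
            delivered[customer] += qty
            visits[customer] += 1
    if not allow_split and any(v != 1 for v in visits.values()):
        return False                          # classic VRP: exactly one visit each
    return delivered == DEMAND                # every demand met exactly


unsplit_plan = [[("A", 5)], [("B", 5)], [("C", 5)]]          # three vehicles
split_plan = [[("A", 5), ("B", 3)], [("B", 2), ("C", 5)]]    # two vehicles

print(is_feasible(split_plan, allow_split=False))   # False
print(is_feasible(split_plan, allow_split=True))    # True
print(total_distance(unsplit_plan))                 # 66.0
print(total_distance(split_plan))                   # 46.0
```

In this instance, allowing customer B to be served by two vehicles saves one vehicle and shortens the total distance. Finding such plans automatically for realistic fleet sizes is the hard part, and that is where a learning approach would come in.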

Reinforcement learning enables dynamic pricing in the retail industry

Dynamic pricing is an ongoing and time-critical process in certain sectors such as e-commerce. Reinforcement learning is key when it comes to creating an appropriate pricing strategy that depends on supply and demand. This allows product turnover and profit margins to be maximised. A pricing model can be trained on historical data of customers' buying behaviour to provide suggestions in the product pricing process.

Conclusion: Reinforcement learning has enormous potential for disruption

Reinforcement learning is particularly fascinating for one reason: the method has very close ties to psychology, biology and the neurosciences. With this learning method, algorithms can develop abilities similar to those of humans. The basic principle is always trial and error. With this comparatively simple principle, complex control and optimisation problems can be solved that are difficult to tackle with traditional methods.

Reinforcement learning is currently one of the most interesting and rapidly developing research areas. The step into practice is gaining momentum and can make the difference in competitive advantage. With a suitable simulation environment and a reward system, reinforcement learning can lead to impressive results, provided there is a suitable problem and an AI strategy in which reinforcement learning can be embedded.



Christian Lemke specialises in machine learning and artificial intelligence. He is involved in the development of machine learning pipelines and the development, evaluation, scaling and implementation of models. In his academic training, he focused on application-oriented data science, machine learning and big data.