Reinforcement learning (RL) is an increasingly popular machine learning method that focuses on finding intelligent solutions to complex control problems. In this blog article, we explain how the method works in principle, before going on to demonstrate the specific potential of reinforcement learning in two follow-up articles.
Reinforcement learning can be roughly translated as “reinforcing” or “strengthening” learning. In general terms, machine learning is divided into unsupervised machine learning and supervised machine learning. Alongside these two, reinforcement learning is considered the third method of machine learning.
Unlike the other two methods, however, reinforcement learning does not require a pre-existing dataset. Instead, data is generated in many trial-and-error runs in a simulation environment during training and is effectively labeled by the reward signal.
As a result, reinforcement learning enables a form of artificial intelligence that can solve complex control problems without prior human knowledge. Compared to conventional engineering, such tasks can be solved many times faster, more efficiently, and, in the ideal case, even optimally. Leading AI researchers consider RL to be a promising method for achieving artificial general intelligence.
In short, this is the ability of a machine to successfully perform any intellectual task. Like a human being, a machine must observe different causalities and learn from them in order to solve unknown problems in the future.
If you are interested in the difference between artificial intelligence, artificial general intelligence, and machine learning methods, read our basic article on the topic of “AI.”
One way to replicate this learning process is the “trial and error” method: reinforcement learning mimics the trial-and-error learning behavior found in nature. The learning process therefore has connections to methods in psychology, biology, and neuroscience.
Reinforcement learning refers to a whole family of individual methods in which a software agent independently learns a strategy. The goal of the learning process is to maximize the cumulative reward within a simulated environment. During training, the agent performs an action within this environment at each time step and receives feedback.
The software agent is not told in advance which action is best in which situation. Instead, it receives a reward at certain points in time. During training, the agent learns to assess the consequences of its actions on the state of the simulation environment. On this basis, it can develop a long-term strategy to maximize the reward.
The figure shows an iteration loop and illustrates the interaction of the individual components in reinforcement learning.
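This interaction loop can be sketched in a few lines of Python. The environment below is a hypothetical stand-in with a reset/step interface similar to common RL toolkits; the toy task and all names are illustrative assumptions, not a specific library.

```python
import random

class ToyEnv:
    """Hypothetical environment: reach state 3 from state 0 by stepping right."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, action 0 moves left (clamped at state 0)
        self.state = min(self.state + 1, 3) if action == 1 else max(self.state - 1, 0)
        reward = 1.0 if self.state == 3 else 0.0   # feedback only at the goal
        done = self.state == 3
        return self.state, reward, done

env = ToyEnv()
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    action = random.choice([0, 1])          # a learned policy would choose here
    obs, reward, done = env.step(action)    # environment returns feedback
    total_reward += reward
print(total_reward)  # 1.0 once the goal state is reached
```

The random action choice is exactly the part that training replaces with a learned policy.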
Put simply, a policy is the learned behavior of a software agent. A policy specifies which action should be performed for any given observation from the environment in order to maximize the reward.
How can such a policy be represented? A so-called Q-table can be used for this purpose. This is a table with all possible observations as rows and all possible actions as columns. During training, the cells are filled with so-called Q-values, which represent the expected future reward.
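As a sketch, a Q-table for a small problem is just a nested list, and the tabular Q-learning update fills in the cells. The learning rate and discount factor below are assumed example values:

```python
n_states, n_actions = 4, 2
alpha, gamma = 0.5, 0.9  # assumed learning rate and discount factor

# Q-table: one row per observation (state), one column per action
Q = [[0.0] * n_actions for _ in range(n_states)]

def update(s, a, r, s_next):
    """Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# One training step: in state 2, action 1 reached state 3 and earned reward 1.0
update(2, 1, 1.0, 3)
print(Q[2][1])  # 0.5, i.e. 0.5 * (1.0 + 0.9 * 0.0 - 0.0)
```

Repeating such updates over many runs propagates the reward backwards through the table, so earlier states also acquire meaningful Q-values.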
However, the Q-table has its limitations: it only works as long as the action and observation spaces remain small, meaning the options for action and behavior are limited. If the software agent has to evaluate many features from the environment, or features with continuous values, a neural network is required to approximate the values. A common method for this is deep Q-learning.
In our blog article on deep learning, we not only explain the method, but also show how it is used in practice.
In detail, the neural network is defined with the features of the observation space as the input layer and the actions as the output layer. During training, the Q-values are then learned and encoded in the weights of the network.
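Sketched in plain Python, such a network maps observation features to one Q-value per action. The weights here are random and untrained, purely for illustration; a real implementation would use a deep learning framework and learn them:

```python
import math
import random

random.seed(0)
n_features, n_hidden, n_actions = 4, 8, 2   # assumed layer sizes

# Randomly initialized weights; training would adjust these to encode Q-values
W1 = [[random.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_hidden)]
W2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_actions)]

def q_values(observation):
    """Forward pass: observation features in, one Q-value per action out."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, observation))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

qs = q_values([0.1, -0.3, 0.7, 0.2])   # works for continuous feature values
best_action = qs.index(max(qs))        # greedy action from the Q-values
```

Because the input is a vector of (possibly continuous) features rather than a row index, the network generalizes across observations that a Q-table could never enumerate.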
When it comes to the practical application of reinforcement learning, the first step is to understand the problem correctly. Reinforcement learning is not the right solution for every task. In fact, there are probably more use cases where other methods are more suitable than reinforcement learning. A use case workshop is a good way to find out which method is best suited to which use case.
To find out whether reinforcement learning is suitable for a specific problem, you should check whether your problem has some of the following characteristics:
Before the algorithm performs well, many training iterations are required. This is partly because rewards may be delayed and must first be discovered. The learning process can be modeled as a Markov decision process (MDP). This requires the design of a state space, an action space, and a reward function.
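Those three components can be written down directly. The following minimal two-state example is hypothetical and purely illustrative:

```python
# Minimal MDP specification: state space, action space, reward function
states = ["low_charge", "high_charge"]   # state space (illustrative)
actions = ["wait", "recharge"]           # action space

# transition[state][action] -> next state (deterministic for simplicity)
transition = {
    "low_charge":  {"wait": "low_charge",  "recharge": "high_charge"},
    "high_charge": {"wait": "high_charge", "recharge": "high_charge"},
}

def reward(state, action):
    """Reward function: being charged is good, recharging has a small cost."""
    r = 1.0 if state == "high_charge" else 0.0
    return r - (0.2 if action == "recharge" else 0.0)

s = "low_charge"
s = transition[s]["recharge"]   # take one step through the MDP
```

Designing these three pieces well, especially the reward function, is where most of the modeling effort in an RL project goes.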
Such a simulated learning environment must fulfill an important prerequisite: it must be able to reflect the real world in a simplified form. To achieve this, three points must be considered:
Reinforcement learning is an iterative process in which systems can learn rules on their own from an environment designed in this way.
Reinforcement learning is ideally suited for use when a specific goal is known but the solution is not yet known. For example: A car should independently find the optimal route from A to B without causing an accident. In contrast to traditional engineering methods, however, the solution should not be specified by humans. A new solution should be found with as few specifications as possible.
One of the great advantages of reinforcement learning is that, unlike supervised and unsupervised machine learning, no special training data is required. In contrast to supervised machine learning, new and unknown solutions can emerge instead of mere imitations of solutions already present in the data. It is thus possible to arrive at a new, optimal solution that was previously unknown to humans.
Anyone who wants to rely on reinforcement learning must be aware that it comes with a number of challenges. First and foremost, the learning process itself can be computationally very expensive. Slow simulation environments are often the bottleneck in reinforcement learning projects.
In addition, defining the reward function – also known as reward engineering – is not trivial. It is not always clear from the outset how the rewards should be defined. Furthermore, tuning the many hyperparameters is very complex, and defining the observation and action spaces is also sometimes difficult.
Last but not least, the dilemma of “exploration vs. exploitation” also plays a role in reinforcement learning: the question of whether it is more worthwhile to explore new approaches or to exploit and refine known solutions arises time and again.
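A common, simple compromise for this dilemma is the epsilon-greedy strategy: with a small probability the agent explores a random action, otherwise it exploits the best known one. A minimal sketch, where the epsilon value is an assumed example:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))   # exploration: random action
    return q_values.index(max(q_values))         # exploitation: greedy action

action = epsilon_greedy([0.2, 0.8, 0.5], epsilon=0.0)  # pure exploitation -> 1
```

In practice, epsilon is often decayed over the course of training: the agent explores a lot early on and exploits its learned knowledge later.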
To give you a better feel for the possible applications of reinforcement learning, we have compiled a few real-world examples below. The following overview first shows the broad range of tasks as a whole. Reinforcement learning can be applied within the three categories of “optimization,” “control,” and “monitoring.”
The graphic provides an overview of the range of tasks covered by reinforcement learning.
Google is known for being at the forefront of AI development, and reinforcement learning plays an important role in this. Google uses the method for data center cooling. Background: Google operates huge data centers that not only consume enormous amounts of electricity but also generate extremely high temperatures. A complex system of air conditioning units is used for cooling.
By using its adaptive algorithm, Google was able to reduce the energy costs for server cooling by 40 percent.
Reinforcement learning helps to control and manage this complex, dynamic system. There are significant safety restrictions and potential for considerable improvements in energy efficiency.
Our road network and traffic management system are also complex and extremely prone to disruption. Intelligent traffic light control is a major challenge in this context. Reinforcement learning is ideally suited to solving this problem. In the paper “Reinforcement learning-based multi-agent system for network traffic signal control,” researchers attempted to develop a traffic light control system to solve the traffic jam problem.
Sketch of a simulation environment with possible actions for the agent. (Image source: web.eecs.utk.edu/~itamar/Papers/IET_ITS_2010.pdf)
Due to its complexity, the logistics industry is ideally suited for reinforcement learning. This can be clearly illustrated by the example of inventory management. Reinforcement learning can be used, for example, to reduce the throughput time for inventory and product orders in order to optimize the use of available warehouse space.
Reinforcement learning is also used in fleet management. For many years, one of the main problems in this area has been the split delivery vehicle routing problem (SDVRP). In traditional route planning, a fleet with a certain capacity and a certain number of vehicles is available to serve a certain number of customers with known demand. Each customer must be served by exactly one vehicle. The goal is to minimize the total distance.
In the SDVRP, the restriction that each customer must be visited exactly once is removed; in other words, split deliveries are allowed. Reinforcement learning can solve this problem so that as many customers as possible are served with as few vehicles as possible.
Dynamic pricing is an ongoing and time-critical process in certain areas such as e-commerce. Reinforcement learning is key when it comes to creating a suitable pricing strategy based on supply and demand. This maximizes product sales and profit margins. Pricing can be trained using historical data on customer purchasing behavior, thereby providing suggestions for the product pricing process.
Reinforcement learning is particularly fascinating for a specific reason. The method has very close links to psychology, biology, and neuroscience. Similar to humans, algorithms can use this learning method to develop skills that resemble our own. The basic principle is always “trial and error.” This relatively simple principle can be used to solve complex control and optimization problems that are difficult to achieve with traditional methods.
Reinforcement learning is one of the most interesting and fastest-developing areas of research today. Its move into practical application is gaining momentum and could provide a decisive competitive advantage. With a suitable simulation environment and a reward system, reinforcement learning can lead to impressive results. Provided, that is, there is a suitable problem and an AI strategy into which reinforcement learning can be embedded.