Why the future is agentic: An overview of Multi-Agent LLM Systems

Since the breakthrough of ChatGPT in November 2022, Generative AI (GenAI) and Large Language Models (LLMs) have taken the world by storm. Applications range from contract management, building chatbots, summarizing existing company expertise (‘if a company knew what a company knows’) to generating marketing content or documents describing a highly structured production process. One important contribution of LLM-based GenAI chatbots is to drastically lower the hurdles of creating amazing content (text, images, audio, video, code etc.) through natural language input, opening new possibilities of human-machine interactions for the broad public through an easy user interface. 

Limitations of current LLM applications 

But let’s face it: Most people who have interacted with ChatGPT have soon noticed that the default LLM workflow will only get you so far once the task reaches a certain level of complexity. Even with best-practice prompt engineering techniques, you will end up with an increasingly long prompt, and chances are that the LLM will not understand or adhere to all the instruction details, so that information provided in the prompt gets lost. One typical way around this issue is to iteratively check the chatbot responses and improve them via prompting. This is a tedious process, however, and it runs the risk of the LLM getting stuck on or misled by its previous wrong responses in the chat context window. Another issue is that real-world tasks often require the chatbot to integrate tools such as internet search, search over relevant internal company documents (via Retrieval Augmented Generation, RAG), maths capabilities, coding expertise, or guardrails to ensure safety standards for sensitive data.



Video: Is the future "agentic"? An overview of multi-agent LLM systems

This is where agentic workflows and multi-agent systems (MAS) enter the stage. MAS prove extremely useful in solving complex tasks while still offering a simple, intuitive natural language interface. Imagine if, instead of a single LLM chatbot application, we built a fully customisable system of LLM agents (or ‘bots’) that are each specialised for a different task: a reflector, a document checker, a web searcher, a critic, a coder, a chart generator, or a product owner bot criticizing your work. Take the example of GenAI-assisted coding: Here, even a simple two-agent producer-reflector architecture has been shown to drastically improve code output accuracy compared to a classic chatbot workflow. In this architecture, a human user gives the initial task to a user proxy bot, which relays the task to a coding bot. The human user no longer has to receive the code output, test it, and feed errors back to the bot round after round. Instead, the user proxy bot takes over these tasks and hands the final, automatically tested code to the human user in much less time.1
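To make the producer-reflector idea more concrete, here is a minimal sketch of such a loop in Python. It is an illustration under stated assumptions, not the setup behind the numbers cited above: `stub_coder` is a hypothetical stand-in for an LLM coding bot (it returns a buggy answer first and a fixed one once it receives feedback), and `user_proxy`, `run_candidate` and `check_add` are names we made up for this example.

```python
from typing import Callable, Optional, Tuple


def run_candidate(source: str, test: Callable[[dict], None]) -> Optional[str]:
    """Execute candidate code and its test; return an error message, or None on success."""
    namespace: dict = {}
    try:
        # Demo only: real systems must run generated code inside a sandbox.
        exec(source, namespace)
        test(namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def user_proxy(
    task: str,
    coder: Callable[[str, Optional[str]], str],
    test: Callable[[dict], None],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Relay the task to the coder, test each answer, and feed errors back."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        source = coder(task, feedback)
        feedback = run_candidate(source, test)
        if feedback is None:
            return source, round_no
    raise RuntimeError(f"no passing solution after {max_rounds} rounds: {feedback}")


def stub_coder(task: str, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM call: buggy first, corrected after feedback."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def check_add(namespace: dict) -> None:
    """Test that the user proxy runs against every candidate."""
    assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"


final_source, rounds = user_proxy("Write add(a, b)", stub_coder, check_add)
print(f"accepted after {rounds} round(s):\n{final_source}")
```

In a real system, `stub_coder` would be replaced by a call to an actual model, and the test would come from the user or from a separate test-writing agent. The human only sees the version that passed.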


A new era of agentic LLMs 

But what exactly is an agent in a multi-agent system? In Artificial Intelligence: A Modern Approach, an "agent" is defined as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators."2 But here, we are more specifically talking about autonomous GenAI agents, which can be defined as LLM-based agents that act autonomously to achieve goals based on their available inputs and actions. What makes LLM-based agents so attractive is that they are able to process natural language inputs rather than requiring software development expertise, which makes it so much easier for humans to interact with them. 

There are several key design patterns that make agentic workflows very attractive for software development, which we would like to expand on in the following. One such concept that we already alluded to above is tool use. It is true that LLMs produce astonishing results based on the principle of next-word prediction, vast amounts of training data and billions of model parameters. Nevertheless, this type of ‘statistical reasoning’ process is known to have weaknesses when it comes to factual accuracy (hallucinations), logical reasoning and maths. Through tool use, your LLM agent app can address such problems by actually using tools that do maths, coding, web searching or image generation, or any other tool that you wish to connect to the agents. If one agent has multiple tools available, it can even decide for itself how to solve the user task, e.g. by making a function call to a web search tool, querying its own knowledge base, or calling a specialised Excel agent. In other words, next-word prediction, the principle upon which LLMs are based, does not have to be applied to every part of a task. Instead, LLM agents can decide themselves which tool is most suitable for a given problem or sub-problem.
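The following Python sketch illustrates the basic mechanics of tool use under simplifying assumptions. The function `stub_router` is a hypothetical, rule-based stand-in for the LLM's decision about which tool to call. In practice this decision would come from the model itself, e.g. via function calling. The two tools and the `TOOLS` registry are likewise made up for illustration.

```python
import ast
import operator
from typing import Callable, Dict

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Tool 1: evaluate plain arithmetic exactly instead of predicting digits."""
    def evaluate(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return str(evaluate(ast.parse(expression, mode="eval").body))


def knowledge_base(query: str) -> str:
    """Tool 2: look up a term in a tiny, made-up knowledge base."""
    docs = {"mas": "MAS stands for multi-agent system."}
    return docs.get(query.strip().lower(), "no entry found")


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "knowledge_base": knowledge_base,
}


def stub_router(user_input: str) -> Dict[str, str]:
    """Hypothetical stand-in for the LLM's choice of tool (a real agent asks the model)."""
    if any(ch.isdigit() for ch in user_input):
        return {"tool": "calculator", "argument": user_input}
    return {"tool": "knowledge_base", "argument": user_input}


def agent(user_input: str, router: Callable[[str], Dict[str, str]] = stub_router) -> str:
    """Ask the router which tool to use, then execute that tool."""
    call = router(user_input)
    return TOOLS[call["tool"]](call["argument"])


print(agent("17 * 23"))
print(agent("MAS"))
```

The point of the design is that the arithmetic is computed by ordinary code rather than generated token by token, while the agent only has to pick the right tool and pass on the right argument.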

Another design pattern of agentic workflows and MAS that is also well known in software engineering is the concept of modularisation. Modularisation means that different parts of a problem (and its solution) can be divided into sub-units that interact with each other (see Figure 1). Think of specialists who solve specific aspects of a problem. In this sense, modularisation is closely related to tool usage. Modularisation is generally good software practice ('divide and conquer') as it also serves to divide the solution of a problem into small sub-solutions to make the overall system more robust. Another major advantage of modular agent systems is that they can be easily integrated into existing software pipelines. Instead of implementing new functionalities (e.g. to check a vector database for similar documents with conflicting information) in the software source code itself, you can connect the agent to the software system and let it execute the new functionality. This makes software development much more flexible and reduces the development costs of new functions. 
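As a rough sketch of what this modularity can look like in code, the Python example below plugs a new agent into an existing pipeline without touching the other steps. Everything here is hypothetical: `conflict_checker_agent` only imitates the conflict check with a hard-coded list, whereas a real agent would query a vector database and an LLM.

```python
from typing import Callable, Dict, List

Document = Dict[str, str]
Step = Callable[[Document], Document]


def normalise(doc: Document) -> Document:
    """Existing pipeline step: clean up the text."""
    return {**doc, "text": doc["text"].strip().lower()}


def store(doc: Document) -> Document:
    """Existing pipeline step: mark the document as stored."""
    return {**doc, "status": "stored"}


def conflict_checker_agent(doc: Document) -> Document:
    """Hypothetical agent: flags statements that contradict known documents.

    A real agent would search a vector database for similar documents
    and ask an LLM whether they conflict; here the check is hard-coded.
    """
    known = ["the warranty period is 12 months"]
    subject = doc["text"].split(" is ")[0]
    conflict = any(
        k.split(" is ")[0] == subject and k != doc["text"] for k in known
    )
    return {**doc, "conflict": "yes" if conflict else "no"}


def run_pipeline(doc: Document, steps: List[Step]) -> Document:
    """Pass the document through each step in order."""
    for step in steps:
        doc = step(doc)
    return doc


# The new functionality is added by inserting one module; the other steps are unchanged.
extended_pipeline: List[Step] = [normalise, conflict_checker_agent, store]
result = run_pipeline({"text": "  The warranty period is 24 months "}, extended_pipeline)
print(result)
```

Because every step shares the same simple interface, the agent can be added, swapped or removed without changing the rest of the pipeline.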

A third design pattern, besides tool use and modularisation, is flow engineering, which refers to the process of optimizing the flow of operations between agents. For instance, one such optimization might be to parallelize processes, such as searching different document databases, producing different parts of the code, or running separate tests, resulting in huge time savings compared to sequential single-LLM apps. Another type of flow is the feedback loop, where reflector agents automatically criticise the output of other agents for several iterations until the output meets certain quality standards. Appropriate flow engineering of the interactions between different agents in a MAS can have tremendous benefits for their ability to reflect, plan, refine and learn collaboratively. As the agents pass their outputs to each other, they can each contribute their expertise and get feedback from other agents to collaboratively improve the final product.
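Both flow patterns, parallel fan-out and a reflector feedback loop, can be sketched in a few lines of Python. This is a toy illustration: the search functions, `writer` and `reflector` are hypothetical stubs that return canned strings, where a real system would call databases and LLMs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def search_contracts(query: str) -> str:
    """Stub searcher for a contracts database."""
    return f"contracts: 2 hits for '{query}'"


def search_wiki(query: str) -> str:
    """Stub searcher for an internal wiki."""
    return f"wiki: 1 hit for '{query}'"


def fan_out(query: str, searchers: List[Callable[[str], str]]) -> List[str]:
    """Run all searchers in parallel; results come back in the order of `searchers`."""
    with ThreadPoolExecutor(max_workers=len(searchers)) as pool:
        return list(pool.map(lambda search: search(query), searchers))


def writer(findings: List[str], critique: Optional[str]) -> str:
    """Stub writer agent: produces a draft and reacts to any critique."""
    draft = "Summary: " + "; ".join(findings)
    if critique is not None:
        draft += " (sources checked)"
    return draft


def reflector(draft: str) -> Optional[str]:
    """Stub reflector agent: returns a critique, or None if the draft is acceptable."""
    if "sources checked" in draft:
        return None
    return "State that the sources were checked."


def refine(findings: List[str], max_iterations: int = 3) -> Tuple[str, int]:
    """Feedback loop: rewrite until the reflector accepts or the budget runs out."""
    critique: Optional[str] = None
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = writer(findings, critique)
        critique = reflector(draft)
        if critique is None:
            return draft, iteration
    return draft, max_iterations


findings = fan_out("liability", [search_contracts, search_wiki])
report, iterations = refine(findings)
print(report)
print(f"accepted after {iterations} iteration(s)")
```

The `max_iterations` cap is a deliberate design choice: without it, a reflector that is never satisfied would keep the loop running and the API costs growing.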


Promise and Pitfalls of Multi-Agent Systems 

Given all of the above, the list of potential use cases is nearly boundless and is limited mainly by our imagination. MAS will be capable of optimizing complex systems such as supply chains, financial markets, and healthcare coordination by dynamically responding to changes and making decentralized decisions. In smart grids, for instance, MAS can predict energy demand and manage resources more efficiently than before. Moreover, in applications like urban traffic management and environmental monitoring, MAS offer real-time data collection and response, enabling a higher level of precision and adaptability. Here, we can only scratch the surface of a topic that truly deserves an article of its own.

Of course, the development and use of MAS also come with their own set of challenges and risks. Notably, the increasing complexity of MAS typically leads to higher response latencies and API costs, which can be a deal breaker for some applications. However, there are emerging developments that promise to remedy these effects. These include smaller, specialized and faster models, lower API costs per token, and new hardware such as the so-called language processing units (LPUs) by companies like Groq, which promise astounding increases in inference speed.3 Time will tell what other hardware improvements lie in store for us in this quickly developing field.

As exciting as all these developments sound, we understand there are huge potential risks from launching increasingly intelligent autonomous agents without appropriate mechanisms for aligning AI with human goals (also known as the control problem)4. Additionally, GenAI will likely have a major impact on our working world and come with big challenges of ensuring that large portions of humanity do not suffer significant disadvantages from these developments. Indeed, these topics are so big that they deserve space in separate blog articles coming soon. 

Multi-Agent powered LLMs: An outlook

In conclusion, the development of MAS opens exciting new avenues for LLMs. LLMs are generally getting better at understanding our prompts, which will make prompt engineering easier. Agentic workflows will facilitate this process even more, making human-machine interaction much more efficient and user-friendly, just as earlier shifts did: from code consoles to Google search to ChatGPT and now (as we believe) to agentic workflows.

The development of MAS is only at the very beginning, and so is our work and output on agentic systems. Stay tuned for blogs, webinars and other content comparing popular agent frameworks (AutoGen, MetaGPT, CrewAI, LangChain/LangGraph) and covering multi-agent reinforcement learning, ethical concerns and societal impact, business use cases, flow engineering workflows and much more. We believe that, in the near future, ‘intelligent’ AI systems will inevitably be agentic, and that businesses and societies are only at the very beginning of understanding the opportunities, and the challenges, of MAS.

Comparison of classic workflow and agent-based workflow
Figure 1. Example comparison of non-agentic and agentic workflows. Figure adapted from Andrew Ng.

PS: Our speculative prediction: By the end of 2024, there will be platforms that rent out optimized specialist agents that can effectively be copied as many times as needed to create completely new types of work forces that can be spawned instantaneously, are available 24/7, operate at a fraction of human labour costs and with higher efficiency, and can be shut down once their task is done. Managing the socio-economic aspects of this era of ‘new new work’ will present a formidable challenge for humanity. 


1 GPT-4 coding accuracy improved from 67% in the classic zero-shot prompting case to 95% when using an agentic workflow, and the agentic workflow also allowed GPT-3.5 to outperform a zero-shot prompted GPT-4 (Andrew Ng).

2 Russell & Norvig 2003, pages 4-5, 32, 35, 36 and 56. 

3 Up to 18x faster LLM inference (185 tokens/second on average) than top cloud-based providers on public LLM performance benchmarks for Meta AI's Llama 2 70B model (see here).

4 Russell, Stuart (October 8, 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3. OCLC 1083694322.


Dr Yannik Bauer

Dr Philipp Schwartenbeck

Philipp is a Principal Data Scientist and joined [at] in January 2023. Among other things, he works on large language models and reinforcement learning, which sparked his interest during his previous job as a computational neuroscientist. When he is not analysing data or thinking about AI algorithms, he is interested in various topics ranging from Bayesian inference to competing in sheepshead tournaments.

Dr Johannes Nagele

Dr Johannes Nagele is Senior Principal Data Scientist at Alexander Thamm GmbH. As a scientist in the field of physics and computational neuroscience, he gained 10 years of experience in statistics, data analysis and artificial intelligence with a focus on time series analysis and unsupervised learning. Dr Johannes Nagele is the author of several scientific publications and conference posters. He has been supporting Alexander Thamm GmbH in the field of data science since the beginning of 2020.
