Why the Future is Agentic: An overview of Multi-Agent LLM Systems

Published: 24.06.2024
Author: Dr. Yannik Bauer, Dr. Johannes Nagele, Dr. Philipp Schwartenbeck
Category: Deep Dive

Since the breakthrough of ChatGPT in November 2022, Generative AI (GenAI) and Large Language Models (LLMs) have taken the world by storm. Applications range from contract management, chatbots, and summaries of existing company expertise (‘if a company knew what a company knows’) to marketing content and documents describing a highly structured production process. One important contribution of LLM-based GenAI chatbots is that they drastically lower the hurdles to creating content (text, images, audio, video, code etc.) through natural language input, opening up new possibilities of human-machine interaction for the broad public through an easy user interface.

Limitations of current LLM applications

But let’s face it: Most people who have interacted with ChatGPT soon notice that the default LLM workflow only gets you so far once a task reaches a certain level of complexity. Even with best-practice prompt engineering techniques, you end up with an increasingly long prompt, and chances are that the LLM will not understand or adhere to every instruction, so that some of the information provided in the prompt gets lost. One typical way around this issue is to check the chatbot responses iteratively and improve them via further prompting. This is a tedious process, however, and it runs the risk of the LLM getting stuck on, or misled by, its previous wrong responses in the chat context window. Another issue is that real-world tasks often require the chatbot to integrate tools such as internet search, search over relevant internal company documents (via Retrieval Augmented Generation, RAG), maths capabilities, coding expertise, or guardrails that ensure safety standards for sensitive data.

This is where agentic workflows and multi-agent systems (MAS) enter the stage. MAS prove extremely useful in solving complex tasks while still offering a simple, intuitive natural language interface. Imagine if, instead of a single LLM chatbot application, we built a fully customisable system of LLM agents (AI agents or ‘bots’) that are each specialised for a different task: a reflector, a document checker, a web searcher, a critic, a coder, a chart generator, or a product owner bot criticizing your work. Take the example of GenAI-assisted coding: Here, even a simple two-agent producer-reflector architecture has been shown to drastically improve code output accuracy compared to a classic chatbot workflow. In this architecture, a human user gives the initial task to a user proxy bot, which relays the task to a coding bot. The human user no longer has to receive the code output, test it, and give feedback to the bot to iteratively get rid of likely code errors. Instead, the user proxy bot takes over these tasks and eventually hands over the final, automatically produced high-quality code to the human user in much less time.¹
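To make the producer-reflector idea more concrete, here is a minimal Python sketch of such a loop. It is an illustration under stated assumptions rather than the setup behind the figures cited above: `make_stub_coder` stands in for an LLM coding bot and simply returns canned drafts, and `run_tests`, `user_proxy` and the `add` task are hypothetical names chosen for this example.

```python
from typing import Callable, Optional, Tuple


def run_tests(source: str) -> Optional[str]:
    """Execute candidate code; return an error message, or None if it passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable only for trusted, illustrative code
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def make_stub_coder() -> Callable[[str, Optional[str]], str]:
    """Hypothetical stand-in for an LLM coding bot: one canned draft per call."""
    drafts = iter([
        "def add(a, b):\n    return a - b\n",  # buggy first draft
        "def add(a, b):\n    return a + b\n",  # corrected after feedback
    ])

    def coder(task: str, feedback: Optional[str]) -> str:
        # A real agent would send the task and the feedback to an LLM here.
        return next(drafts)

    return coder


def user_proxy(task: str, coder: Callable[[str, Optional[str]], str],
               max_rounds: int = 3) -> Tuple[str, int]:
    """Relay the task, test each draft, and feed errors back until tests pass."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        code = coder(task, feedback)
        feedback = run_tests(code)
        if feedback is None:
            return code, round_no
    raise RuntimeError(f"no passing draft after {max_rounds} rounds: {feedback}")


final_code, rounds = user_proxy("Write a function add(a, b).", make_stub_coder())
print(f"accepted after {rounds} round(s):\n{final_code}")
```

The human only sees the draft that passed the tests; the failed first attempt and the error message stay inside the loop between the two bots.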

Webinar: Is the Future “Agentic”? An Overview of Multi-LLM Systems | Alexander Thamm GmbH

A New Era of Agentic LLMs

But what exactly is an agent in a multi-agent system? In Artificial Intelligence: A Modern Approach, an "agent" is defined as "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators."² But here, we are more specifically talking about autonomous GenAI agents, which can be defined as LLM-based agents that act autonomously to achieve goals based on their available inputs and actions. What makes LLM-based agents so attractive is that they are able to process natural language inputs rather than requiring software development expertise, which makes it so much easier for humans to interact with them.

There are several key design patterns that make agentic workflows very attractive for software development, which we would like to expand on in the following. One such concept that we already alluded to above is tool use. It is true that LLMs produce astonishing results based on the principle of next-word prediction, vast amounts of training data and billions of model parameters. Nevertheless, this type of ‘statistical reasoning’ process is known to have weaknesses when it comes to factuality (hallucinations), logical reasoning and maths. Through tool use, your LLM agent app can address such problems by actually calling tools that do maths, coding, web search, image generation, or anything else that you wish to connect to the agents. If one agent has multiple tools available, it can even decide for itself how to solve the user task, e.g. by making a function call to a web search tool, querying its own knowledge base, or calling a specialised Excel agent. In other words, next-word prediction, the principle upon which LLMs are based, does not have to produce the answer to every task directly: LLM agents can decide for themselves which tool is most suitable for a given problem or sub-problem.
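A small Python sketch may help to picture tool use. Everything in it is an assumption made for illustration: the two tools, the `KNOWLEDGE` entries and the `choose_tool` function are hypothetical, and `choose_tool` uses a simple regular expression where a real agent would let the LLM emit a structured function call.

```python
import ast
import operator
import re
from typing import Callable, Dict, Tuple

# Arithmetic operators the calculator tool is allowed to evaluate.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Tool 1: evaluate basic arithmetic exactly instead of 'predicting' it."""
    def evaluate(node: ast.AST):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return str(evaluate(ast.parse(expression, mode="eval").body))


# Hypothetical internal knowledge base (stand-in for a RAG lookup).
KNOWLEDGE = {
    "rag": "RAG retrieves relevant documents and adds them to the prompt.",
    "mas": "A multi-agent system combines several specialised LLM agents.",
}


def knowledge_base(question: str) -> str:
    """Tool 2: look up a keyword in the knowledge base."""
    lowered = question.lower()
    for keyword, answer in KNOWLEDGE.items():
        if keyword in lowered:
            return answer
    return "No entry found."


TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": calculator,
    "knowledge_base": knowledge_base,
}


def choose_tool(request: str) -> Tuple[str, str]:
    """Stand-in for the LLM's decision: return (tool name, tool argument)."""
    match = re.search(r"\d[\d\s+\-*/.()]*\d", request)
    if match:
        return "calculator", match.group(0)
    return "knowledge_base", request


def agent(request: str) -> str:
    tool_name, argument = choose_tool(request)
    return TOOLS[tool_name](argument)


print(agent("What is 17 * 23?"))
print(agent("Tell me about RAG"))
```

The agent itself never computes the product; it only decides which tool to call and with which argument, and new tools can be added by registering them in `TOOLS`.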

Another design pattern of agentic workflows and MAS that is also well known in software engineering is the concept of modularisation. Modularisation means that different parts of a problem (and its solution) can be split up into subunits that interact with each other (see Figure 1). Think of these as specialists that each solve a specific aspect of a problem. In this sense, modularisation is closely related to tool use. Modularisation is generally good software practice (cf. ‘divide and conquer’ or ‘separation of concerns’), as it also makes it possible to isolate performance problems, here those of individual agents, and thereby makes the overall system more robust. Another big advantage of modular agent systems is that they can easily be integrated into existing software pipelines: rather than implementing new functionality (e.g. checking a vector database for similar documents with conflicting information) in the software source code itself, one can connect an agent to the software system and let the agent provide the new functionality. This makes software development much more flexible and reduces the development costs of new features.
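As a rough illustration of modularisation, the following Python sketch treats each agent as an interchangeable module behind one small interface. The `Agent` class, `run_pipeline` and the two example agents are hypothetical; in a real system, each `handle` function would wrap an LLM call or a tool.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    """One module of the system: a name plus a text-in, text-out handler."""
    name: str
    handle: Callable[[str], str]


def run_pipeline(agents: List[Agent], message: str) -> str:
    """Pass the message through each agent; report which agent failed, if any."""
    for agent in agents:
        try:
            message = agent.handle(message)
        except Exception as exc:
            raise RuntimeError(f"agent '{agent.name}' failed: {exc}") from exc
    return message


def draft_summary(message: str) -> str:
    # Stand-in for an LLM-based writer agent.
    return f"Summary: {message.strip()}"


def redact_emails(message: str) -> str:
    # Guardrail agent: mask e-mail addresses before the text leaves the system.
    return re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[redacted]", message)


pipeline = [Agent("writer", draft_summary), Agent("guardrail", redact_emails)]
result = run_pipeline(pipeline, "  Contact jane.doe@example.com for the contract.  ")
print(result)
```

Because every agent shares the same interface, the guardrail can be added, removed or replaced without touching the writer, and a failure is reported together with the name of the agent that caused it.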

A third design pattern, besides tool use and modularisation, is flow engineering, which refers to the process of optimizing the flow of operations between agents. For instance, one such optimization might be to parallelize processes, such as searching different document databases, producing different parts of the code, or running separate tests, which can save a great deal of time compared to sequential single-LLM apps. Another type of flow is the feedback loop, where reflector agents automatically criticise the output of other agents for several iterations until the output meets certain quality standards. Appropriate flow engineering of the interactions between different agents in a MAS can have tremendous benefits for their ability to reflect, plan, refine and learn collaboratively. As the agents pass their outputs to each other, each can contribute its expertise and get feedback from other agents to collaboratively improve the final product.
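The parallel part of flow engineering can be sketched in a few lines of Python. The example below fans one query out to several document databases at once using the standard library's `ThreadPoolExecutor`. The `DATABASES` content and the `search` function are hypothetical stand-ins, and the short `time.sleep` merely simulates the latency of a real retrieval call.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical document stores; a real system would query separate backends.
DATABASES: Dict[str, List[str]] = {
    "contracts": ["Supplier contract 2023", "Maintenance contract"],
    "manuals": ["Pump maintenance manual", "Safety manual"],
    "wiki": ["Onboarding guide", "Maintenance checklist"],
}


def search(db_name: str, query: str) -> List[str]:
    """Search agent stub: return documents whose title contains the query."""
    time.sleep(0.05)  # simulated network or LLM latency
    return [doc for doc in DATABASES[db_name] if query.lower() in doc.lower()]


def parallel_search(query: str) -> Dict[str, List[str]]:
    """Run one search per database concurrently and collect the hits."""
    names = list(DATABASES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        hits = pool.map(lambda name: search(name, query), names)
        return dict(zip(names, hits))


results = parallel_search("maintenance")
for name, docs in results.items():
    print(f"{name}: {docs}")
```

With a sequential loop, the waiting times of the three searches would add up; here they overlap, so the total waiting time is roughly that of the slowest single search.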

Promise and Pitfalls of Multi-Agent Systems

Given all of the above, the list of potential use cases is virtually boundless and is probably limited mainly by our imagination. MAS will be capable of optimizing complex systems such as supply chains, financial markets, and healthcare coordination by dynamically responding to changes and making decentralized decisions. In smart grids, for instance, MAS can predict energy demand and manage resources more efficiently than before. Moreover, in applications like urban traffic management and environmental monitoring, MAS offer real-time data collection and response, enabling a higher level of precision and adaptability. Here, we can only scratch the surface of a topic that truly deserves an article of its own.

Of course, the development and usage of MAS also come with their own set of challenges and risks. Of note, the increasing complexity of MAS typically leads to higher response latencies and API costs, which can be a dealbreaker for some applications. However, there are emerging developments that promise to remedy these effects. These include smaller, specialized and faster models, lower API costs per token, and new hardware such as the so-called language processing units (LPUs) by companies like Groq, which promise astounding increases in inference speed.³ Time will tell what other hardware improvements lie in store for us in this quickly developing field.
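A back-of-the-envelope calculation shows why latency and cost grow with the number of agent calls. The Python sketch below is a deliberately simplified model: it ignores prompt processing time and network overhead, and all token counts as well as the price per 1,000 tokens are hypothetical values chosen for illustration. Only the throughput of 185 tokens/second is taken from footnote 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AgentCall:
    name: str
    prompt_tokens: int
    output_tokens: int


def estimate(calls: List[AgentCall], tokens_per_second: float,
             price_per_1k_tokens: float) -> Tuple[float, float]:
    """Return (latency in seconds, cost) for calls that run one after another.

    Simplification: latency counts only generated tokens; cost counts
    prompt and output tokens at the same (hypothetical) price.
    """
    generated = sum(call.output_tokens for call in calls)
    billed = sum(call.prompt_tokens + call.output_tokens for call in calls)
    return generated / tokens_per_second, billed / 1000 * price_per_1k_tokens


# Hypothetical workloads: one chatbot call vs. a four-step agentic workflow.
single_call = [AgentCall("chatbot", 800, 500)]
agent_workflow = [
    AgentCall("planner", 800, 300),
    AgentCall("coder", 1200, 500),
    AgentCall("reflector", 1800, 200),
    AgentCall("coder (revision)", 2200, 500),
]

for label, calls in [("single call", single_call), ("agent workflow", agent_workflow)]:
    latency, cost = estimate(calls, tokens_per_second=185, price_per_1k_tokens=0.01)
    print(f"{label}: {latency:.1f} s, cost {cost:.3f}")
```

In this toy setting, the workflow generates three times as many tokens as the single call and is billed for several times as many, because later agents re-read the growing context. Faster inference and cheaper tokens therefore benefit agentic workflows disproportionately.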

As exciting as all these developments sound, we understand there are huge potential risks from launching increasingly intelligent autonomous agents without appropriate mechanisms for aligning AI with human goals (also known as the control problem)⁴. Additionally, GenAI will likely have a major impact on our working world and come with big challenges of ensuring that large portions of humanity do not suffer significant disadvantages from these developments. Indeed, these topics are so big that they deserve space in separate blog articles coming soon.

Summary

In conclusion, the development of MAS opens up exciting new avenues for LLMs. LLMs are generally getting better at understanding our prompts, which will improve and facilitate prompt engineering. Agentic AI workflows will facilitate this process even more, making human-machine interaction much more efficient and user-friendly, just as earlier shifts did: from code consoles to Google Search to ChatGPT, and now (as we believe) to agentic workflows. The development of MAS is only at the very beginning, and so is our work and output on agentic systems. Stay tuned for blogs, webinars and other content comparing different popular agent frameworks (AutoGen, MetaGPT, CrewAI, LangChain/LangGraph), and covering multi-agent reinforcement learning, ethical concerns and societal impact, business use cases, flow engineering workflows and much more. We believe that, in the near future, ‘intelligent’ AI systems will inevitably be agentic, and that businesses and societies are only at the very beginning of understanding both the opportunities and the challenges of MAS.

PS: Our speculative prediction: By the end of 2024, there will be platforms that rent out optimized specialist agents that can effectively be copied as many times as needed to create completely new types of work forces that can be spawned instantaneously, are available 24/7, operate at a fraction of human labour costs and with higher efficiency, and can be shut down once their task is done. Managing the socio-economic aspects of this era of ‘new new work’ will present a formidable challenge for humanity.

References

¹GPT-4 coding accuracy improved from 67% in the classic zero-shot prompting case to 95% when using an agentic workflow, and the agentic workflow also allowed GPT-3.5 to outperform a zero-shot prompted GPT-4 (Andrew Ng).

²Russell & Norvig 2003, pp. 4–5, 32, 35, 36 and 56.

³Up to 18x faster LLM inference performance (185 tokens/second on average) on public LLM performance benchmarks compared to top cloud-based providers, measured for Meta AI’s Llama 2 70B model.

⁴Russell, Stuart (October 8, 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3. OCLC 1083694322.

Authors

Dr. Yannik Bauer

Dr. Philipp Schwartenbeck

Philipp is a Principal Data Scientist and joined [at] in January 2023. His research interests include Large Language Models and Reinforcement Learning, topics that sparked his interest during his previous work as a Computational Neuroscientist. When he's not analyzing data or thinking about AI algorithms, he is interested in a variety of topics, ranging from Bayesian inference to competitive Schafkopf tournaments.

Dr. Johannes Nagele

Dr. Johannes Nagele is Senior Principal Data Scientist at Alexander Thamm GmbH. As a scientist in the field of physics and computational neuroscience, he gained 10 years of experience in statistics, data analysis and artificial intelligence with a focus on time series analysis and unsupervised learning. Dr. Johannes Nagele is the author of several scientific publications and conference posters. He has been supporting Alexander Thamm GmbH in the field of data science since the beginning of 2020.
