The success of generative AI in business applications has led to its underlying technology, foundation models, being used for decision-making in challenging scenarios. Critical applications at the intersection of foundation models and decision-making include foundation models that interact with external agents and perform logical reasoning tasks. Surveying the advances in this area, Yang et al. (2023) describe the current and future capabilities of foundation models for decision making and the challenges of implementing them. This blog post connects the research findings from that paper with business applications to give readers a holistic overview of the state of research and where it is headed.
What are foundation models (base models)?
Foundation models (also known as base models) are trained on large quantities of training data. They are generalised models that are not trained for a specific use case but for a large number of tasks, which, thanks to transfer learning, leads to better performance on each individual task.
Foundation models are already transforming industry by demonstrating human-like dexterity. Popular foundation models include the large language model behind ChatGPT and the image generation model DALL-E. Until recently, however, it was difficult to imagine foundation models for decision-making tasks that require reasoning, such as interacting with dialogue systems or navigating autonomously on factory floors, because combined real and synthetic data was lacking. This has changed with the advent of multimodal data, which includes images, text, trajectories and depth maps. Multimodal datasets, which combine high-fidelity real-world data containing unknown factors with synthetic data that provides infinite variations of known factors, have enabled ground-breaking innovations in decision-making for organisations.
Find out everything about foundation models and how companies can use them effectively to gain a competitive edge and accelerate business processes in our introductory article.
Foundation models are changing decision-making in organisations
Foundation models are changing the way companies make decisions. Covariant, for example, has launched the RFM-1 model, which offers ground-breaking applications for improving warehouse processes. The model enables robots to respond to instructions, answer questions about what they see and handle longer instructions. This has huge implications for organisations, as it enables novel human-robot interactions and paves the way for profitable collaboration. In this section, we look at some of the ways in which this kind of progress is being made.
Generative modelling
Generative modelling is essentially based on the idea that intelligence and generalisation arise from an understanding of the world based on a large amount of data. Two concepts from this area are explained below:
- Generalist agents trained on extensive behavioural datasets: Even if different tasks consist of a variety of observations and rewards, they often share similar meaningful behaviours. For example, "go to the right" has a similar meaning in robotics, games and navigation. This concept therefore covers foundation models trained on extensive datasets of real or simulated behaviour, which helps the models learn how agents act in different situations. The result is generalist AI agents that understand the dynamics of the world and can adapt to new situations.
- Generative models of exploration and self-improvement: Generative behaviour models can also capture meta-level processes such as exploration and self-improvement. This is possible when the interactive dataset DRL embodies meta-level behaviours, such as the replay buffer of an agent trained from scratch with policy gradients. Algorithmic distillation mimics the action sequence of a multi-episode improvement process in DRL by using a transformer-based sequence model, inspired by the zero-shot capability of language models: it adapts to downstream tasks purely in context, without updating any network parameters. Algorithmic distillation prompts an agent with its own previous learning experience, while corrective re-prompting incorporates feedback from the environment as additional input to improve the executability of an inferred plan.
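The core idea behind generalist sequence-model agents can be sketched in a few lines: interleave (observation, action, reward) triples from many episodes into one token stream and train a sequence model on it. The sketch below is illustrative and not from the paper; a simple bigram counter stands in for the transformer, and all environment names are invented.

```python
# Illustrative sketch: behaviour data as a flat token sequence, the idea
# behind decision-making sequence models. A bigram frequency table stands
# in for the transformer used in real systems.
from collections import defaultdict

def flatten_episodes(episodes):
    """Interleave (observation, action, reward) triples into one token stream."""
    tokens = []
    for ep in episodes:
        for obs, act, rew in ep:
            tokens += [f"o:{obs}", f"a:{act}", f"r:{rew}"]
    return tokens

class BigramPolicy:
    """Stand-in for a sequence model: predicts the most frequent next token."""
    def __init__(self, tokens):
        counts = defaultdict(lambda: defaultdict(int))
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
        self.counts = counts

    def act(self, obs):
        nxt = self.counts.get(f"o:{obs}", {})
        best = max(nxt, key=nxt.get, default=None)
        return best[2:] if best and best.startswith("a:") else None

episodes = [
    [("door_open", "close", 1), ("door_closed", "stop", 0)],
    [("door_open", "close", 1)],
]
policy = BigramPolicy(flatten_episodes(episodes))
print(policy.act("door_open"))  # → close
```

Because actions always follow observations in the token stream, the model learns an observation-to-action mapping as a side effect of plain next-token prediction, which is what lets one architecture absorb behaviour from many environments.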
Foundation models for representation learning
Foundation models for decision making use representation learning to condense knowledge. This works in two ways. First, foundation models can extract representations from large-scale image and text data (D), enabling plug-and-play knowledge transfer to vision- and language-based decision tasks. Second, foundation models can support the learning of task-specific representations via task-specific objectives and interactive data (DRL).
- Plug-and-play foundation models: These are most natural when the decision task involves images or text from the real world. Foundation models trained on internet-scale text and image data can serve as preprocessors or initialisers for the various perceptual components of decision agents. For example, if an agent's observations consist of images and text descriptions, image-captioning models can enrich those observations with language descriptions.
- Computer vision and natural language processing as task specifiers: In this approach, the agents' desired behaviour is specified through additional data such as text descriptions or target images of a specific task. This allows the foundation model to learn more robust, general, cross-task strategies. A text description such as "close the car door" or a target image showing the closed car door can serve as input that complements the current robot state.
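The plug-and-play pattern amounts to concatenating a frozen encoder's embedding of the task description onto the raw robot state. A minimal sketch, assuming a stubbed encoder: the hashed bag-of-words below stands in for a real pretrained language model, and all values are invented.

```python
# Hypothetical sketch: a frozen text encoder (stubbed here with hashed
# bag-of-words) as a plug-and-play preprocessor for an agent's observations.
import hashlib

def embed_text(text, dim=8):
    """Stand-in for a pretrained language encoder: hashed bag-of-words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def enrich_observation(robot_state, task_description):
    """Concatenate the raw state with the frozen encoder's task embedding."""
    return robot_state + embed_text(task_description)

obs = enrich_observation([0.2, 0.7], "close the car door")
print(len(obs))  # → 10
```

The decision policy downstream never sees raw text; it only consumes the enriched vector, which is what makes the language model swappable.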
Large language models as agents and environments
When we treat large language models as agents, we enable learning from feedback from the environment, whether it comes from people, tools or the real world.
- Interaction with people: Applying foundation models to dialogue generation is seamless, as both D and DRL belong to the text modality, which enables task-specific fine-tuning. This approach has proven successful in human evaluations based on key criteria such as safety, truthfulness and helpfulness. Human feedback was initially used to evaluate dialogue systems, but was eventually integrated as a reward signal for dialogue agents through Reinforcement Learning from Human Feedback (RLHF).
- Interaction with tools: Language models receive additional input in the form of responses from tools such as calculators, search engines and translators. Language-model agents generate API calls to invoke external tools and receive the responses as feedback that guides further interaction. These agents can then be formulated as solving a sequential decision problem.
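The tool-interaction loop described above is a sequential decision problem: the model emits a tool call (action), the tool's response (observation) is appended to the context (state), and the cycle repeats. A hedged sketch, with an invented toy policy standing in for the language model:

```python
# Illustrative sketch (all names invented): a language-model agent's tool
# loop framed as a sequential decision problem.

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: f"top result for '{q}'",
}

def toy_policy(context):
    """Stand-in for an LLM: emits one tool call, then stops once answered."""
    if "calculator:" not in context:
        return ("calculator", "6*7")
    return None  # terminal action: the answer is already in context

def run_agent(policy, max_steps=5):
    context = ""
    for _ in range(max_steps):
        call = policy(context)
        if call is None:
            break
        tool, arg = call
        feedback = TOOLS[tool](arg)              # environment step
        context += f"{tool}:{arg}={feedback}\n"  # feedback becomes new state
    return context

print(run_agent(toy_policy))  # → calculator:6*7=42
```

Swapping `toy_policy` for a real model call and `TOOLS` for production APIs preserves the same loop structure, which is why the sequential-decision framing is useful.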
Large language models are transforming how we interact with technology, expanding its applications from content creation to customer service. Our overview presents 14 relevant models in detail:
Challenges and potentials of using foundation models
The challenges and potentials of data integration, environment structuring and improved decision-making through the use of foundation models are outlined below:
Data integration
Challenge: Multimodalities and structures
Problem: A key challenge that organisations face when implementing foundation models to support strategic decision making is the integration of image and language datasets (D) with task-specific interactive datasets (DRL).
Solution:
- Human feedback: The data gap between D and DRL can be closed using various techniques. For example, D can be made more task-specific by having humans relabel actions and rewards in video and text data after the fact.
- Consolidation of data: In addition, DRL can be expanded by merging task-specific interactive datasets.
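Relabelling after the fact can be sketched as hindsight relabelling: a generic trajectory is made task-specific by rewriting its rewards against whichever goal it actually reached. This is an illustrative sketch under invented state names, not the paper's implementation:

```python
# Hedged sketch of hindsight relabelling: rewrite a trajectory's rewards
# as if the final state reached had been the goal all along.

def hindsight_relabel(trajectory):
    """Attach the achieved goal to every step and recompute rewards."""
    achieved_goal = trajectory[-1]["state"]
    relabelled = []
    for step in trajectory:
        relabelled.append({
            **step,
            "goal": achieved_goal,
            "reward": 1.0 if step["state"] == achieved_goal else 0.0,
        })
    return relabelled

traj = [
    {"state": "shelf_A", "action": "move"},
    {"state": "shelf_B", "action": "grasp"},
]
out = hindsight_relabel(traj)
print([s["reward"] for s in out])  # → [0.0, 1.0]
```

The same trajectory thus yields task-specific supervision without collecting any new data, which is exactly how the gap between generic D and task-specific DRL can be narrowed.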
Structuring environments for different applications
Challenge: Different state-action spaces
Problem: Foundation models for vision and language are broadly diversified and can solve many different tasks. They can even generalise to new tasks through few-shot or zero-shot fine-tuning. In this way, vision and language datasets serve as a universal task interface. In decision making, however, there is enormous environment diversity: different environments operate over different state-action spaces, which prevents knowledge sharing and generalisation.
Solution:
- Universal encoding: One way to address differing state and action spaces is to encode all states, actions and rewards across environments and tasks as universal tokens in a sequence-modelling framework. However, universal tokenisation may not preserve the rich knowledge and generalisation capabilities of pre-trained vision and language models.
- Text as environment: Another technique is to convert environments with different state-action spaces into text descriptions and use text as a universal interface for learning generalisable strategies. However, this requires collecting additional data, and incongruent state-action spaces across tasks remain a problem.
- Video as policy and "world model": Image frames can serve as a universal interface for representing state-action spaces, and videos can represent policies. Policy learning can then build on pre-trained web-scale text-to-video models. However, this approach also requires further training.
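The universal-encoding idea above can be sketched as a shared vocabulary: states and actions from different environments are namespaced and mapped into one token space so a single sequence model can consume all of them. A minimal sketch with invented environment names:

```python
# Illustrative sketch: one shared token vocabulary for states and actions
# from heterogeneous environments, as in universal-tokenisation approaches.

class UniversalTokenizer:
    def __init__(self):
        self.vocab = {}

    def encode(self, env, kind, value):
        """Tokens are namespaced by environment and kind (state/action/reward)."""
        key = (env, kind, value)
        if key not in self.vocab:
            self.vocab[key] = len(self.vocab)
        return self.vocab[key]

tok = UniversalTokenizer()
robot_seq = [tok.encode("robot", "state", "arm_up"),
             tok.encode("robot", "action", "lower")]
game_seq = [tok.encode("game", "state", "level_2"),
            tok.encode("game", "action", "jump")]
print(robot_seq, game_seq)  # → [0, 1] [2, 3]
```

The trade-off noted in the text is visible here: the integer tokens carry no semantics, so the pre-trained knowledge a vision or language model has about "door" or "jump" is lost at this interface.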
Improving the decision-making process
Challenge: Eliciting desirable behaviour
Problem: Foundation models for decision making require that task-independent models can be adapted into task-specific agents.
Solution:
- Instruction fine-tuning and few-shot prompting: With this method, an extensively pre-trained language model can be specialised to output the desired responses.
- Large language models as interactive agents: This approach enables massive online access to highly scalable and available environments such as search engines, databases and compilers.
- Infrastructure that enables software tools as environments: Remote procedure calls as interactions and foundation models as policies promise effective real-world applications.
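The few-shot prompting route above amounts to composing a prompt from task examples, with environment feedback appended as corrective context (echoing the corrective re-prompting idea from earlier). A hypothetical sketch; all tasks, plans and feedback strings are invented:

```python
# Hypothetical sketch of few-shot prompting with corrective re-prompting:
# specialise a general model by prepending examples and feedback as context.

def build_prompt(examples, feedback, query):
    """Compose a few-shot prompt; the feedback line mimics corrective re-prompting."""
    parts = [f"Task: {x}\nPlan: {y}" for x, y in examples]
    if feedback:
        parts.append(f"Environment feedback: {feedback}")
    parts.append(f"Task: {query}\nPlan:")
    return "\n\n".join(parts)

prompt = build_prompt(
    examples=[("pick up the box", "1. locate box 2. grasp 3. lift")],
    feedback="gripper blocked, retry from step 2",
    query="pick up the crate",
)
print(prompt)
```

No network parameters change in this approach: the specialisation lives entirely in the composed context, which is why it is the cheapest of the three options listed.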
In 2024, breakthroughs in artificial intelligence such as Quantum Machine Learning and Neurosymbolic AI will shape the technological landscape in the DACH region and offer companies new opportunities and challenges.
How do companies integrate foundation models into their decision-making processes?
Foundation models provide a powerful approach to decision making, but successful integration requires careful consideration of tasks, models and data. Covariant's approach of combining the largest real-world robot production dataset with an extensive collection of internet data enables new levels of accuracy and productivity in warehouse applications and shows a clear path to expanding to other robot form factors and broader industry applications. In light of recent developments in this area, here is a brief guide for companies looking to integrate Foundation models into their operations:
1. Understanding the decision-making task
- Identification of the problem: Clearly define the business decision you want to support with the foundation model. For example, you may want to use foundation models for interactive product recommendations.
- Data availability: Determine the type and quality of data the foundation model needs for training. For example, if you want a foundation model to recommend your company's newly launched wool products to someone preparing for cold weather, you will need large amounts of general text data.
- Performance metrics: Define how you will measure the foundation model's contribution to decision making. For example, you can set high accuracy and cost reduction as key performance indicators for a product-recommendation foundation model.
2. Selection of the appropriate foundation model
- Capabilities of the foundation model: Evaluate the model using standard foundation-model benchmarks to assess its performance on the decision tasks you have in mind. You can also present it with real-world situations to analyse its logical reasoning.
- Training data: Use curated datasets that are tailored to your specific business needs.
- Compare different foundation models: Evaluate several foundation models before deciding on one. Prioritise each model's performance on your benchmarks and its reasoning capabilities for your use case.
3. Iterative improvement with real data
- Keep humans in the loop: Let people interact with the foundation model's answers and provide feedback on its reasoning.
- Evaluation and monitoring: Use your chosen metrics to continually assess the foundation model's performance and identify areas for improvement.
- Assessment of explainability: Investigate explainable-AI techniques your organisation can use to evaluate the foundation model's reasoning. This can help identify weaknesses in its reasoning process and lay the groundwork for future improvements.
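The monitoring step can be as simple as checking predictions against ground truth using the KPIs chosen in step 1. A minimal sketch, assuming accuracy as the metric; the product names and the 0.8 threshold are invented:

```python
# Hedged sketch: monitoring a deployed model against a chosen KPI
# (recommendation accuracy here; the review threshold is an assumption).

def monitor(predictions, ground_truth, threshold=0.8):
    """Compare predictions to ground truth and flag runs below threshold."""
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    accuracy = correct / len(ground_truth)
    return {"accuracy": accuracy, "needs_review": accuracy < threshold}

report = monitor(
    predictions=["wool_coat", "scarf", "gloves"],
    ground_truth=["wool_coat", "scarf", "hat"],
)
print(report["needs_review"])  # → True
```

In practice this check would run on a schedule over fresh interaction logs, feeding the human-in-the-loop review described above.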
Transformative potential of foundation models
The transformative potential of foundation models in business operations can hardly be overstated. As foundation models for decision making continue to advance, leaders need to keep a close eye on the latest research and developments. While the use of robotic agents for logical tasks is still in its infancy, foundation models are making steady progress, and significant breakthroughs are on the horizon. Organisations should therefore be proactive in selecting the foundation models best suited to their decision-making needs.