A special ability of humans is to create things that did not exist before, to think "outside the box" and to be creative - until now. Thanks to new technology, creative power is no longer reserved for humans alone: we are talking about generative AI. Large artificial neural networks are now capable of creating, processing and transforming unprecedented content. How exactly does this work, what can it mean for us, and what is already possible today? This article looks at these questions in more detail.
What is generative AI?
Until recently, Artificial Intelligence (AI) and Machine Learning (ML) were largely limited to predictive models used for tasks such as classifying patterns. Simply put, an AI model could only distinguish whether a picture showed a dog or a cat, for example. Generative AI now turns the tables: as the term "generate" (from Latin generare, "to produce") suggests, a generative AI model is able to generate a dog image from a textual description of a dog. The trick is that the depicted dog does not exist: content created by generative AI is modelled on existing content, but is in itself unique. Another highlight of generative AI is its ability to interpret concepts and contexts. For example, if the dog is supposed to lie under a table, the AI can correctly infer that the dog lies under the tabletop - not under one of the table legs. In addition, the model knows that a dog is usually smaller than a table and can therefore render the proportions correctly.
To us humans, this seems completely natural at first, because we are good at recognising and interpreting connections between individual words in sentences or objects in pictures. A computer, however, does not have this ability and interprets everything on the basis of previously established rules. Only through the high availability of data and immense computing power has it become possible to "teach" a computer relationships that we take for granted.
In the last two years in particular, the topic of "generative AI" has made a major, publicly visible technological leap. But why now of all times? In a nutshell: more data, better models, higher computing power. However, this does not only apply to generative AI, but to AI in general.
Research in the field of generative AI was already being conducted more than five years ago. At that time, however, smaller models were state of the art. These were sufficient for individual use cases such as fraud detection or delivery time prediction, but they were not powerful enough for generative tasks.
From around 2015 onwards, the race for large AI models (foundation models) began. One of the triggers was the well-known paper "Attention Is All You Need", in which researchers from the Google research team presented a new neural network architecture: the Transformer. Language models built on this architecture developed rapidly, achieving significantly shorter training times and better performance than conventional architectures. With more and more parameters, the models also became more and more complex.
The Transformer model GPT-3 (Generative Pre-trained Transformer 3) from the private research company OpenAI finally made generative AI accessible to the masses for the first time. This kicked off the development of many applications based on generative models, such as code completion, image upscalers, AI-based search, chatbots, image generators and many more.
Why do we need generative AI?
The impact of generative AI models is already noticeable: the number of new tools and programmes that make use of models such as GPT-3, Stable Diffusion and others is growing rapidly. In the creative sector, they open up new possibilities and unimagined iteration speeds in the creation of illustrations, images, blog articles, marketing texts and much more. With ChatGPT, Microsoft shows how a search engine can answer even complex queries interactively.
However, generative AI models are not limited to creative application areas: they will also be used in research and development. The model AlphaFold, for example, which is based on generative AI, is able to solve a decades-old problem of protein folding. It thus opens up new research possibilities and immensely accelerates protein folding research. Going further, generative AI models already play a role in materials and drug discovery. IBM, for example, has developed an open-source toolkit designed to enable researchers to discover medicines, molecules, polymers or even manufacturing materials using generative AI, without the need for expert data science knowledge.
How do generative AI models work?
Generative AI models are fundamentally based on machine learning techniques such as unsupervised and semi-supervised learning, which allow them to process large amounts of data. From a technical point of view, they rest primarily on two architectures, described below: diffusion models and Transformers. These form the basis on which a generative model is trained and can then be used for inference.
Diffusion models are generative models that are primarily trained for the creation of images. They are trained with images and their descriptions (e.g. "A cat sits on a tree"). Once trained, these models can generate new data patterns that are similar to the ones they were trained on. This has led to them quickly being used for different use cases such as image and video generation and the generation of synthetic data. Diffusion models work by "deconstructing" an image from the training data: Gaussian noise is added step by step until the image turns into a noisy field of dots - similar to a tube TV without reception. The model is then trained to recover the original data by reversing this noising process. After training, the model can generate data by passing randomly sampled noise through the learned denoising process, guided by an accompanying image description. By applying an optimisation algorithm that produces the best, or most likely, denoising sequence, this results in entirely new data.
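The forward ("noising") half of this process can be sketched in a few lines of code. The following is a simplified illustration with a made-up noise schedule, not the implementation of any specific diffusion model:

```python
import numpy as np

def forward_diffusion(x0, betas, rng):
    """Gradually destroy an image x0 by adding Gaussian noise step by step."""
    steps = [x0]
    x = x0
    for beta in betas:
        noise = rng.normal(size=x.shape)
        # Each step keeps sqrt(1 - beta) of the signal and mixes in noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
        steps.append(x)
    return steps

rng = np.random.default_rng(42)
image = rng.uniform(-1.0, 1.0, size=(8, 8))   # stand-in for a real image
betas = np.linspace(0.01, 0.2, 50)            # toy noise schedule
trajectory = forward_diffusion(image, betas, rng)
# After enough steps the result is close to pure Gaussian noise;
# a diffusion model is trained to reverse exactly this process.
```

The trained model then learns the reverse direction: starting from pure noise, it removes a little noise at each step until a coherent image remains.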
Transformer models use whole data sequences instead of individual data points for the transformation from input to output. This makes them much more effective in situations where the context between data points is relevant. Transformer models (and the paper "Attention Is All You Need" published in 2017) therefore form the basis of large language models.
Language is a good example: a sentence must be interpreted as a whole rather than word by word in order to make sense of it, and this is exactly what the Transformer architecture reflects. With an attention mechanism, the Transformer model can assign different levels of attention to different words and thus better interpret the meaning of the sentence.
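The attention mechanism at the heart of this architecture can be written down compactly. Here is a minimal numpy sketch of scaled dot-product attention with toy dimensions - an illustration of the idea, not production code:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query attends to all keys; output is a weighted mix of values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of queries to keys
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                           # e.g. 5 words, 8-dim embeddings
Q = rng.normal(size=(seq_len, d_model))
K = rng.normal(size=(seq_len, d_model))
V = rng.normal(size=(seq_len, d_model))
output, weights = scaled_dot_product_attention(Q, K, V)
# Each row of `weights` says how much attention one word pays to every other word.
```

In a real Transformer this operation runs many times in parallel ("multi-head attention") and is stacked in layers, but the core computation is exactly this weighted mixing.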
This architecture is relevant in all large-scale language models, chatbots, text-to-image transformers but also in scientific applications such as DeepMind's AlphaFold.
The landscape of generative AI tools and developments
AI is moving fast: new tools and developments based on generative AI models appear almost daily in today's AI landscape. The most important sub-areas are currently image generation and natural language generation. Applications in the field of generative AI are based on so-called foundation models. Simply put, these are large AI models that have been trained with vast amounts of data and then specialised for concrete applications by means of fine-tuning.
The understanding, summarisation and generation of language by means of AI is based on so-called LLMs (Large Language Models). These are among the most important large AI models and represent an important advance in the field of AI. LLMs impressively demonstrate what generative AI can already do today and, above all, how we can interact with it. Texts generated by LLMs are hardly distinguishable from human-written texts, but can contain incorrect information due to their very generic training. Moreover, they have not yet reached the level of texts written by professional writers or of scientific papers.
They are currently used mainly for brainstorming, first drafts, notes and marketing content. It remains to be seen to what extent the output of LLMs will continue to improve and gain in quality through more up-to-date models, fine-tuning, feedback and more application-specific training.
Code generation and completion refers to the creation of entire blocks of code or individual lines of code using AI. Because programming languages can be interpreted analogously to natural language, models for code generation are also based on LLMs. This offers the advantage of being able to specify what the function of the code should be by means of an instruction (= prompt), without having to familiarise oneself with code libraries or packages.
Whether text or code generation: ChatGPT is currently on everyone's lips. Find out what use cases could look like in your company and what integration challenges await you.
Text-to-image models are able to create images from text input. The style, angle of view, type of image and size can be modified as desired. With models such as Midjourney, Stable Diffusion and others, one can create an image in the style of Picasso that never existed, produce breathtaking artwork or generate photorealistic images of people.
Find out in our blog post how new AI models like Text-to-Image Transformer can create realistic images from text that look amazingly similar to human-made artwork and photos.
"Content is AI-NG - text-to-image generators at a glance" - Alexander Thamm GmbH
With Make-A-Video from Meta and Microsoft's X-Clip, models are slowly emerging that are even able to generate videos artificially. However, these are currently limited by the high computing power they require: generating a single image is already computationally expensive, and video (at least 24 frames per second) demands immense computing power. More efficient models and the wider availability of large GPU clusters, however, will make this bottleneck a thing of the past.
Chatbots were previously known as rule-based systems meant to answer customer questions, for example. They have since evolved into knowledge repositories with context-aware communication capabilities: with ChatGPT, OpenAI has succeeded in creating a chatbot that can conduct an entire conversation about a topic, accept suggestions for improvement and refer back to earlier points in the conversation. This makes conversations with ChatGPT very intuitive.
You can find more exciting information about chatbots and where you can use them in your company in our blog:
Speech recognition has been around for some time ("Hey Siri"), but truly usable speech generation has only recently emerged. For high-end applications such as films and podcasts, the bar for natural human speech quality that does not sound mechanical is quite high. Nevertheless, there are already models like Microsoft's "VALL-E" that are able to synthesise the speech of a specific person - using only a few speech samples. Because speech is a very distinctive human trait and has been very difficult to fake so far, applications here can unfortunately also cause considerable damage: with deepfakes, for example, the voices of well-known personalities can be imitated to pronounce statements that were never actually made.
Product design is a complicated process: often the start alone is difficult, and potential optimisations are complicated to implement. Generative models such as DreamFusion are able to create any shape imaginable and thus accelerate and improve this iterative process. The model converts text into a 3D model - this can be useful for brainstorming, finding new possible shapes or optimising components, for example. 3D generators are based on text-to-image generators and are still in the early stages of development, but they can deliver promising output in the future.
In the audio, gaming and music sector, too, new models are constantly emerging that are capable of designing games, generating synthetic music and much more. So far, however, AI-generated songs sound rather unusual and weird - they (still) lack "soul".
Another important area of generative AI models is research. Generative models are also playing an increasingly important role in the discovery of new drugs. The AlphaFold model from the research company DeepMind has already proven that generative AI is capable of answering research questions. AI models are currently being developed in a wide variety of fields that can help researchers answer important scientific questions and thus could have productive benefits for all of us.
Legal framework for the use of generative AI
Generally speaking, generative models are so-called "General Purpose AI" - that is, AI that is not developed for one narrow purpose only, but can take on many different tasks. Because these models have only been around for a few years, there is so far no EU-wide legal regulation on their use. However, the EU is about to change this with the AI Act: it aims to make general purpose AI (GP-AI) safer by means of various requirements and to prevent the use of this technology for unlawful purposes. As things stand, however, the requirements placed on GP-AI systems can hardly be met.
What opportunities and risks does the controversially discussed draft regulation EU AI Act hold for users of AI systems? Get an overview of the thematic focus and challenges for companies in our blog post:
For many generative models, it remains doubtful whether they comply with the General Data Protection Regulation (GDPR) in Europe. The almost unbelievable knowledge of large models such as ChatGPT is based on public content such as books, articles, websites and even posts on social media - so a lot of data also comes from social media users themselves. In itself, this is not problematic, but at no point is the consent of the creators of this content obtained. It is also questionable whether personal data flows into generative models.
Text-to-image generators in particular benefit from the large number of works that have been distributed on the internet. However, the copyrights of these works remain in a grey area. If, for example, I generate a work of art "in the style of Picasso", it remains unclear whether and to what extent this can be reconciled with the copyrights of Picasso's works. The current approach relies on the fact that the "prompt", i.e. the input text, represents the creative achievement here and can therefore be protected by copyright - but not the generated image or the training data used for the model.
If our human works such as texts, pictures, videos and sound recordings fall into the wrong hands, they can also be used for a lot of mischief: Time and again, so-called "deepfakes" circulate on the internet, imitating the speech, facial expressions, gestures and appearance of celebrities, politicians and public figures. These are now deceptively real and can only be proven to be fakes by experts. With the expected further improvement of generative models, this can have worrying consequences - in the private, political and economic sense.
We are excited about what the future of generative models will bring. One thing is already clear: the potential areas of application and use cases are almost infinite.