GPT-3 - the next level of AI

17 March 2021 | Basics

Since the release of the GPT-3 beta in June 2020, there have been numerous reports about the new language model. But how exactly does GPT-3 work, what advantages does it offer in practical use, and what distinguishes it from previous developments? In this article you will learn how GPT-3 works and where its advantages lie.

What exactly is GPT-3?

Like its predecessors, which have been around for several years, GPT-3 predicts the next words that are most likely to follow. This makes it possible to write entire texts with precise, valuable content without human interaction. The machine-generated content is often indistinguishable from text written by a human. You can view a sample text on theguardian.com.

GPT-3 was developed by the organisation OpenAI, which was founded in 2015 by entrepreneur Elon Musk and others and was initially set up as a non-profit company. Together with universities and institutions worldwide, the team conducts research in the field of artificial intelligence and makes its results available for public application. OpenAI has set itself the long-term goal of creating a general, human-like artificial intelligence. The GPT-3 language model is not the organisation's only project.

Projects such as OpenAI Gym, a standardised toolkit for developing and comparing reinforcement-learning algorithms, are also part of the organisation's research focus. The same applies to the music generator Jukebox, a neural network that can generate music in numerous genres and styles and thus create its own compositions. But how exactly do all these developments relate to GPT-3?

The evolution towards the GPT-3 model

GPT-1 was the first model in a series of projects applying AI to Natural Language Processing. Initially, the aim of this development was to create a supervised learning environment so that sentiment within texts could be recognised. This relied on certain signals in the text, which in turn depended on specific labelled data inputs. But the goals evolved further.

In 2018, the team's leading researchers developed a new model that works independently of specific tasks. It was first trained on non-specific texts and then fine-tuned individually for each concrete task. The result of these continuous developments was GPT-1, which was intended to improve general language understanding through this generic pre-training, all without a multitude of elaborate sample tasks.

Since June 2020, the third version, GPT-3, has been available. Unlike the previous two models, however, the new version was not made available free of charge for further research. OpenAI changed its business model: access is now subject to a fee and is currently limited to only a few users. Officially, it is therefore still a beta, but its functions already go further than those of any previous version.

Even compared to other NLP applications, GPT-3 impresses with an enormous variety of functions and new solutions. Compared to BERT, T5 or its direct predecessor GPT-2, the model has grown significantly in size: it is trained on texts with a compressed size of up to 570 GB. The German high-performance computer SuperMUC-NG would still need more than 100 days of computing time to train the model.

How does GPT-3 work?

GPT-3 is a language model: in concrete terms, a statistical tool that predicts which word is most likely to come next. The difficulty of such a solution, however, lies in the many levels of language. Every language combines several layers of meaning, linguistic variance, grammatical constructions and stylistic devices that authors use individually.
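
What this next-word prediction looks like in practice can be sketched with GPT-2, GPT-3's freely available predecessor, since GPT-3 itself is only reachable through the paid beta API. This is a minimal sketch using the Hugging Face transformers library, which is an illustrative assumption here rather than OpenAI's own tooling:

```python
# Minimal sketch: ask a language model for the most likely next words.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The weather today is"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (batch, seq_len, vocab_size)

# Probability distribution over the whole vocabulary for the next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob:.3f}")
```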

Handling a large vocabulary is also a serious problem for many language models on the market so far. Fundamentally, each word must be converted into a particular sequence of numbers, because the computer only works with numbers; a translation for the system must therefore effectively exist in advance. Creating this mapping requires a lot of memory, which limits the use of such systems.
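
This translation step can be illustrated briefly, again using the openly available GPT-2 tokenizer as a stand-in for GPT-3's own (non-public) preprocessing:

```python
# Sketch: how text becomes numbers before a model ever sees it.
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Every word (or word fragment) is mapped to an integer ID in advance
ids = tokenizer.encode("Language models only see numbers")
print(ids)                                   # a list of integer token IDs
print(tokenizer.convert_ids_to_tokens(ids))  # the text fragments behind the IDs
```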

Nevertheless, modern language models have real potential. Especially for large corporations like Google, the automatic completion of content noticeably simplifies keeping an eye on automated processes without the need for additional personnel. The same is true in coding: language models are, at least in theory, capable of automatically completing and improving code. As existing functions are continuously expanded, the potential increases significantly.

GPT-3 as the basis of modern transfer learning

Another example of the use of language models such as GPT-3 is transfer learning. This is a machine-learning technique in which a model originally trained for one task is adapted for a second task. No other approach in deep learning reuses existing models for further tasks as quickly. For developing general model approaches with or without prior training, techniques such as transfer learning are an excellent choice.
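
How such an adaptation can look in practice is sketched below: a pretrained language model is frozen and only a small, newly attached head is trained for the second task. Sentiment classification and the GPT-2 backbone are illustrative assumptions here, not OpenAI's published recipe:

```python
# Hedged transfer-learning sketch: reuse a pretrained backbone,
# train only a small task-specific head on top of it.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
backbone = GPT2Model.from_pretrained("gpt2")

# Freeze the pretrained weights; only the new head will be trained
for param in backbone.parameters():
    param.requires_grad = False

# Task-specific head: two classes, e.g. positive / negative sentiment
head = nn.Linear(backbone.config.n_embd, 2)

def classify(text: str) -> torch.Tensor:
    ids = tokenizer.encode(text, return_tensors="pt")
    hidden = backbone(ids).last_hidden_state  # (1, seq_len, n_embd)
    return head(hidden[:, -1])                # logits from the last token

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
# ... a normal training loop over labelled examples would follow here
```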

However, impressive new applications are already emerging that matter not only to large tech corporations. GPT-3 makes it possible to create entirely new sections of text from individual paragraphs, whatever the topic; structure, language style and content precisely reflect the subject in detail.

Especially in the context of NLP, the enormous progress GPT-3 represents over other solutions is already evident. In fact, the language model can even overcome problems of older NLP systems and models. One decisive advantage here is the time saved: the model can be trained far more efficiently and prepared more quickly for new tasks. In practical use, this means significantly fewer errors and smoother operation.

The applications of GPT-3

In order to learn more about how GPT-3 is used, it is worth taking a look at practical scenarios and applications. In recent months, numerous demos have been created that comprehensively showcase its functions. With the right API access, the latest approaches to using the innovative language model can already be seen today, albeit unfortunately behind closed doors. The following three areas are particularly impressive and show what GPT-3 can already do in its current phase:

1. Code

With GPT-3, innovative generators for layouts and code completion have become possible. With the appropriate pre-training, the model can generate completely new code that matches a desired layout: the user describes the layout in their own words, and the language model produces the corresponding code.
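
A hedged sketch of how such a request looked against OpenAI's beta completion API in early 2021; the prompt text, engine choice and parameters are illustrative assumptions, and an invite-only API key is required:

```python
# Hedged sketch: generating layout code from a plain-language description
# through OpenAI's beta completion API (2021-era openai-python client).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; beta access required

prompt = (
    "Description: a centred button labelled 'Subscribe' on a dark background\n"
    "HTML:"
)

response = openai.Completion.create(
    engine="davinci",       # the largest GPT-3 engine in the beta
    prompt=prompt,
    max_tokens=100,
    temperature=0.2,        # low temperature for more deterministic code
    stop=["Description:"],  # stop before the model invents a new example
)
print(response.choices[0].text)
```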

2. E-mails

No model to date has generated replies to e-mails so quickly. In terms of content, GPT-3 leaves little room for error, basing the composition of a reply precisely on the existing templates and the text of the received e-mail. Even the personal writing style is not lost, but is carried over into the reply.
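
How such a reply prompt might be assembled can be sketched without any API call at all; the structure below is an illustrative assumption, not a documented OpenAI recipe:

```python
# Hedged sketch: assemble a reply prompt from the received e-mail plus
# a sample of the user's own writing style, so the model can imitate it.
received = "Hi, could you send over the Q1 report by Friday? Thanks, Anna"
style_sample = "Hi Anna, thanks for the nudge! I'll get right on it. Best, Jo"

prompt = (
    "Example reply in my style:\n" + style_sample + "\n\n"
    "Received e-mail:\n" + received + "\n\n"
    "Reply in the same style:\n"
)
# `prompt` would then be sent to the completion API shown above.
```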

3. Tables

Even for Excel-style tables, users can generate a complex yet completely correct sequence from just a few examples. The logical connection between the examples, such as cities and their population figures, is recognisable to the language model at all times. The model then independently looks up the values for other regions and adds them to the table.
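
The few-shot pattern behind these spreadsheet demos can be sketched as follows; the city rows are illustrative examples, and any value the model returns must of course be verified:

```python
# Hedged sketch of the few-shot pattern behind the table demos:
# two example rows, and the model is asked to continue the table.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; beta access required

prompt = (
    "City | Population\n"
    "Berlin | 3,645,000\n"     # illustrative example rows
    "Hamburg | 1,841,000\n"
    "Munich |"
)

response = openai.Completion.create(
    engine="davinci",
    prompt=prompt,
    max_tokens=20,
    temperature=0.0,  # deterministic continuation for factual lookups
    stop=["\n"],
)
print(response.choices[0].text.strip())  # the model's guess, to be verified
```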

What comes next? An outlook...

The numerous functions already show what influence GPT-3 and possible future versions may have on the market over time. Language and text are important in every industry and field, which is why the public in particular will take a strong interest in the development of new solutions and functions. However, since GPT-3, unlike the previous two versions, is no longer freely available, not only positive consequences are on the horizon.

Since GPT-3 is exclusively licensed to Microsoft, dangers loom similar to those already seen with Google search. The search engine dominates the market with more than 90% of all search queries, without comparable providers having even a hint of a chance. Excluding the public from the further development of GPT-3 creates the risk of further monopolies or oligopolies, since competitors lack a similar or identical technical basis.

So far, interested companies can only access the previous versions, which can still be used, extended and analysed publicly. At the same time, Microsoft is working at a rapid pace with leading teams to expand the capacities around GPT-3 and to develop new trends. For small, committed companies, it is probably already too late at this point.

Author

JÖRG BIENERT

Jörg Bienert is partner and CPO of Alexander Thamm GmbH, Germany's leading company for data science and AI. At the same time, he is co-founder and chairman of the KI-Bundesverband e.V. and a member of the Advisory Board Young Digital Economy at the BMWi. Furthermore, he is a respected keynote speaker and is regularly featured in the press as a data & AI expert. After studying technical computer science and holding several positions in the IT industry, he founded ParStream, a Big Data start-up based in Silicon Valley that was acquired by Cisco in 2015.
