What is Google PaLM (Pathways Language Model)?

The Pathways Language Model (abbreviated to PaLM) from Google is a powerful language model developed for understanding and generating natural language. PaLM is a dense decoder-only Transformer model trained with the Pathways system. With 540 billion parameters, it was trained across multiple TPU v4 Pods, making the training highly efficient.

PaLM was trained on a combination of English and multilingual datasets, including web documents, books, Wikipedia, conversations and GitHub code. The vocabulary was also adapted to preserve all whitespace, to split Unicode characters not included in the vocabulary into bytes, and to split numbers into individual digit tokens, allowing for effective training.
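To make these vocabulary rules concrete, here is a minimal, purely illustrative Python sketch of such preprocessing: whitespace is kept, digits become individual tokens, and anything outside a toy vocabulary falls back to its UTF-8 bytes. It is not PaLM's actual SentencePiece configuration, and the tiny VOCAB set is a stand-in for the real 256k-token vocabulary.

```python
import re

# Toy vocabulary standing in for PaLM's much larger SentencePiece vocabulary.
VOCAB = {"Hello", ",", "world", "!"}

def preprocess(text: str) -> list:
    """Illustrative tokenisation in the spirit of PaLM's vocabulary rules:
    preserve whitespace, split numbers into single-digit tokens, and fall
    back to UTF-8 bytes for pieces not covered by the vocabulary."""
    tokens = []
    # Split into single digits, runs of letters, whitespace, or single other chars.
    for piece in re.findall(r"\d|[A-Za-z]+|\s|.", text):
        if piece.isdigit():
            tokens.append(piece)          # each digit becomes its own token
        elif piece in VOCAB or piece.isspace():
            tokens.append(piece)          # whitespace is preserved verbatim
        else:
            # Out-of-vocabulary piece: fall back to its UTF-8 bytes.
            tokens.extend("<0x%02X>" % b for b in piece.encode("utf-8"))
    return tokens

print(preprocess("Hello, world! 42 €"))
# ['Hello', ',', ' ', 'world', '!', ' ', '4', '2', ' ', '<0xE2>', '<0x82>', '<0xAC>']
```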

Google PaLM is an important milestone on the way to realising Google Research's vision for Pathways: a single model that generalises across domains and tasks while remaining highly efficient.

Functions and capabilities

PaLM achieved impressive breakthroughs on a variety of language, reasoning and code tasks. In an evaluation on 29 English natural language processing (NLP) tasks, PaLM outperformed previous models on 28 of the 29 tasks. It also showed strong performance on multilingual NLP benchmarks, including translation, even though only 22% of the training corpus is non-English.

In addition, Google PaLM showed impressive performance on several BIG-bench tasks that probe natural language understanding and generation. For example, the model was able to distinguish cause and effect, understand conceptual combinations in the appropriate context and even guess a film from a series of emoji.

PaLM also shows several breakthrough capabilities on code tasks. It can generate high-quality, directly executable code from natural language descriptions (text-to-code), it can understand natural language explanations of code, and it can provide code completion and error correction (code-to-code). PaLM has shown that it can handle tasks such as sorting, searching and web scraping, even though code makes up only 5% of its pre-training dataset.
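These code capabilities are typically exercised through few-shot prompting rather than task-specific fine-tuning. The snippet below is a hypothetical illustration of what such a text-to-code prompt might look like; `palm_generate` is a placeholder name, not a real API, and the expected completion is only an example.

```python
# Hypothetical few-shot text-to-code prompt: a few description/solution pairs
# followed by a new description the model is expected to complete.
FEW_SHOT_PROMPT = '''\
# Description: return the largest value in a list
def largest(values):
    return max(values)

# Description: sort a list of strings by length, shortest first
def sort_by_length(strings):
    return sorted(strings, key=len)

# Description: count how many numbers in a list are even
'''

def palm_generate(prompt: str) -> str:
    """Placeholder standing in for a call to the language model."""
    raise NotImplementedError("replace with an actual model call")

# The model is expected to continue the pattern with executable code, e.g.:
#   def count_even(numbers):
#       return sum(1 for n in numbers if n % 2 == 0)
```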

Of particular note is its strong performance in few-shot scenarios, which is comparable to the fine-tuned Codex 12B model even though PaLM was trained on 50 times less Python code. This result supports earlier findings that larger models can transfer learning more effectively from other programming languages and from natural language data, improving their sample efficiency compared to smaller models.

PaLM's training efficiency is also impressive, with a hardware FLOPs utilisation of 57.8%, the highest yet achieved for LLMs at this scale. This is due to a combination of the parallelism strategy and a reformulation of the Transformer block that allows the attention and feed-forward layers to be computed in parallel, which in turn enables speed-ups through TPU compiler optimisations.
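In the standard block the feed-forward layer consumes the output of the attention layer; in the parallel formulation both sublayers read the same normalised input and their outputs are summed into the residual stream. The NumPy sketch below is a simplified illustration of this difference (single-head attention, no masking, biases or dropout, plain ReLU instead of PaLM's SwiGLU), not PaLM's actual implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def attention(x, Wq, Wk, Wv, Wo):
    """Simplified single-head self-attention (no masking, no biases)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return (weights @ v) @ Wo

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2   # ReLU feed-forward for simplicity

def serial_block(x, attn_p, mlp_p):
    """Standard Transformer block: the MLP runs after the attention output."""
    y = x + attention(layer_norm(x), *attn_p)
    return y + mlp(layer_norm(y), *mlp_p)

def parallel_block(x, attn_p, mlp_p):
    """Parallel formulation: attention and MLP share one LayerNorm and are
    computed independently, so the compiler can fuse and overlap them."""
    h = layer_norm(x)
    return x + attention(h, *attn_p) + mlp(h, *mlp_p)

# Tiny shape check with random weights (illustrative only).
d, seq, d_k, d_ff = 8, 4, 8, 32
rng = np.random.default_rng(0)
x = rng.normal(size=(seq, d))
attn_p = tuple(rng.normal(size=s) for s in [(d, d_k)] * 3 + [(d_k, d)])
mlp_p = (rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
print(parallel_block(x, attn_p, mlp_p).shape)  # (4, 8)
```

Because the two sublayers no longer depend on each other within a block, their matrix multiplications can be scheduled together, which is where the TPU compiler optimisations mentioned above come from.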