Wu Dao 2.0 – This is China’s new transformer

By Alexander Thamm | 2 August 2021 | Tech Deep Dive

Competitive pressure in the development of innovative AI models keeps increasing. A year after OpenAI achieved a huge developmental leap with GPT-3 and put the world in an uproar, researchers at the Beijing Academy of Artificial Intelligence (BAAI) unveiled Wu Dao 2.0 in early June 2021: 10 times larger than GPT-3 and therefore the world's largest neural network model to date.

From a technical perspective, this is a fascinating announcement. It is also a warning to European and US policymakers and industry not to fall completely behind, and a signal of China's ambition to become the world leader in AI development.

Wu Dao 2.0 outperforms GPT-3 and Google Switch Transformer 

In March 2021, the BAAI released Wu Dao 2.0's predecessor, Wu Dao 1.0. Less than three months later, the research group and its industry partners, such as Xiaomi, Meituan, and Kuaishou, released the updated version of the multimodal model.

Wu Dao 2.0, whose name literally translates to “understanding the laws of nature”, has 1.75 trillion parameters. That is ten times as many as GPT-3 and 150 billion more than Google's Switch Transformer language model (1.6 trillion parameters), which had set the previous size record in January 2021.
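The size comparison above is simple arithmetic, and the short Python check below reproduces it from the stated parameter counts. GPT-3's 175 billion parameters are its published size and are an input assumed here, since the text only gives the factor of ten; the Switch Transformer figure is the rounded 1.6 trillion used above.

```python
# Parameter counts; Switch Transformer is the rounded figure from the text.
WU_DAO_2 = 1_750_000_000_000
GPT_3 = 175_000_000_000        # GPT-3's published size (assumed input)
SWITCH = 1_600_000_000_000

ratio_vs_gpt3 = WU_DAO_2 / GPT_3           # how many times larger than GPT-3
lead_over_switch = WU_DAO_2 - SWITCH       # absolute lead over the previous record

print(f"Wu Dao 2.0 / GPT-3: {ratio_vs_gpt3:.1f}x")
print(f"Lead over Switch Transformer: {lead_over_switch // 1_000_000_000} billion parameters")
```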

Furthermore, Wu Dao 2.0 continues last year's movement towards multimodal AI systems. This means it learns from both image and text data and can flexibly handle complex tasks involving both types of data. Specifically, it covers natural language processing, text generation, image recognition, and image generation, and can even predict the 3D structures of proteins, similar to DeepMind's AlphaFold.

Special features of Wu Dao 2.0: Size and robustness 

Wu Dao 2.0 was trained on 4.9 TB of text and image data. This makes the GPT-3 training set (570 GB of clean data filtered from 45 TB of raw data) look frighteningly small in comparison. The data consists of 1.2 TB of Chinese text, 2.5 TB of Chinese image data, and 1.2 TB of English text.
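As a quick sanity check, the figures above can be added up and compared in a few lines of Python. This is a sketch assuming decimal units (1 TB = 1,000 GB) and taking the sizes exactly as reported; the dictionary keys are labels chosen for illustration.

```python
# Reported training-data composition of Wu Dao 2.0, in decimal terabytes.
wu_dao_corpus_tb = {
    "Chinese text": 1.2,
    "Chinese image data": 2.5,
    "English text": 1.2,
}
GPT3_CLEAN_TB = 0.57   # 570 GB of filtered training text
GPT3_RAW_TB = 45.0     # raw data before filtering

total_tb = sum(wu_dao_corpus_tb.values())
text_tb = sum(v for k, v in wu_dao_corpus_tb.items() if "text" in k)

print(f"Wu Dao 2.0 corpus: {total_tb:.1f} TB "
      f"({total_tb / GPT3_CLEAN_TB:.1f}x GPT-3's clean set)")
print(f"Text only: {text_tb:.1f} TB ({text_tb / GPT3_CLEAN_TB:.1f}x)")
print(f"GPT-3 kept {GPT3_CLEAN_TB / GPT3_RAW_TB:.1%} of its raw data")
```

Byte counts of images and text are not directly comparable, which is why the text-only ratio (roughly 4x rather than roughly 9x) is the fairer comparison with GPT-3's text-only training set.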

Comparable multimodal approaches include OpenAI's DALL-E and CLIP and Google's LaMDA and MUM. However, the Chinese model is far larger in scale and, according to the BAAI researchers, surpasses the previous state of the art (SOTA) on nine widely used AI benchmarks. The previous SOTA holder is listed for each:

  • ImageNet (zero-shot): OpenAI CLIP
  • LAMA (factual and commonsense knowledge): AutoPrompt
  • LAMBADA (cloze tasks): Microsoft Turing NLG
  • SuperGLUE (few-shot): OpenAI GPT-3
  • UC Merced Land Use (zero-shot): OpenAI CLIP
  • MS COCO (text-to-image generation): OpenAI DALL-E
  • MS COCO (English image-text retrieval): OpenAI CLIP and Google ALIGN
  • MS COCO (multilingual image-text retrieval): UC² (best multilingual and multimodal pre-trained model)
  • Multi30K (multilingual image-text retrieval): UC²

Wu Dao 2.0 and FastMoE

When it comes to usability and commercialization, the answer is likely FastMoE. FastMoE is an open-source Mixture of Experts (MoE) system, similar to the MoE approach Google used for its Switch Transformer. In this approach, each input is routed only to one or a few expert networks within the large model. Because only the sections of the model relevant to the information being processed are active, the required computing power is reduced. This enables massive scaling with high efficiency and precision. Additionally, FastMoE is more flexible than Google's system: it runs on supercomputers as well as on conventional GPUs and therefore does not require proprietary hardware.
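To make the routing idea concrete, here is a minimal NumPy sketch of top-1 ("switch") routing, the variant used by Google's Switch Transformer. It is a toy illustration under simplifying assumptions (each expert is a single random linear map; no load balancing or capacity limits) and does not use FastMoE, which is a PyTorch-based library. The names `softmax` and `Top1MoELayer` are hypothetical and are not FastMoE's API.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class Top1MoELayer:
    """Hypothetical toy mixture-of-experts layer with top-1 routing.

    Each expert is one linear map; a gating matrix scores every expert
    for every token, and each token is sent only to its best-scoring expert.
    """

    def __init__(self, d_model, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.gate = rng.normal(size=(d_model, num_experts))
        self.experts = [rng.normal(size=(d_model, d_model))
                        for _ in range(num_experts)]

    def __call__(self, tokens):
        # tokens: array of shape (num_tokens, d_model)
        probs = softmax(tokens @ self.gate)      # (num_tokens, num_experts)
        choice = probs.argmax(axis=-1)           # exactly one expert per token
        out = np.zeros_like(tokens)
        for e, weight in enumerate(self.experts):
            idx = np.nonzero(choice == e)[0]     # tokens routed to expert e
            if idx.size == 0:
                continue                         # an unused expert does no work
            # Only this subset of tokens touches expert e's weights; the result
            # is scaled by the gate probability, as in Switch-style routing.
            out[idx] = (tokens[idx] @ weight) * probs[idx, e][:, None]
        return out, choice


layer = Top1MoELayer(d_model=8, num_experts=4)
x = np.random.default_rng(1).normal(size=(16, 8))
y, routed_to = layer(x)
print(np.bincount(routed_to, minlength=4))  # number of tokens per expert
```

With four experts, each token passes through only one expert's weights, so the expert computation per token stays roughly constant as more experts (and therefore more parameters) are added. Production MoE systems additionally use a load-balancing loss and per-expert capacity limits, which this sketch omits.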

A scientific publication on Wu Dao 2.0 is still pending. Nevertheless, Wu Dao 2.0 appears to achieve noteworthy results on the most important benchmarks across tasks and modalities.

Applications of Wu Dao 2.0 – on the way to the AI grid

According to Tang Jie, deputy director of the BAAI, the project pursues one main goal: developing and implementing cognitive abilities in machines, as measured for example by Turing tests.

One example was the presentation of Hua Zhibing, a virtual student based on Wu Dao 2.0 who has learned to compose music, write poetry, paint pictures, and code. Compared to GPT-3, Wu Dao 2.0 seems to come closer to human memory and learning mechanisms, because it reportedly no longer forgets what it has learned before (a problem known as catastrophic forgetting).

Beyond this playful virtual persona, Wu Dao 2.0 is much more. The project should be understood as the next milestone toward a fundamentally transformed AI industry infrastructure that works like an electricity grid: it will connect AI applications with each other and manage capacities more intelligently. This will be reinforced by providers using customer-provided data to expand the training set, continuously improving the overall system.

Wu Dao demonstrates the status quo of China's AI strategy

That the Chinese government has been using AI as a strategic advantage in international competition for several years is certainly not a new insight. The Wu Dao project is the first low-hanging fruit of the Chinese AI and Innovation Plan, which envisaged the establishment of 50 new AI institutes by 2020. As early as 2018 and 2019, the government in Beijing invested over 50 million dollars in the Beijing Academy of Artificial Intelligence. By 2025, China's strategic goal is to achieve a “big breakthrough”. From a European point of view, it might be naive to hope that Wu Dao 2.0 is not already this breakthrough.

In terms of research, China can now regard itself as the world's leading nation in AI publications and patents. Its share of global AI publications rose from 4% in 1997 to 28% in 2017, and the trend continues upward. This also indicates the power China can unleash in AI-enabled businesses such as voice and image recognition applications.

Challenges for Europe 

Chinese market players are already driving the AI transformation and taking the technology to the next level. The consequence of this development will be enormous market pressure on European companies and states. One prominent example that has recently stirred up geopolitical dynamics is the Chinese social media platform TikTok.

We also should not underestimate that AI models always reflect their training data and the biases of their developers. If English and Chinese language models become the established standard, other cultures will have to fight to have their languages and values considered.

All this makes it all the more important to emphasize that AI models are an informal indicator of continental or national progress and a key dimension of the technological competition between China, the U.S., and Europe.

According to a study by the European Investment Bank, the U.S. and China account for approximately 80 percent of investments in AI and blockchain technologies, while Europe accounts for only 7 percent, about 1.75 billion euros.
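The quoted shares also imply the size of the overall pie. The Python sketch below assumes that both percentages refer to the same total investment volume; the derived totals are back-of-the-envelope estimates, not figures quoted from the study.

```python
# Figures from the EIB study as quoted in the text.
europe_share = 0.07
europe_amount_eur_bn = 1.75
us_china_share = 0.80

# Assumption: both shares refer to the same total, so 7% <-> EUR 1.75 bn.
total_eur_bn = europe_amount_eur_bn / europe_share
us_china_eur_bn = us_china_share * total_eur_bn

print(f"Implied total: about EUR {total_eur_bn:.0f} bn")
print(f"U.S. + China: about EUR {us_china_eur_bn:.0f} bn, "
      f"{us_china_share / europe_share:.1f}x Europe's share")
```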

With Wu Dao 2.0, Europe moves one step closer to losing its digital sovereignty in AI. We must not let this fear come true.

Europe’s AI position needs to be strengthened 

In April 2021, AI industry associations from Germany, Austria, Sweden, Croatia, Slovenia, the Netherlands, France, and Bulgaria approached the EU Commission to draw attention to the situation and propose measures for developing large-scale AI models in Europe.

After all, if Europe does not react quickly, there is a risk that China and the U.S. will form oligopolies or monopolies. All European forces and resources must be pooled to invest more heavily in moonshot projects like LEAM or Open GPT-X; it is the only way to avoid falling behind.

Alexander Thamm


Alexander Thamm is founder, CEO, and a pioneer in the field of data & AI. His mission is to generate real value from data and to restore the international competitiveness of Germany and Europe. He is a founding member and regional manager of the KI-Bundesverband e.V. (German AI Association), a sought-after speaker, author of numerous publications, and co-founder of the DATA Festival, where AI experts and visionaries shape the data-driven world of tomorrow. In 2012, he founded Alexander Thamm GmbH [at], which is one of the leading providers of data science & artificial intelligence in the German-speaking region.


