What is AlphaFold?

AlphaFold is an artificial intelligence system that predicts the three-dimensional structure of a protein from its amino acid sequence alone. In 2020, AlphaFold2 reached an accuracy comparable to that of experimental methods, solving a 50-year-old problem in biology: the protein folding problem.

Since 2021, the AlphaFold DB database of protein structures has been freely available to scientists around the world. In addition, AlphaFold2 can be used free of charge as open source software on GitHub.

The solution to the protein folding problem

AlphaFold is another success story from Google DeepMind (alongside AlphaGo, AlphaZero and Ithaca). Shortly after AlphaGo decisively defeated the famous professional Go player Lee Sedol in 2016, a new DeepMind team was put together. Its goal was to solve the problem of protein structure prediction.

2018 marks the team's first success: AlphaFold takes first place in the 13th CASP competition.

CASP, the Critical Assessment of Techniques for Protein Structure Prediction, is a community of scientists that has been working on the problem of protein folding since 1994. Every two years, it holds a competition in which research teams are given a selection of amino acid sequences for proteins. The exact three-dimensional shape of these proteins is already known, but not publicly available. The teams submit their best predictions, which are then compared with the actual structures.

AlphaFold's outstanding results in this competition are published in the journal Nature, and the DeepMind team continues to expand. In 2020 comes the breakthrough: AlphaFold2 wins the 14th CASP competition by a large margin and is recognised by the CASP organisers as the solution to the 50-year-old "protein folding problem". AlphaFold2 can predict protein structures to atomic accuracy with a median error (RMSD_95) of less than 1 angstrom, making it three times more accurate than the next best system and comparable to experimental methods. Among experts, the solution of the protein folding problem is hailed as one of the most important achievements since the mapping of the human genome.
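To make the error measure more concrete, the following is a minimal sketch of a root-mean-square deviation (RMSD) between predicted and reference atom positions. It assumes both structures have already been superimposed and that coordinates are given in angstroms. The `coverage` parameter is a simplified stand-in for the "95" in RMSD_95 (it simply drops the worst-fitting residues) and is not the exact procedure used in the CASP assessment; the function name and parameters are illustrative.

```python
import math


def rmsd(pred, ref, coverage=1.0):
    """RMSD over the best-fitting `coverage` fraction of residues.

    Assumption: `pred` and `ref` are equal-length lists of (x, y, z)
    coordinates in angstroms that are ALREADY superimposed.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    # Squared deviation per residue, sorted so the worst outliers come last.
    squared = sorted(
        sum((p - r) ** 2 for p, r in zip(a, b)) for a, b in zip(pred, ref)
    )
    # Keep only the requested fraction of residues (at least one).
    keep = max(1, math.ceil(coverage * len(squared)))
    return math.sqrt(sum(squared[:keep]) / keep)


predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
full_rmsd = rmsd(predicted, reference)         # all residues
partial_rmsd = rmsd(predicted, reference, 0.5)  # best-fitting half only
```

A value below 1 angstrom means that, on average, the predicted atoms lie less than the width of a single atom away from their experimentally determined positions.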

How does AlphaFold work?

In principle, AlphaFold uses neural networks that are trained through deep learning. After the very good results at CASP13, the programme is developed further. Its methods and code are published in the scientific journal Nature, which leads to open source implementations from the community. DeepMind itself adds new deep learning architectures and refines the methods.

These methods draw on biology, in particular the study of protein folding, as well as on physics and machine learning.

To understand the physical interactions within proteins, it is important to understand how a folded protein is constructed. It can be seen as a "spatial graph" in which residues are the nodes and edges connect the residues in close proximity. AlphaFold2, which wins CASP14, uses an attention-based neural network system. With this, it continuously tries to interpret the structure of the protein while reasoning over the graph it is assembling. Multiple sequence alignments (MSAs) and a representation of amino acid residue pairs are used to refine this graph.

By repeating this process iteratively, AlphaFold2 arrives at highly accurate predictions of protein structure. Using its own confidence measure, AlphaFold2 is also able to indicate which parts of its predicted protein structure can be considered reliable.
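The "spatial graph" view can be illustrated with a small sketch. It builds a graph in which each residue is a node and two residues are connected when they lie within a distance cutoff. The function name, the use of one representative coordinate per residue and the 8 angstrom cutoff are assumptions chosen for illustration; this is not AlphaFold's internal representation.

```python
import math
from itertools import combinations


def build_contact_graph(residue_coords, cutoff=8.0):
    """Return an adjacency map {residue index: set of neighbouring indices}.

    Assumption: `residue_coords` holds one (x, y, z) position per residue
    in angstroms; `cutoff` is an illustrative proximity threshold.
    """
    graph = {i: set() for i in range(len(residue_coords))}
    for i, j in combinations(range(len(residue_coords)), 2):
        # Connect two residues if they are close together in space.
        if math.dist(residue_coords[i], residue_coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Four residues on a line; the last one is far away from the others.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
contact_graph = build_contact_graph(coords)
```

In this toy example the first three residues are all connected to one another, while the fourth has no neighbours because it lies more than 8 angstroms from every other residue.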
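As a rough sketch of how such a confidence measure might be used in practice, the snippet below takes one confidence score per residue and returns the contiguous stretches that meet a threshold. AlphaFold2's per-residue measure is known as pLDDT and ranges from 0 to 100; the threshold of 70 and the function name are assumptions made here for illustration, and the scores are made-up example values.

```python
def confident_segments(scores, threshold=70.0):
    """Return (start, end) index pairs of residue runs with score >= threshold.

    Assumption: `scores` is a list with one confidence value per residue,
    on a 0 to 100 scale. Indices are zero-based and inclusive.
    """
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i  # a new confident run begins here
        elif start is not None:
            segments.append((start, i - 1))  # the run ended on the previous residue
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # run extends to the last residue
    return segments


example_scores = [95.0, 88.0, 45.0, 30.0, 72.0, 91.0, 60.0]  # made-up values
reliable = confident_segments(example_scores)
```

Regions that fall below the threshold are not necessarily wrong, but they should be treated with more caution when interpreting the predicted structure.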

AlphaFold Protein Structure Database

In close cooperation with the European Bioinformatics Institute at the European Molecular Biology Laboratory (EMBL-EBI), DeepMind launches the AlphaFold Protein Structure Database in 2021. It gives the scientific community free and open access to predicted structures for the human proteome (the totality of all proteins in the human body) together with those of 20 other model organisms, including mice. The database thus comprises a total of over 350,000 structures. At the beginning of 2022, DeepMind adds another 27 proteomes (corresponding to over 190,000 proteins) to the database.

To date, over 300,000 researchers worldwide have made use of the database. This makes AlphaFold one of AI's most significant contributions to advancing scientific knowledge.