AlphaFold delivers breakthrough in protein structure prediction
Laura Reich Diez
Proteins are the basic building blocks of life, yet until now scientists knew the structures of only a small fraction of the proteins in the human body. The artificial intelligence system AlphaFold has recently changed that. Last year, researchers from the British company DeepMind, a subsidiary of Google's parent company Alphabet, achieved something that had been considered impossible: almost perfect predictions of protein structures.
AlphaFold uses deep learning to predict a protein's structure from its amino acid sequence. The result is a freely accessible database that contains many thousands of protein structures. The possible areas of application are broad: the data should lead to breakthroughs in medical research, and it can also be used to develop bacteria that decompose plastic in the environment, as well as for plant breeding.
The database is maintained by the European Molecular Biology Laboratory (EMBL) and covers almost the complete set of the roughly 20,000 proteins found in humans, the so-called proteome. However, the accuracy of the predictions varies. AlphaFold also provides structure predictions for numerous proteins from model organisms that are important for research, such as mice, fruit flies or E. coli bacteria.
In an interview with Peter Kuhn from 5-HT, Christoph Müller, Head of the Structural and Computational Biology Unit at EMBL Heidelberg, talks about the breakthrough of the AlphaFold Database and the progress it has brought.
Surprise at AlphaFold's result
"I was definitely surprised, as was the whole community," Müller says of his reaction to AlphaFold's result in the CASP competition. "In the past, AlphaFold already achieved good results, but it was still in the same field as the other participants. AlphaFold's current result, on the other hand, is much better and an important contribution to solving the protein folding problem." DeepMind's AlphaFold 2.0 predicts protein folds so well that scientists at CASP are calling the AI a solution to the 50-year-old protein folding problem.
Initial concerns about AlphaFold's practical applicability were soon dispelled as it became clear that the method is robust and can be widely applied. "When a scientist is surprised, they wonder what's behind it; it's in our nature. We might have been a little skeptical at the beginning, but overall, we were very pleased. We think it's an immense step forward."
Back in 2018, DeepMind won the prestigious CASP competition with its AI application AlphaFold. Last year, when the scientists entered again, they achieved revolutionary results and thus a breakthrough in one of the most important problems in biology.
"Critical Assessment of Protein Structure Prediction" - Background to the CASP Competition
The Critical Assessment of Protein Structure Prediction (CASP) is a competition that reviews current methods for protein structure prediction, with the aim of identifying and promoting the best ones.
In the competition, participants are asked to predict protein structures that have already been solved experimentally by other scientists but have not yet been published. These structures are known only to the scientists who solved them and to the CASP organisers. This allows CASP to check the predictions without the participants having had prior access to the experimental data.
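The comparison between a blind prediction and the withheld experimental structure can be illustrated with a small scoring sketch. The article does not say how CASP scores predictions; the example below assumes a simplified form of GDT_TS, a measure commonly reported in CASP assessments, which averages the fraction of residues lying within 1, 2, 4 and 8 angstroms of their experimental positions. It also assumes the two structures are already superimposed and given as lists of C-alpha coordinates, whereas the real measure searches for the best superposition at each cutoff. The function name and the sample coordinates are hypothetical.

```python
import math

# Distance cutoffs in angstroms conventionally used for GDT_TS.
CUTOFFS = (1.0, 2.0, 4.0, 8.0)


def gdt_ts(predicted, experimental):
    """Simplified GDT_TS for two equally long lists of (x, y, z) coordinates.

    Assumption: the structures are already superimposed, so no alignment
    search is performed here (the real measure optimises the superposition).
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues within each cutoff, averaged over the cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in CUTOFFS
    ]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical four-residue example: deviations of 0.5, 1.5, 3.0 and 9.0 angstroms.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, experimental))  # 56.25
```

A score of 100 would mean every residue sits within 1 angstrom of its experimental position; the hypothetical prediction above scores 56.25 because only some residues fall within the tighter cutoffs.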
Predominantly correct results with AlphaFold - a breakthrough
"It used to be that people said there was a predicted structure that was right in some areas and wrong in some areas, but still pretty wrong overall," says Müller, describing the state of knowledge before AlphaFold. "In contrast, the AlphaFold predictions are predominantly quite correct and that is a super result. The predictions are so accurate that they get into a range that you can reach with experimental structural biology methods."
The AlphaFold Protein Structure Database
The AlphaFold Database, a collaboration between DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI), provides free access to protein structure predictions of the human proteome and many other organisms to accelerate scientific research. The database was updated yesterday to include much more data on several organisms.
This update more than doubles the size of the AlphaFold database, from 365,000 protein structure predictions to more than 800,000.
The database builds on many previous contributions from the international scientific community as well as AlphaFold's sophisticated algorithmic innovations and EMBL-EBI's decades of experience.
"The ease of use of the Google-style website and the data itself have immense implications for structural biology, but also overall for life sciences and medical research or its applications. To illustrate, you type in the name of a protein or gene and the result is returned immediately. The fact that anyone working on a problem can instantly retrieve a three-dimensional structure prediction is a significant advance. AlphaFold is already helping scientists to speed up discoveries," says Christoph Müller, explaining the positive effects of AlphaFold. AlphaFold also saves a great deal of time in research. It is of great importance for German scientists: Germany is among the top five countries by number of users.
"In our own research, we now use a lot of cryo-electron microscopy to solve structures of protein complexes. Complexes are something that AlphaFold has not yet been able to predict, but if you know the folding of the individual building blocks, then the interpretation of the overall structure of these complexes becomes much easier," says Müller. A complex is an assembly of several proteins, and many proteins can fulfil their cellular function only as part of such a complex.
Prediction of complexes arising from interactions between proteins
Accurately calculating complexes that arise from the interactions between individual proteins could become possible in the future. "That is certainly within the realm of possibility. In the meantime, there are already first steps and publications with very good results. At least in the prediction of two components, such as dimer structures or oligomers of two different proteins, there are promising results. That means it is to be expected that this will be one of the next steps."
Christoph Müller and his group at EMBL work in the field of regulation of transcription. Among other things, they are looking at the question: how is DNA transcribed into messenger RNA or other RNA molecules? In recent years, they have been able to elucidate many of these processes. "We are now especially interested in conformational changes of molecular machines. What are the reaction mechanisms? How does the structure of complexes change in interaction with other factors?"
Potential of breakthrough use cases in sight
"For many targets, there is structural information and many predictions where AlphaFold can be consulted. But then we get back into the interaction between a protein and a ligand and the predictions there. So far, AlphaFold cannot predict the complex between protein and ligand and the affinity of the ligand.
I think that, for the moment, AlphaFold is more applicable in this area for planning projects. There is a structural framework for many of the hypotheses. Within that, AlphaFold has contributed a lot. There are now about 350,000 structure predictions of proteins on the AlphaFold server.
With the AlphaFold algorithm you can also calculate structures yourself. The goal is that, at EMBL-EBI in Cambridge, there will eventually be several million structure predictions deposited, so that for virtually every protein whose gene is deposited in the database, there will also be a prediction of its three-dimensional structure.
No cell biologist will be able to claim: there is no structure. The argument no longer applies, because all the scientist has to do, so to speak, is 'go to AlphaFold, click on it' and take the results into account in their research."
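The "go to AlphaFold, click on it" lookup can also be scripted. The sketch below is a minimal illustration rather than an official client: it assumes that the AlphaFold Database exposes a JSON endpoint under https://alphafold.ebi.ac.uk/api/prediction/ keyed by UniProt accession, which should be checked against the current EMBL-EBI documentation before use. Unlike the website search described in the interview, it expects an accession rather than a protein or gene name. The function names are hypothetical, and P69905 (human haemoglobin subunit alpha) serves only as a sample accession.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed endpoint of the AlphaFold Database API; verify against the
# EMBL-EBI documentation before relying on it.
API_BASE = "https://alphafold.ebi.ac.uk/api/prediction/"


def prediction_url(uniprot_accession):
    """Build the (assumed) metadata URL for one UniProt accession."""
    accession = uniprot_accession.strip().upper()
    if not accession.isalnum():
        raise ValueError(f"not a plausible UniProt accession: {uniprot_accession!r}")
    return API_BASE + quote(accession)


def fetch_prediction_metadata(uniprot_accession, timeout=30):
    """Download and decode the JSON metadata for one accession (network access needed)."""
    url = prediction_url(uniprot_accession)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Building the URL works offline; the download itself is left to the caller.
print(prediction_url("P69905"))
```

Keeping URL construction separate from the download makes the assumed address easy to inspect or adjust if the service changes.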
Loose ties lead to valuable collaboration
"Over the years, loose contacts have been consolidated by a common concern that structures should be made available to the public," Müller recalls.
As a result, DeepMind and EMBL agreed to work together to make the most complete and accurate database to date of around 20,000 proteins and their predicted structures from the human genome, i.e. the complete human proteome, freely available to the scientific community.
"The wave has been triggered and there are other groups who also go in the same direction. For us, it's very important that we deposit this data and provide access to this data for the public."
Rare diseases, antibiotic resistance and recycling of single-use plastics - a look into the future of AlphaFold
AlphaFold is already being used to research rare diseases, study antibiotic resistance and recycle single-use plastics. "We are currently working on complexes with single particle electron microscopy. That means very large complexes that are isolated in vitro and where we look at their conformation. In the meantime, however, we are also working a lot with cryo-electron tomography, where these complexes are not isolated from the cell, but analysed practically in the cell. That is experimentally very complex.
This idea that we can visualize many different proteins together in a cell, which are then all in a certain conformation with respect to each other and which can be analysed, is no longer far away," Müller explains.
"With the help of AlphaFold, this data can then be interpreted within an electron microscopy map. This is a step into the future. I assume that AI will succeed in interpreting the overall picture of the cell in the future. Ultimately, it should no longer just show the three-dimensional detailed structure of a single protein or a complex, but the picture of an entire cell with atomic resolution.
Imagine looking at a cancer cell and seeing how the signaling pathways look different compared to a healthy cell - that would be great, wouldn't it?"
On the way to understanding the components of life
It sounds like you're on your way to understanding the components of life and being able to modify them? "I think that's the case. There are insanely large amounts of data that need to be accessed. This is where AI is an essential tool to simplify and sort data sets, because the amount of information is overwhelming."
A giant leap opens countless possibilities
"The applications, also with respect to the environment, are broad:
- Green Revolution
- New enzymes with new substrate effects, etc.
- Plastic-degrading bacteria
- Green Engineering
- White Chemistry
- More environmentally friendly production of certain products
Three-dimensional structures are essential for enzymatic reactions and interactions. If they are to be used, they need to be well understood and AlphaFold predictions can be an important building block in this respect. We are proud that EMBL makes the AlphaFold predictions available to the public."
About the European Molecular Biology Laboratory (EMBL)
EMBL is Europe’s flagship laboratory for the life sciences. We are an intergovernmental organisation established in 1974 and are supported by 27 member states, 2 prospective member states and 1 associate member state.
EMBL performs fundamental research in molecular biology, studying the story of life. We offer services to the scientific community, train the next generation of scientists and strive to integrate the life sciences across Europe.
We are international, innovative and interdisciplinary. We are more than 1800 people, from over 80 countries, operating across six sites in Barcelona (Spain), Grenoble (France), Hamburg (Germany), Heidelberg (Germany), Hinxton (UK) and Rome (Italy). Our scientists work in independent groups and conduct research and offer services in all areas of molecular biology.
Our research drives the development of new technology and methods in the life sciences. We work to transfer this knowledge for the benefit of society.
For more information, visit: https://www.embl.org/
About 5-HT Digital Hub for Chemistry & Health
The 5-HT Digital Hub Chemistry & Health is part of the Digital Hub Initiative (de:hub) initiated by the Federal Ministry for Economic Affairs and Energy to promote digital innovation in Germany.
The goal of the Digital Hub is to build an international ecosystem of startups, investors and companies to drive digital innovation in the chemistry and health sectors. As a central platform, the Digital Hub offers stakeholders the opportunity to network, cooperate and co-develop. It also conducts university challenges in the form of the 5-HT Digital Qualifier and offers the 5-HT X-linker, a one-week startup boot camp that prepares national and international startups for their individual meetings with renowned chemical and pharmaceutical companies. Through an investor pitch, the startups also get in touch with potential financiers.
As part of the "Insuring Digital Health" programme, the Digital Hub and EIT Health bring together startups and health insurance companies to integrate innovative patient-centred digital solutions from the startups into the service portfolio of the health insurance companies.
The companies BASF, SAP, Pepperl+Fuchs, Merck, Roche, Gelita, Daikin, MEDI-MARKT, GWQ, Endress+Hauser, Accenture, IniNovation and Schrödinger are participating as sponsors in the Digital Hub. The inclusion of further corporate partners is possible.
For more information visit: https://www.5-ht.com/
Follow the Digital Hub on LinkedIn (@Digital Hub Mannheim/Ludwigshafen - Chemistry&Health 5-HT) and Twitter: @deHubChemHealth
Become part of the 5-HT Digital Hub Chemistry & Health
Exchange ideas with innovative startups and future-oriented companies in our ecosystem. We look forward to meeting you!