Google’s DeepMind AI Deciphers 3D Structures of the “Entire Protein Universe”

It was only in 1957 that scientists first gained access to the molecular third dimension.

After 22 years of painstaking experiments, John Kendrew of the University of Cambridge finally discovered the 3D structure of a protein. It was the twisted blueprint of myoglobin, a chain of 154 amino acids that helps supply our muscles with oxygen. As revolutionary as this discovery was, it did not open the floodgates for protein architecture. Fewer than ten more structures would be identified over the next ten years.

Today, it’s been 65 years since that Nobel Prize-winning achievement.

On Thursday, Google’s sister company, DeepMind, announced that it has successfully used artificial intelligence to predict the 3D structures of nearly every cataloged protein known to science. That’s more than 200 million proteins found in plants, bacteria, animals and humans, almost anything you can imagine.

“Basically, you can think of it as covering the entire protein universe,” DeepMind founder and CEO Demis Hassabis told reporters this week.

That’s thanks to AlphaFold, DeepMind’s AI system, which has an open-source database so scientists around the world can freely draw on it for their research. Since AlphaFold’s official launch last July, when it had predicted only about 350,000 3D protein structures, the program has made a noticeable dent in the research landscape.

“More than 500,000 researchers and biologists have used the database to view more than 2 million structures,” Hassabis said. “And these predictive structures have helped scientists make new discoveries.”

For example, in April, scientists at Yale University called on the AlphaFold database to help develop a new, highly effective malaria vaccine. And last July, scientists at the University of Portsmouth used this system to develop enzymes that could combat single-use plastic pollution.

John McGeehan, director of Portsmouth’s Center for Enzyme Innovation and a researcher behind the latter study, told the New York Times: “It puts us a year ahead of where we were, if not two years.”

Figure: Ribbon diagram of the 3D structure of vitellogenin, an egg yolk protein.


These efforts are just a small sample of AlphaFold’s amazing capabilities.

“Last year alone, there were over a thousand scientific papers on a wide range of research topics using AlphaFold structures; I’ve never seen anything like it,” Sameer Velankar, a DeepMind collaborator and group leader at the European Molecular Biology Laboratory’s Protein Data Bank, said in a press release.

Others using the database, according to Hassabis, include those trying to improve our understanding of Parkinson’s disease, those seeking to protect the health of honey bees, and even some seeking valuable insights into human evolution.

“AlphaFold is already changing the way we think about the preservation of molecules in fossils, and I see it soon becoming a fundamental tool for researchers working not only in evolutionary biology, but also in archeology and other paleosciences,” Beatrice Demarchi, an associate professor at the University of Turin who recently used the system in research on an ancient egg controversy, said in a press release.

In the coming years, DeepMind also intends to partner with the Drugs for Neglected Diseases initiative and groups from the World Health Organization, with the goal of finding cures for understudied but widespread tropical diseases such as Chagas disease and leishmaniasis.

“This will make many researchers around the world think about what experiments they can do,” Ewan Birney, a DeepMind collaborator and deputy director of EMBL, told reporters. “And think about what happens in the organisms and the systems they study.”

Locks and keys

So why do so many scientific advances depend on this treasure trove of 3D protein modeling? Let us explain.

Let’s say you’re trying to make a key that fits a lock perfectly. But you can’t see the structure of that lock. All you know is that the lock exists, plus some information about its materials and maybe some numerical data about how big each ridge is and where those ridges should be.

Developing this key might not be impossible, but it would be very difficult. Keys must be exact or they will not work. So before you begin, you would probably do your best to model a few different dummy locks with whatever information you have on hand, and then make your key.

In this analogy, the lock is a protein, and the key is a small molecule that binds to the protein.

For scientists, whether they are doctors trying to develop new drugs or botanists studying the anatomy of plants to make fertilizers, the interactions between certain molecules and proteins are crucial.

With drugs, for example, the way a drug molecule binds to a protein can make or break whether it works. This interaction is complicated because, although proteins are simply strings of amino acids, they are not straight or flat. They inevitably fold, bend, and sometimes get tangled up like the headphone wires in your pocket.

In fact, the unique folds of a protein determine how it functions, and even the slightest folding errors in the human body can lead to disease.

But when it comes to small molecule drugs, sometimes parts of a folded protein are prevented from binding to the drug. They might be folded in a strange way that makes them inaccessible, for example. Details like this are crucial information for scientists trying to get a drug molecule to bind. “I think almost every drug that has come on the market in the last few years has been developed in part through knowledge of protein structures,” EMBL research scientist Janet Thornton said at the conference.

That’s why researchers usually spend an incredible amount of time and effort deciphering the folded, 3D structure of the protein they’re working on, much as you would start your key-making journey by piecing together a model of the lock. If you know the exact structure, it’s much easier to tell where and how a molecule attaches to a given protein, and how that attachment affects the protein’s folds in response.

But this effort is not simple. Or cheap.

“The cost of solving a new, unique structure is $100,000,” said Steve Darnell, a structural and computational biologist at the University of Wisconsin and a researcher at the bioinformatics company DNAStar.

That’s because the solution usually comes from extremely complex laboratory experiments.

Kendrew, for example, used a technique called X-ray crystallography at the time. Basically, this technique requires you to take solid crystals of the protein of interest, place them in an X-ray beam, and see what pattern the beam creates. This pattern can give you the positions of thousands of atoms inside the crystal. Only then can you use the pattern to unravel the structure of the protein.

There is also a more recent technique called cryo-electron microscopy. This is similar to X-ray crystallography, but the protein sample is directly bombarded with electrons instead of X-rays. Although it is considered to be much higher in resolution than other techniques, it cannot penetrate everything. Separately, some in the tech sector have tried to generate protein folding structures digitally. But early attempts, such as several in the 1980s and ’90s, did not go well. Laboratory methods, as you can imagine, are tedious and difficult too.

Over the years, such obstacles have given rise to the so-called “protein folding problem.” Simply put, scientists didn’t know how proteins fold, and they faced significant hurdles in finding out.

AlphaFold’s AI could be a game changer.

Figure: A chart, provided by DeepMind, of the explosive growth by species of the AlphaFold database. Each large circle has a small dot indicating the previous number of proteins in the database.


Solving the “folding problem”

In short, AlphaFold was trained by DeepMind engineers to predict protein structures without the need for laboratory intervention. No crystals, no electron bombardment, no $100,000 experiments.

To get AlphaFold to its current state, the system was first exposed to 100,000 known protein folding structures, according to the company’s website. Then, over time, it learned to decipher the rest.

It’s really that straightforward. (Well, except for the AI coding talent.)

“I don’t know, it takes at least $20,000 and a lot of time to crystallize one protein,” Birney said. “This means experimentalists have to choose what to do – AlphaFold hasn’t had to make choices yet.”

This thoroughness of AlphaFold is fascinating. It means that scientists have more freedom to guess and test, to follow an inkling or a gut instinct and cast a wider net in their research when it comes to protein structures. They don’t have to worry about costs or deadlines.

“The models also come with a prediction of the error,” said Jan Kosinski, a DeepMind collaborator and structural modeler at EMBL in Hamburg, Germany. “And often, in fact in many cases, the error is really small. That’s why we call it near-atomic precision.”

Furthermore, the DeepMind team also says that it has conducted various risk assessments to make sure that AlphaFold is safe and ethical to use. Members of the DeepMind team also said that AI in general could pose biosecurity risks that we haven’t thought to assess before, especially as such technology continues to enter the medical space.

But as the future unfolds, according to the DeepMind crew, AlphaFold will adapt fluidly and address such concerns on a case-by-case basis. So far, it seems to be working, with a universe of protein models that all trace back to one simple portrait of myoglobin.

“Only two years ago,” said Birney, “we just didn’t realize it was possible.”

Correction, 6:45 a.m. PT: Janet Thornton’s last name and title have been corrected.
