Scientists have developed artificial intelligence software that can create proteins that could be useful for vaccines, cancer treatments or even for removing carbon pollution from the air.
The research, reported today in the journal Science, was led by the University of Washington School of Medicine and Harvard University. The article is titled “Scaffolding Protein Functional Sites Using Deep Learning.”
“The proteins we find in nature are wonderful molecules, but designed proteins can do much more. In this work, we show that machine learning can be used to design proteins with a wide variety of functions.”
David Baker, senior author, HHMI investigator and professor of biochemistry at UW Medicine
For decades, scientists have used computers to engineer proteins. Some proteins, such as antibodies and synthetic binding proteins, have been adapted into drugs to combat COVID-19. Others, such as enzymes, aid in industrial manufacturing. But a protein molecule often contains thousands of bonded atoms; even with specialized scientific software, proteins are difficult to study and engineer.
Inspired by how machine learning algorithms can generate stories or even images from prompts, the team built similar software to design new proteins. “The idea is the same: Neural networks can be trained to look for patterns in data. Once trained, you can give it a prompt and see if it can produce an elegant solution. Often the results are compelling, or even beautiful,” said lead author Joseph Watson, a postdoctoral scholar at UW Medicine.
The team trained several neural networks using information from the Protein Data Bank, a public repository of hundreds of thousands of protein structures from all kingdoms of life. The resulting neural networks have puzzled even the scientists who created them.
The team developed two approaches to design proteins with new functions. The first, called “hallucination,” is similar to DALL-E or other generative AI tools that produce new outputs based on simple prompts. The second, called “inpainting,” is similar to the autocomplete feature found in modern search bars and email clients.
“Most people can come up with new images of cats or write a paragraph when asked, but with protein design, the human brain can’t do what computers now can,” said lead author Xu Wang, a postdoctoral scholar at UW Medicine. “Humans can’t imagine what a solution might look like, but we’ve set up machines that do.”
To explain how the neural networks “hallucinate” a new protein, the team compares it to how a computer might write a book: “You start with a random assortment of words: total ambiguity. Then you impose a requirement, such as that in the opening paragraph it must be a dark and stormy night. Then the computer will replace the words one at a time and ask itself, ‘Does this make my story make more sense?’ If it does, it keeps the change, and it carries on until the whole story is written,” explains Wang.
Both books and proteins can be understood as long sequences of letters. In the case of proteins, each letter corresponds to a chemical building block called an amino acid. Starting with a random chain of amino acids, the software repeatedly changes the sequence until it arrives at a final sequence that encodes the desired function. These final amino acid sequences encode proteins that can then be manufactured and studied in the laboratory.
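The change-one-letter-and-keep-it-if-it-helps loop described above can be illustrated with a minimal Python sketch. This is not the team's software: the real system judges each candidate with trained neural networks, whereas the `toy_score` function, the `hallucinate` name and the target string used here are hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def toy_score(sequence, target):
    """Hypothetical stand-in for the network's judgment:
    the number of positions that match an arbitrary target string."""
    return sum(a == b for a, b in zip(sequence, target))


def hallucinate(target, steps=20000, seed=0):
    """Start from a random chain and change one letter at a time,
    keeping a change only if it does not lower the score."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in target]
    best = toy_score(sequence, target)
    for _ in range(steps):
        if best == len(target):
            break  # every position already satisfies the toy requirement
        position = rng.randrange(len(sequence))
        previous = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)
        candidate = toy_score(sequence, target)
        if candidate >= best:
            best = candidate  # keep the change
        else:
            sequence[position] = previous  # undo the change
    return "".join(sequence), best


if __name__ == "__main__":
    # The target below is an arbitrary example string, not a real design goal.
    designed, score = hallucinate("MKTAYIAKQRQI")
    print(designed, score)
```

In this toy version the "desired function" is simply agreement with a fixed string, so the loop is an ordinary hill climb. The point is only the shape of the procedure: random start, small edits, and a scoring step that decides which edits survive.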
The team also showed that neural networks can fill in missing pieces of protein structure in a matter of seconds. Such software can help in the development of new drugs.
“With autocomplete, or ‘protein inpainting,’ we start with the key features we want to see in a new protein, then let the software come up with the rest. Those features can be known binding motifs or even enzyme active sites,” explains Watson.
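The inpainting idea, keeping the key features fixed and letting the software fill in everything else, can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the `MASK` marker, the `inpaint` function and the made-up motif are hypothetical, and the trained network is replaced by a random guess so that the example runs on its own.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
MASK = "_"  # hypothetical marker for a position the software is free to fill in


def inpaint(template, propose=None, seed=0):
    """Fill every masked position while leaving the given residues untouched.

    `propose(partial, index)` stands in for the trained network. By default
    it is a random guess, an assumption made only to keep the sketch runnable.
    """
    rng = random.Random(seed)
    if propose is None:
        def propose(partial, index):
            return rng.choice(AMINO_ACIDS)
    residues = list(template)
    for index, residue in enumerate(residues):
        if residue == MASK:
            residues[index] = propose(residues, index)
    return "".join(residues)


if __name__ == "__main__":
    # "HEAAH" is a made-up motif used purely as an example of a fixed feature.
    print(inpaint("____HEAAH____"))
```

The design choice worth noting is the separation between the template, which states what must be preserved, and the proposal function, which is where a real model would supply context-aware suggestions.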
Laboratory testing showed that many of the proteins produced through hallucination and inpainting worked as intended. These include new proteins that bind metals as well as proteins that bind the anti-cancer receptor PD-1.
The new neural networks can generate many different types of proteins in as little as a second. Some are potential vaccines for the deadly respiratory syncytial virus, or RSV.
All vaccines work by introducing a piece of a pathogen to the immune system. Scientists often know which fragment will work best, but creating a vaccine that achieves a desired molecular shape can be challenging. Using the new neural networks, the team prompted a computer to create new proteins that included the required pathogen fragment as part of their final structure. The software was free to build any supporting structure around the key fragment, generating many potential vaccines with diverse molecular shapes.
In lab tests, the team found that known antibodies against RSV stuck to three of their hallucinated proteins. This confirms that the new proteins adopted their intended shape and suggests that they may be viable vaccine candidates that can prompt the body to generate its own highly specific antibodies. Additional testing, including in animals, is still needed.
“I started working on the vaccine stuff as a way to test my new methods, but in the middle of working on the project, my two-year-old son got infected with RSV and spent an evening in the ER to have his lungs cleared. It made me realize that even the ‘test’ problems we were working on were actually quite worthwhile,” Wang said.
“These are very powerful new approaches, but there is still a lot of room for improvement,” said Baker, who was the recipient of the 2021 Breakthrough Prize in Life Sciences. “For example, designing high-activity enzymes is still very challenging. But every month our methods are getting better! Deep learning has changed protein structure prediction over the past two years, and we are now in the midst of a similar change in protein design.”
The project was led by Xu Wang, Doug Tischer, and Joseph L. Watson, who are postdoctoral scholars at UW Medicine, as well as Sidney Lisanza and David Jurgens, who are graduate students at UW Medicine. Senior authors include Sergei Ovchinnikov, John Harvard Distinguished Science Fellow at Harvard University, and David Baker, professor of biochemistry at UW Medicine.
Wang, J., et al. (2022) Scaffolding protein functional sites using deep learning. Science. doi.org/10.1126/science.abn2100.