In the realm of biology, our DNA serves as a comprehensive codebook containing essential instructions for the functioning and development of our bodies.
We are going to explore how an artificial intelligence (AI) model from Google’s DeepMind has classified 89% of 71 million potential ‘errors’ in DNA, in stark contrast to the mere 0.1% classified by human experts.
It is now employed by hundreds of thousands of researchers across 190 countries.
But before we get to that, we need to set the scene.
How Proteins Are Coded and Created
One significant chapter within the DNA codebook pertains to the creation of proteins, which play vital roles, such as transporting life-sustaining oxygen, facilitating digestion, and upholding the structural integrity of the body. This chapter is composed of approximately 20,000 to 25,000 code words.
Over time, some of these code words can become corrupted, leading to serious conditions such as type 2 diabetes, cancer, cystic fibrosis, and more.
These flawed codes, known as missense mutations, are tiny glitches in our genes: a single DNA letter is swapped, which changes one amino acid in the associated protein and can alter its function. It’s akin to a single letter in a word being flipped, potentially changing the entire meaning of a sentence.
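The "flipped letter" idea can be made concrete with a small sketch. It assumes a toy DNA sequence and only a five-entry excerpt of the standard genetic code; the function names are illustrative, and stop codons and everything else a real pipeline would handle are ignored.

```python
# Tiny excerpt of the standard genetic code (DNA codon -> amino acid).
# A real table has 64 entries; this subset only covers the toy sequences below.
CODON_TABLE = {
    "CCT": "Pro",
    "GAG": "Glu",
    "GAA": "Glu",
    "GTG": "Val",
    "ACT": "Thr",
}


def translate(dna):
    """Read a coding sequence three letters at a time and return its amino acids."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def classify_substitution(reference, variant):
    """Label a single-letter change by its effect on the protein (toy version)."""
    if translate(reference) != translate(variant):
        return "missense"    # the protein now contains a different amino acid
    return "synonymous"      # the DNA changed, but the protein did not


reference = "CCTGAGGAG"   # Pro-Glu-Glu
flipped = "CCTGTGGAG"     # one letter changed (A -> T): Pro-Val-Glu
silent = "CCTGAAGAG"      # one letter changed (G -> A): still Pro-Glu-Glu

print(classify_substitution(reference, flipped))  # missense
print(classify_substitution(reference, silent))   # synonymous
```

Both variants differ from the reference by exactly one letter, yet only the first changes the protein, which is why a missense variant is defined by its effect on the amino acid sequence rather than by the DNA change alone.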
However, while each of us carries around 9,000 of these corrupted codes within our DNA, not all of them are harmful. Just as not every typo in an article changes its core message, many genetic mutations are benign and have little influence on our health. Our body’s built-in proofreading mechanisms work diligently to correct mutations, like an editor’s thorough spell-check, but not every genetic error can be repaired by our cellular machinery.
Deciphering Genetic Language
Understanding this genetic language and the impact of the “corrupted codes”, or missense variants, is a central goal of biological research. It is the key to unlocking advanced diagnostic methods and developing treatments for genetic diseases. Detecting harmful, or pathogenic, missense variants gives us the opportunity to shape the narrative of our health and well-being, ensuring that our genetic codebook remains as error-free and vibrant as possible.
While biologists have long strived to understand the impact of missense mutations, it remains largely a mystery which of the millions of possible genetic errors could give rise to disease, because the available experimental data is limited. For instance, out of the more than 4 million missense variants observed in humans, only 2% have been classified as pathogenic or benign by experts. That is roughly 0.1% of all 71 million potential missense variants.
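The two percentages above fit together through a quick calculation. The sketch below uses only the figures quoted in the text, treating "more than 4 million" as exactly 4 million, so the results are approximate.

```python
observed_variants = 4_000_000    # missense variants seen in humans (lower bound from the text)
expert_fraction = 0.02           # share of observed variants classified by experts
possible_variants = 71_000_000   # all potential missense variants

# How many variants have experts actually classified?
expert_classified = observed_variants * expert_fraction

# What share of every possible variant does that represent?
share_of_possible = expert_classified / possible_variants

print(f"{expert_classified:,.0f} variants classified by experts")  # 80,000 variants classified by experts
print(f"{share_of_possible:.2%} of all possible variants")         # 0.11% of all possible variants
```

So the roughly 0.1% figure follows directly: about 80,000 expert-classified variants out of 71 million possibilities.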
AlphaFold: Transforming the Landscape of Biological Research
DeepMind, a subsidiary of Google, introduced AlphaFold in 2018, and with its successor AlphaFold 2 in 2020 achieved a groundbreaking feat in biological research: an artificial intelligence (AI) system capable of accurately predicting a protein’s 3D structure solely from its amino acid sequence.
This addresses the long-standing “protein folding problem”, a challenge that has perplexed biologists for over 50 years. Proteins are the foundational components of life, governing every biological process. Understanding a protein’s structure offers profound insights into genetic language and evolution.
AlphaFold was trained on around 170,000 proteins from the Protein Data Bank. By comparing sequences in the data bank, it learned to infer which amino acids sit close together in a folded structure, allowing it to make educated guesses about unknown structures.
This groundbreaking AI was trained over a few weeks using computational power equivalent to 100 to 200 GPUs, and it transformed the field of biological research. A database of protein structures created by AlphaFold is being used by millions of researchers across 190 countries for varied tasks, from understanding diseases to accelerating drug discovery.
AlphaMissense: Transforming Genetic Error Detection
Building on the AlphaFold model, DeepMind researchers have recently given it the ability to predict the pathogenicity of missense variants. To achieve this, AlphaFold was fine-tuned on labels that distinguish variants seen in human and closely related primate populations from variants that have never been observed. For training purposes, commonly observed variants are treated as benign, while unseen variants are treated as pathogenic.
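The labeling idea can be sketched in a few lines. This is only an illustration of the principle described above, not DeepMind’s actual pipeline: the function name, the `common_threshold` cutoff, and the variant counts are all hypothetical.

```python
def weak_label(times_observed, common_threshold=10):
    """Assign a training label from how often a variant is seen in a population.

    `common_threshold` is a made-up cutoff for what counts as "commonly observed".
    """
    if times_observed == 0:
        return "pathogenic"   # never seen: treated as pathogenic for training
    if times_observed >= common_threshold:
        return "benign"       # commonly seen: treated as benign for training
    return None               # rare but present: left unlabeled in this sketch


# Hypothetical observation counts for three invented variants.
counts = {"variant_A": 250, "variant_B": 0, "variant_C": 3}
labels = {name: weak_label(count) for name, count in counts.items()}
print(labels)  # {'variant_A': 'benign', 'variant_B': 'pathogenic', 'variant_C': None}
```

The appeal of this kind of labeling is that it needs no expert annotation: population frequency stands in for a clinical verdict, with the understanding that the resulting labels are noisy.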
This newly developed model, referred to as AlphaMissense, outputs a continuous score, enabling users to choose a threshold for variant classification according to their specific accuracy requirements. In their paper published in Science, the researchers demonstrated that AlphaMissense classified 89% of all 71 million potential missense variants as likely pathogenic or likely benign, compared to the 0.1% that had been classified by human experts.
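To see how a continuous score turns into a classification, consider the minimal sketch below. It assumes scores between 0 and 1, and the two cutoffs are illustrative placeholders rather than the thresholds used in the published work; the scores themselves are invented.

```python
def classify(score, benign_below=0.3, pathogenic_above=0.7):
    """Map a pathogenicity score in [0, 1] to a label using two cutoffs.

    The default cutoffs are placeholders chosen for illustration only.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < benign_below:
        return "likely benign"
    if score > pathogenic_above:
        return "likely pathogenic"
    return "ambiguous"


def fraction_classified(scores, **cutoffs):
    """Share of variants that receive a definite (non-ambiguous) label."""
    labels = [classify(score, **cutoffs) for score in scores]
    return sum(label != "ambiguous" for label in labels) / len(labels)


scores = [0.05, 0.20, 0.45, 0.60, 0.95]  # invented example scores

print(fraction_classified(scores))                                          # 0.6
print(fraction_classified(scores, benign_below=0.1, pathogenic_above=0.9))  # 0.4
```

Stricter cutoffs leave more variants in the ambiguous middle band but make the remaining calls more trustworthy. This is the trade-off behind letting users pick their own threshold, and it is why a share of variants remains unclassified.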
Fine-tuning the pre-trained AlphaFold also reduces the need for extensive training data, computational time, and energy consumption.
AlphaMissense Catalog: Empowering Genetic Research
DeepMind has created a catalog based on the predictions generated by the AlphaMissense model and has made it accessible to the public. While this catalog cannot directly diagnose conditions, it serves as a valuable resource for researchers and medical professionals.
Researchers seeking to determine whether a specific missense variant may be responsible for a disease can now consult the catalog and find its predicted pathogenicity score. AlphaMissense could help researchers streamline the process of matching genetic mutations to diseases by swiftly eliminating unlikely candidates. It could also enhance our understanding of overlooked aspects of our genetic code.
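A lookup of this kind might be sketched as follows. The tab-separated layout, the column names (`protein_id`, `variant`, `score`), and every row are hypothetical stand-ins invented for illustration; the real catalog’s file format should be checked against DeepMind’s release.

```python
import csv
import io

# Invented stand-in for a catalog file: one row per variant with a predicted score.
SAMPLE_CATALOG = (
    "protein_id\tvariant\tscore\n"
    "PROT1\tE6V\t0.91\n"
    "PROT1\tA10S\t0.08\n"
    "PROT2\tR45W\t0.55\n"
)


def load_catalog(handle):
    """Read the table into a dict keyed by (protein_id, variant)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(row["protein_id"], row["variant"]): float(row["score"]) for row in reader}


catalog = load_catalog(io.StringIO(SAMPLE_CATALOG))

# Candidate variants found in a hypothetical patient; one is missing from the table.
candidates = [("PROT1", "A10S"), ("PROT2", "R45W"), ("PROT1", "E6V"), ("PROT3", "G1D")]

# Keep the candidates that have a prediction and rank the most suspicious first.
scored = [(c, catalog[c]) for c in candidates if c in catalog]
ranked = sorted(scored, key=lambda item: item[1], reverse=True)

for (protein, variant), score in ranked:
    print(protein, variant, score)
```

Ranking candidates this way narrows down which variants deserve follow-up experiments first. The scores remain predictions, so a high rank is a lead to investigate rather than a diagnosis.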
This open catalog can be instrumental in shedding light on previously mysterious conditions and guiding the path toward more accurate diagnoses and treatments. DeepMind researchers emphasize, however, that the predictions should not be used in isolation but rather as a guide for real-world research.
DeepMind has taken a cautious approach regarding the release of their trained model to prevent potential misuse of this technology. Instead of making the model available for immediate download and use, they have made all the essential information, including the computer code, accessible, allowing others to replicate their work.
In the pursuit of understanding our genetic code and unraveling the mysteries of genetic diseases, DeepMind’s AlphaMissense offers a groundbreaking tool.
Classifying 89% of all 71 million potential missense variants, compared to the 0.1% that human experts have managed so far, is remarkable: a testament to how much valuable work artificial intelligence can contribute.
By predicting the pathogenicity of missense variants, the model empowers researchers and medical professionals to identify the genetic origins of complex syndromes.
While not a diagnostic tool on its own, the publicly available AlphaMissense catalog is a valuable resource for guiding genetic research and has the potential to speed up linking genetic mutations to diseases. DeepMind’s cautious approach ensures that this transformative technology is used responsibly, allowing open access to essential information while guarding against potential misuse.