
How AlphaFold Changed Drug Discovery — and What Comes Next


TL;DR: AlphaFold 2 solved the protein folding problem for most single-domain proteins in 2020 by predicting 3D protein structures with near-experimental accuracy. DeepMind then released predicted structures for over 200 million proteins, covering nearly every protein catalogued in UniProt. AlphaFold 3, released in 2024, extends this to protein-drug, protein-DNA, and protein-RNA interactions. The pharmaceutical industry has already integrated AlphaFold into early-stage drug discovery pipelines, but it has not replaced experimental validation. Here is what changed, what did not, and what comes next.

The 50-Year Problem AlphaFold Solved

Proteins are the molecular machines that execute nearly everything in biology. They catalyze reactions, transmit signals, provide structural support, and — critically for medicine — serve as the targets for most drugs. A protein's function is determined by its three-dimensional structure: the precise way its amino acid chain folds into a specific shape.

For over 50 years, predicting how a protein folds from its amino acid sequence alone was considered one of the grand challenges of biology. Christian Anfinsen's Nobel Prize-winning work in 1972 established that a protein's sequence contains all the information needed to determine its structure. But actually computing that structure from first principles remained intractable. The number of possible configurations for even a small protein is astronomically large, far too many to search exhaustively. This observation is now known as Levinthal's paradox, after Cyrus Levinthal.
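To make the scale of Levinthal's paradox concrete, here is a back-of-the-envelope sketch. The numbers are illustrative assumptions in the spirit of the classic argument, not measurements: three states per backbone angle, two angles per peptide bond, and a generous sampling rate of 10^13 conformations per second. The function name is hypothetical.

```python
import math

def levinthal_estimate(n_residues, states_per_angle=3, samples_per_second=1e13):
    """Return (log10 of conformations, log10 of years for exhaustive search).

    Assumed toy model: 2 backbone angles per peptide bond, each with
    `states_per_angle` discrete states, sampled one conformation at a time.
    """
    n_angles = 2 * (n_residues - 1)
    log10_conformations = n_angles * math.log10(states_per_angle)
    seconds_per_year = 365.25 * 24 * 3600
    log10_years = (
        log10_conformations
        - math.log10(samples_per_second)
        - math.log10(seconds_per_year)
    )
    return log10_conformations, log10_years

log_conf, log_years = levinthal_estimate(100)
print(f"~10^{log_conf:.0f} conformations, ~10^{log_years:.0f} years to enumerate")
```

Under these assumptions, even a 100-residue protein would take vastly longer than the age of the universe (roughly 10^10 years) to search exhaustively, yet real proteins fold in milliseconds to seconds.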

Experimental methods existed. X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and cryo-electron microscopy (cryo-EM) could determine protein structures with atomic resolution. But these methods are slow — often taking months to years per protein — expensive, and not always feasible. Some proteins resist crystallization. Membrane proteins, which are critical drug targets, are notoriously difficult to study experimentally.

By 2020, experimental methods had resolved approximately 170,000 protein structures, catalogued in the Protein Data Bank (PDB). Given that nature contains hundreds of millions of distinct proteins, the gap between known sequences and known structures was enormous.

AlphaFold 2: The CASP14 Breakthrough

The Critical Assessment of protein Structure Prediction (CASP) is a biennial competition where research groups attempt to predict the structures of proteins whose structures have been experimentally determined but not yet published. It serves as the field's objective benchmark.

At CASP14, whose results were announced in late 2020, DeepMind's AlphaFold 2 system achieved a median GDT_TS (Global Distance Test, total score) of 92.4 out of 100, where a score of 90 or above is generally considered competitive with experimental methods. The system predicted structures with atomic-level accuracy for a majority of targets, a result that CASP organizers described as a solution to the protein folding problem for single-domain proteins.
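For readers unfamiliar with the GDT score, the sketch below shows the idea behind it: the average, over four distance cutoffs (1, 2, 4, and 8 angstroms), of the percentage of residues whose predicted position falls within that cutoff of the experimental position. This is a simplification. The official score searches for the best superposition at each cutoff, whereas this version assumes the per-residue distances have already been computed after a single superposition. The function name and the example distances are made up for illustration.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS from per-residue distances (in angstroms).

    `distances[i]` is the distance between the predicted and experimental
    C-alpha atom of residue i, assumed to be measured after superposition.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff, then averaged across cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Five hypothetical residues: three well placed, one marginal, one far off.
example = [0.5, 0.8, 1.5, 3.0, 9.0]
print(f"GDT_TS = {gdt_ts(example):.1f}")
```

A prediction in which every residue sits within 1 angstrom of its experimental position scores 100, which is why a median above 90 was such a striking result.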

How AlphaFold 2 Works

AlphaFold 2's architecture combines several innovations:

Multiple Sequence Alignments (MSAs). The system begins by searching protein sequence databases to find evolutionary relatives of the target protein. Patterns of co-evolution — positions in the sequence that change together across species — encode information about which parts of the protein are physically close in 3D space.

The Evoformer. A novel transformer-based neural network module processes both the MSA information and pairwise residue interactions simultaneously. This dual-track architecture allows the model to reason about evolutionary relationships and spatial proximity in parallel, iteratively refining its representation of the protein.

Structure Module. The model outputs 3D atomic coordinates directly, along with a per-residue confidence score on a scale from 0 to 100, called the predicted Local Distance Difference Test (pLDDT). This confidence metric is critical for practical use: it tells researchers which parts of the predicted structure are reliable and which are uncertain.

Recycling. The architecture passes its predictions through the network multiple times, allowing the model to iteratively refine the structure — similar to how a sculptor progressively adds detail.

The model was trained on the approximately 170,000 experimentally determined structures in the PDB, learning the statistical relationship between sequence and structure from examples rather than from explicit physical simulations.
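The co-evolution signal mentioned above can be illustrated with a classical statistic: the mutual information between two columns of an alignment. This is not how AlphaFold 2 processes MSAs (the Evoformer learns directly from the raw alignment), so treat it as a sketch of the underlying intuition only. The toy alignment and function name are invented for the example.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy alignment: columns 0 and 1 always change together (A with K, G with R),
# column 2 is fully conserved, and column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

print(column_mutual_information(toy_msa, 0, 1))  # coupled columns
print(column_mutual_information(toy_msa, 0, 3))  # independent columns
```

High mutual information between two positions suggests that a mutation at one is compensated by a mutation at the other, which often happens when the two residues are in contact in the folded structure.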

The AlphaFold Database: 200 Million Structures

In July 2022, DeepMind and the European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) expanded the AlphaFold Protein Structure Database, first launched in July 2021, to predicted structures for over 200 million proteins, representing nearly every protein in UniProt, the comprehensive protein sequence database. Relative to the roughly 170,000 experimentally determined structures in the PDB, this expanded the universe of available protein structures by approximately 1,000-fold.

The database is freely accessible. Researchers can look up any protein and instantly access its predicted structure, complete with per-residue confidence scores. For the drug discovery industry, this eliminated one of the most significant bottlenecks in early-stage research: not having a structural starting point for a target protein.
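As a sketch of what working with those per-residue confidence scores looks like: AlphaFold model files in PDB format store pLDDT in the B-factor column of each atom record. The snippet below reads it from the C-alpha atoms and labels each residue with the confidence bands used on the database website. The function names are hypothetical, and the toy records are generated in code rather than taken from a real entry.

```python
def plddt_per_residue(pdb_text):
    """Return {residue_number: pLDDT} read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # PDB is fixed-width: atom name in columns 13-16, residue number in
        # columns 23-26, B-factor (pLDDT for AlphaFold models) in columns 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def confidence_band(plddt):
    """Map a pLDDT value to the bands shown in the AlphaFold Database."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"

# Toy data only: build three residues with a format string so columns line up.
ATOM_FORMAT = (
    "ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
)
toy_atoms = [
    (1, " N", "MET", 1, 45.2),
    (2, " CA", "MET", 1, 45.2),
    (3, " CA", "LYS", 2, 88.1),
    (4, " CA", "LEU", 3, 96.4),
]
toy_pdb = "\n".join(
    ATOM_FORMAT.format(serial=s, name=n, res=r, chain="A", resseq=i,
                       x=0.0, y=0.0, z=0.0, occ=1.0, b=b)
    for s, n, r, i, b in toy_atoms
)

for residue, score in plddt_per_residue(toy_pdb).items():
    print(residue, score, confidence_band(score))
```

In practice a structural biology library would handle the parsing; the point here is simply that the confidence score travels with every residue of every predicted structure.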

Before AlphaFold, a pharmaceutical company might spend 6 to 18 months and several hundred thousand dollars to obtain an experimental structure of a new drug target. The AlphaFold database provides a predicted structure — often at sufficient accuracy for initial drug design — in seconds.

How Pharma Is Using AlphaFold Today

AlphaFold has been integrated into drug discovery pipelines across the pharmaceutical industry, but its role is specific and its limitations are understood.

Target Identification and Validation

When researchers identify a protein implicated in a disease, understanding its structure is essential for determining whether it is "druggable" — whether its surface contains pockets or binding sites where a small molecule could bind and modulate its function. AlphaFold predictions provide an immediate structural hypothesis for newly identified targets, accelerating the earliest stage of the drug discovery pipeline.

Virtual Screening and Molecular Docking

With a protein's structure in hand, computational chemists can perform virtual screening: computationally testing millions of potential drug molecules for their ability to bind to the target. This is dramatically faster and cheaper than experimental high-throughput screening. AlphaFold structures serve as starting models for these docking simulations, particularly for proteins where experimental structures are unavailable.

However, the accuracy requirements for molecular docking are demanding. Small errors in the position of side chains at a binding site — on the order of 1 to 2 angstroms — can significantly affect docking predictions. For targets where AlphaFold's confidence (pLDDT) at the binding site is below 70, experimental structures or further refinement are typically needed.
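A simple way to apply that rule of thumb is to check the confidence of the residues lining the binding site before committing to a docking campaign. The sketch below flags a site if any of its residues falls below the threshold. Using the minimum rather than the mean is a conservative assumption on our part, and the residue numbers and scores are invented.

```python
def binding_site_needs_refinement(plddt_by_residue, site_residues, threshold=70.0):
    """Return True if any binding-site residue has pLDDT below the threshold."""
    if not site_residues:
        raise ValueError("binding site must contain at least one residue")
    site_scores = [plddt_by_residue[r] for r in site_residues]
    return min(site_scores) < threshold

# Hypothetical per-residue scores for a predicted structure.
plddt = {10: 92.0, 11: 85.5, 12: 64.0, 50: 95.0}

print(binding_site_needs_refinement(plddt, [10, 11, 12]))  # residue 12 is below 70
print(binding_site_needs_refinement(plddt, [10, 11, 50]))  # all residues above 70
```

A flagged site does not mean the prediction is unusable, only that an experimental structure or further refinement is worth considering before trusting docking results.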

Structure-Based Drug Design

In later stages, medicinal chemists use structural information to rationally modify drug candidates — adding functional groups to improve binding, reducing off-target interactions, and optimizing the compound's drug-like properties. AlphaFold structures provide a starting framework for this iterative process, though high-resolution experimental structures remain the gold standard for final-stage optimization.

Real-World Examples

Multiple pharmaceutical companies have reported using AlphaFold to accelerate their pipelines.

AlphaFold 3: Predicting Molecular Interactions

In May 2024, Google DeepMind and Isomorphic Labs released AlphaFold 3, which extends beyond single-protein structure prediction to model how proteins interact with other molecules, including small-molecule drugs, DNA, RNA, ions, and other proteins.

Why Interactions Matter

A protein does not function in isolation. It binds to other molecules: substrates it catalyzes, signaling partners it communicates with, and — critically for medicine — drugs that modulate its activity. Predicting the structure of these molecular complexes is essential for understanding biology at a systems level and for designing effective therapeutics.

AlphaFold 3's Architecture

AlphaFold 3 uses a diffusion-based approach (similar to the technology underlying image generation models like DALL-E and Stable Diffusion) to predict the 3D coordinates of entire molecular complexes. Given the sequences of multiple proteins and the chemical structures of their binding partners, the model predicts how all components arrange in 3D space.

For protein-ligand (drug) interactions specifically, AlphaFold 3 significantly outperformed previous computational docking methods on standard benchmarks, though it does not yet match the accuracy of experimental co-crystal structures for all targets.

Implications for Drug Discovery

AlphaFold 3 addresses a critical gap. In drug discovery, knowing the structure of an isolated protein is necessary but not sufficient. What matters is how the drug candidate sits within the protein's binding site: the orientation, the specific contacts, the water molecules displaced. AlphaFold 3 provides computational predictions of these interactions at a level of accuracy that, for many targets, previously required an experimental structure of the complex.

This capability is particularly valuable for:

What AlphaFold Has Not Solved

Despite its transformative impact, AlphaFold has clear limitations that the drug discovery community understands well.

Conformational Dynamics

Proteins are not static objects. They move, flex, and adopt multiple conformations that are essential to their function. AlphaFold predicts a single static structure — typically the most energetically favorable conformation. For many drug targets, the therapeutically relevant conformation is not the ground state but an alternative configuration that the protein adopts during its functional cycle.

Molecular dynamics simulations and experimental methods like hydrogen-deuterium exchange mass spectrometry capture these dynamics, and they remain essential complements to AlphaFold predictions.

Disordered Regions

Approximately 30% of the human proteome consists of intrinsically disordered regions (IDRs): segments that do not adopt a fixed 3D structure but remain flexible. AlphaFold typically flags these regions with low pLDDT scores, but it cannot predict their conformational ensembles. Since many IDRs are involved in signaling, transcriptional regulation, and disease, this represents a significant gap.

Binding Site Accuracy

While AlphaFold's backbone predictions are often excellent, the positions of amino acid side chains — which are critical for drug binding — can be less accurate, particularly at binding sites. For molecular docking and structure-based drug design, errors of 1 to 2 angstroms in side-chain positions can lead to incorrect binding predictions. This is why experimental validation remains essential for drug candidates approaching clinical trials.
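Errors of this kind are usually quantified as a root-mean-square deviation (RMSD) between matching atoms in the predicted and experimental structures. The sketch below computes it for a handful of side-chain atoms, assuming both coordinate sets are already superposed in the same frame and listed in the same atom order. The coordinates are invented for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (angstroms) between two matched atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Hypothetical side-chain atoms of one binding-site residue (x, y, z in angstroms).
experimental = [(1.0, 2.0, 3.0), (2.2, 2.9, 3.5), (3.1, 4.0, 3.8)]
# Same atoms in a predicted model, each displaced by 1.5 angstroms along x.
predicted = [(x + 1.5, y, z) for x, y, z in experimental]

print(f"side-chain RMSD = {rmsd(experimental, predicted):.2f} angstroms")
```

A displacement this small is barely visible when looking at the overall fold, but it is comparable to the length of a chemical bond, which is why it matters so much for docking.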

Post-Translational Modifications

Proteins are frequently modified after they are synthesized: phosphorylation, glycosylation, ubiquitination, and dozens of other modifications alter their structure and function. AlphaFold predicts the structure of the unmodified protein, and these modifications can significantly change the protein's shape and behavior.

Speed and Accessibility

AlphaFold 2 requires substantial computational resources and time (minutes to hours per prediction). While this is dramatically faster than experimental methods, it is slow compared to simpler prediction tools. ESMFold, Meta AI's alternative, trades some accuracy for roughly 60x faster predictions — a tradeoff that matters when screening millions of proteins.
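To see why that tradeoff matters, consider a rough throughput estimate. The inputs are assumptions chosen for illustration: 10 minutes per AlphaFold 2 prediction (within the "minutes to hours" range above), a flat 60x speedup for the faster model, one million proteins, and a single worker. The function name is hypothetical.

```python
def screening_days(n_proteins, minutes_per_prediction, n_workers=1):
    """Wall-clock days to predict `n_proteins` structures at a fixed rate."""
    total_minutes = n_proteins * minutes_per_prediction / n_workers
    return total_minutes / (60 * 24)

n_proteins = 1_000_000
baseline_minutes = 10.0          # assumed time per AlphaFold 2 prediction
fast_minutes = baseline_minutes / 60  # assumed flat 60x speedup

print(f"baseline:   {screening_days(n_proteins, baseline_minutes):,.0f} days")
print(f"60x faster: {screening_days(n_proteins, fast_minutes):,.0f} days")
```

On a single worker, that is the difference between roughly two decades and a few months, which is why large-scale screens often start with the faster, less accurate model and reserve the slower one for the most promising candidates.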

The Connection to Your DNA

Every genetic variant that changes an amino acid in one of your proteins — a missense variant — potentially alters that protein's 3D structure and function. When a clinical geneticist evaluates whether a variant in your DNA is likely to cause disease, they need to understand how the amino acid change affects the protein.

AlphaFold predictions provide the structural context for this assessment. Combined with tools like AI-powered variant effect predictors, they help determine whether a variant is likely to disrupt the protein's structure or function.

This is directly relevant to personalized medicine. When DeepDNA analyzes your genetic data, variant interpretation incorporates structural predictions to assess the functional impact of your unique combination of protein-coding variants.
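As a minimal sketch of the first step in that kind of analysis, the snippet below applies a missense variant written in single-letter notation (reference residue, position, new residue) to a protein sequence, checking that the reference residue matches. The resulting sequence is what a structure or variant effect predictor would take as input. The function name is hypothetical. The example uses the first ten residues of human beta-globin and the sickle-cell variant, which replaces glutamate with valine at position 7 when the initiator methionine is counted.

```python
import re

def apply_missense(sequence, variant):
    """Apply a variant like 'E7V' (1-based position) and return the new sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognized variant format: {variant!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} is outside the sequence")
    if sequence[pos - 1] != ref:
        raise ValueError(
            f"expected {ref} at position {pos}, found {sequence[pos - 1]}"
        )
    return sequence[: pos - 1] + alt + sequence[pos:]

beta_globin_start = "MVHLTPEEKS"
print(apply_missense(beta_globin_start, "E7V"))
```

The reference check is worth keeping in any real pipeline: numbering conventions differ between databases, and a mismatch is usually the first sign that a variant has been mapped to the wrong isoform or offset.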

What Comes Next

The AlphaFold trajectory points toward several developments that will further reshape drug discovery and genomics:

AlphaFold-based virtual screening at scale. Combining AlphaFold 3's interaction predictions with ultra-large virtual chemical libraries (billions of compounds) will enable computational screening at a scale that dwarfs current experimental capacity.

Integration with generative chemistry. AI models that design novel drug molecules (like Recursion's LOWE and Insilico Medicine's Chemistry42) are being coupled with AlphaFold to iteratively design and evaluate drug candidates computationally.

Antibody design. AlphaFold and related models are being extended to predict antibody-antigen interactions, which is critical for designing therapeutic antibodies — one of the fastest-growing drug classes.

Personalized structural pharmacogenomics. As AlphaFold predictions become faster and more accurate, it becomes feasible to predict how an individual's specific protein variants affect drug binding — moving beyond population-level pharmacogenomics toward truly personalized drug selection.

Open science acceleration. The free availability of the AlphaFold database has democratized structural biology. Researchers studying neglected diseases, rare conditions, and basic biology now have access to structural data that was previously available only to well-funded laboratories.

The Practical Takeaway

AlphaFold solved a 50-year-old problem and fundamentally changed the starting conditions for drug discovery. It did not replace experimental biology — crystallography, cryo-EM, and biochemical assays remain essential. But it eliminated one of the most significant bottlenecks in early-stage research: the lack of structural information for most proteins.

For anyone interested in how AI is transforming genomics, AlphaFold is the clearest example of a machine learning system that delivered on its promise at scale. The 200 million predicted structures are not theoretical. They are being used, right now, in laboratories and pharmaceutical companies around the world.

The next chapter — predicting how your specific genetic variants affect protein structure and drug response — is where genomics and drug discovery converge. And it is already underway.


