In short: You upload a DNA file. Our engine parses ~800,000 genetic variants, annotates each one against 7+ clinical databases in parallel, calculates polygenic risk scores, runs pharmacogenomic profiling, and generates a plain-language report using frontier AI models. The entire process runs on European servers. Your file is deleted after analysis. No data is retained.

1. The analysis pipeline

When you upload your genotype file, it passes through a six-stage pipeline. Each stage is independent and fault-tolerant — if one annotation source is temporarily unavailable, the rest continue. Nothing blocks.

01. Upload & format detection

We accept raw data files from 23andMe (v3–v5), AncestryDNA (v1/v2), MyHeritage, FamilyTreeDNA, and standard VCF files. The parser auto-detects format, encoding, and genome build (GRCh37/GRCh38).
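A minimal sketch of how header-based format detection of this kind can work. The marker strings below are assumptions based on commonly seen export headers, and `detect_format` is a hypothetical name; a real parser would also need to check encoding and genome build.

```python
from typing import Iterable


def detect_format(lines: Iterable[str]) -> str:
    """Guess the file format from its leading comment lines (illustrative only)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        lower = line.lower()
        # VCF files declare their format on the first meta line.
        if lower.startswith("##fileformat=vcf"):
            return "vcf"
        if line.startswith("#"):
            # Assumed vendor markers inside the comment header.
            if "23andme" in lower:
                return "23andme"
            if "ancestrydna" in lower:
                return "ancestrydna"
            continue  # some other comment line, keep scanning
        break  # first non-comment row reached without a match
    return "unknown"
```

Scanning only the comment header keeps detection cheap: the decision is made before any of the several hundred thousand data rows are read.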

02. Normalisation & quality control

Genotypes are standardised to a canonical form, no-call variants are removed, duplicates are collapsed by rsID, and the full set is sorted by chromosome and position. Typical output: ~700,000–800,000 clean variants.
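The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the production code: the `Variant` type, the no-call encodings, and the chromosome ordering are all hypothetical choices.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str


NO_CALLS = {"--", "00", ""}  # assumed no-call encodings

# Assumed sort order: autosomes, then X, Y, MT.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})


def normalise(variants: Iterable[Variant]) -> List[Variant]:
    seen = set()
    clean = []
    for v in variants:
        chrom = v.chrom.upper().removeprefix("CHR")
        # Canonical form: upper-case alleles in alphabetical order ("GA" -> "AG").
        genotype = "".join(sorted(v.genotype.upper()))
        if genotype in NO_CALLS or v.rsid in seen:
            continue  # drop no-calls and repeated rsIDs (first valid call wins)
        seen.add(v.rsid)
        clean.append(Variant(v.rsid, chrom, v.pos, genotype))
    clean.sort(key=lambda v: (CHROM_ORDER.get(v.chrom, 99), v.pos))
    return clean
```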

03. Concurrent annotation

Each variant is queried against multiple databases simultaneously using asynchronous I/O: ClinVar for clinical significance, PharmGKB for drug-gene interactions, GWAS Catalog for trait associations, gnomAD for population frequencies, and MyVariant.info for aggregated pathogenicity scores (CADD, SIFT, PolyPhen-2).
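A hedged sketch of the fan-out pattern, using Python's `asyncio.gather` with `return_exceptions=True` so that one failing source does not stop the others. The two query functions are stand-ins that make no network calls; their names and return shapes are assumptions for illustration.

```python
import asyncio


async def query_clinvar(rsid: str) -> dict:
    await asyncio.sleep(0)  # stands in for a real network request
    return {"significance": "benign"}


async def query_gnomad(rsid: str) -> dict:
    await asyncio.sleep(0)
    raise TimeoutError("source temporarily unavailable")  # simulated outage


SOURCES = {"clinvar": query_clinvar, "gnomad": query_gnomad}


async def annotate(rsid: str) -> dict:
    names = list(SOURCES)
    # Run every source concurrently; failures come back as exception objects.
    results = await asyncio.gather(
        *(SOURCES[name](rsid) for name in names),
        return_exceptions=True,
    )
    annotation = {"rsid": rsid, "sources": {}, "unavailable": []}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            annotation["unavailable"].append(name)
        else:
            annotation["sources"][name] = result
    return annotation
```

Calling `asyncio.run(annotate("rs429358"))` returns the ClinVar stub's data and lists `gnomad` under `unavailable`, which is the "rest continue" behaviour described in the pipeline overview.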

04. Polygenic risk scoring

We calculate weighted polygenic risk scores (PRS) for multiple conditions: Type 2 Diabetes, Coronary Artery Disease, Alzheimer’s Disease, Breast Cancer, and BMI/Obesity. Each score is normalised to a population-calibrated percentile (0–100) and categorised as low, average, elevated, or high risk.
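In its simplest form, a weighted PRS is a sum of per-variant effect weights multiplied by the number of risk alleles carried. The sketch below shows that calculation, a percentile conversion that assumes scores are roughly normally distributed in the reference population, and a four-way categorisation. The percentile cut-offs (20, 80, 95) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist
from typing import Dict


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2); missing variants count as 0."""
    return sum(weights[rsid] * dosages.get(rsid, 0) for rsid in weights)


def to_percentile(score: float, pop_mean: float, pop_sd: float) -> float:
    """Percentile (0-100) under an assumed normal reference distribution."""
    return 100 * NormalDist(pop_mean, pop_sd).cdf(score)


def categorise(percentile: float) -> str:
    # Hypothetical cut-offs, not taken from the source.
    if percentile < 20:
        return "low"
    if percentile < 80:
        return "average"
    if percentile < 95:
        return "elevated"
    return "high"
```

A score equal to the population mean lands at the 50th percentile and is labelled "average" under these assumed cut-offs.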

05. Pharmacogenomic profiling

Your pharmacogenomic status is determined for clinically actionable genes: metaboliser status for enzymes such as CYP2D6, CYP2C19, and CYP1A2, and variant status for genes such as VKORC1, SLCO1B1, MTHFR, and more. Each result maps to specific medications with FDA/EMA-recognised pharmacogenomic labels and PharmGKB evidence levels (for example 1A or 2A).
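To make "metaboliser status" concrete, here is a simplified sketch for a single gene, CYP2C19. It maps a pair of star alleles to a phenotype by looking up each allele's function, following the general pattern of published pharmacogenomic guidelines. The table covers only four alleles and the function name is hypothetical; calling star alleles from array data is considerably harder than this lookup suggests.

```python
# Illustrative allele-function assignments for a handful of CYP2C19 alleles.
ALLELE_FUNCTION = {"*1": "normal", "*2": "none", "*3": "none", "*17": "increased"}

# Phenotype for each (alphabetically sorted) pair of allele functions.
PHENOTYPE = {
    ("none", "none"): "poor metaboliser",
    ("none", "normal"): "intermediate metaboliser",
    ("increased", "none"): "intermediate metaboliser",
    ("normal", "normal"): "normal metaboliser",
    ("increased", "normal"): "rapid metaboliser",
    ("increased", "increased"): "ultrarapid metaboliser",
}


def cyp2c19_phenotype(allele_a: str, allele_b: str) -> str:
    """Translate a diplotype such as *1/*17 into a metaboliser phenotype."""
    functions = sorted([ALLELE_FUNCTION[allele_a], ALLELE_FUNCTION[allele_b]])
    return PHENOTYPE[tuple(functions)]
```

For example, `cyp2c19_phenotype("*2", "*2")` gives "poor metaboliser", the status relevant to drugs such as clopidogrel.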

06. AI-powered report generation

A frontier language model synthesises all annotation data into a structured, plain-language report. Findings are prioritised by clinical actionability. Every claim is traceable to a source database. The AI is instructed to never overstate, never diagnose, and always recommend professional consultation.
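One way to keep every claim traceable is to pass the model only findings that already carry a source label, ordered by actionability. The sketch below illustrates that idea; the `Finding` fields and the numeric actionability tier are assumptions, not the actual report schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Finding:
    summary: str
    source: str          # database the claim comes from
    actionability: int   # hypothetical tier: 1 = most actionable


def build_report_input(findings: Iterable[Finding]) -> List[str]:
    """Drop unsourced findings, then order the rest by actionability tier."""
    traceable = [f for f in findings if f.source]
    ordered = sorted(traceable, key=lambda f: f.actionability)
    # Each line keeps its source tag so the generated text can cite it.
    return [f"[{f.source}] {f.summary}" for f in ordered]
```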

2. Clinical databases

We do not invent our own science. Every annotation in a DeepDNA report comes from established, peer-reviewed, and publicly auditable databases maintained by leading research institutions worldwide.

ClinVar

NIH/NCBI archive of genomic variant–condition relationships. Clinical significance, review status, associated conditions.

PharmGKB

Stanford University pharmacogenomics knowledge base. Drug-gene interactions, dosing guidelines, metaboliser phenotypes.

GWAS Catalog

EMBL-EBI/NHGRI catalogue of genome-wide association studies. Trait associations, effect sizes, p-values from published research.

gnomAD

Broad Institute genome aggregation database. Population allele frequencies across 8+ ancestry groups from 200,000+ individuals.

MyVariant.info

Scripps Research aggregated variant annotation. CADD pathogenicity, SIFT/PolyPhen functional predictions, COSMIC somatic data.

dbSNP

NCBI database of single nucleotide polymorphisms. Reference SNP identifiers and genomic coordinates for all known human variants.

3. AI models & providers

Genomics is not a single-model problem. Different tasks — variant interpretation, protein structure prediction, literature synthesis, risk communication — demand different architectures. DeepDNA integrates multiple frontier AI systems, each deployed where its strengths matter most.

Claude

Anthropic

Primary reasoning engine for report generation, variant interpretation, and interactive genomic chat. Constitutional AI ensures outputs never overstate clinical significance or make diagnostic claims.

Live · Report generation · Chat

AlphaFold / AlphaFold 3

Google DeepMind

Protein structure prediction for missense variants. When your DNA carries a variant that changes an amino acid, AlphaFold predicts the structural impact on the resulting protein — is the fold disrupted? Is the active site affected?

Integrating · Structure prediction · Missense analysis

Gemini

Google DeepMind

Multi-modal reasoning for cross-referencing genomic findings with published medical literature, clinical imaging context, and structured biomarker data.

Integrating · Literature synthesis · Multi-modal

ESM-2 / ESMFold

Meta AI (FAIR)

Protein language model trained on 250 million sequences. Used for variant effect prediction at the protein level — predicting whether a mutation is likely tolerated or deleterious based on evolutionary conservation.

Integrating · Variant effect · Protein language

GPT-4o

OpenAI

Secondary reasoning model for multi-agent review. Used in our quality assurance pipeline where multiple AI perspectives cross-validate findings before they reach your report.

Integrating · Quality assurance · Multi-agent review

Llama 3

Meta AI

Open-weight model deployed on-premises for privacy-sensitive tasks. Processes intermediate annotation data that never leaves our European infrastructure. No external API call required.

Integrating · On-premises · EU-only

Mistral Large

Mistral AI · France

European-built frontier model. Used for GDPR-compliant text generation where data must remain within EU jurisdiction at every layer, including the model provider.

Integrating · EU-native · GDPR pipeline

PubMedBERT / BioGPT

Microsoft Research

Domain-specific biomedical language models pretrained on PubMed and PMC corpora. Used for extracting structured findings from genomic literature and mapping variants to published evidence.

Integrating · Literature mining · Evidence grading

We are model-agnostic by design. As new models emerge — from Anthropic, DeepMind, or the open-source community — we evaluate them against our genomic benchmarks and integrate the ones that improve outcomes. No single provider lock-in. No black box.

4. Beyond DNA: blood tests & biomarkers

Your genome tells you what could happen. Your blood tells you what is happening. The most powerful health intelligence comes from combining both.

Coming soon: Upload your blood test results alongside your DNA file. DeepDNA will correlate your genetic predispositions with your actual biomarker levels — turning static risk scores into dynamic, personalised health intelligence.

How it works

You upload a blood test report (PDF, photo, or structured data). Our system extracts biomarker values using multi-modal AI, then cross-references each value against your genetic profile:

Biomarker | Genetic context | Insight
LDL Cholesterol | PCSK9, LDLR, APOB variants | Distinguish lifestyle-driven vs. genetically-driven hyperlipidaemia. Flag familial hypercholesterolaemia risk.
HbA1c | TCF7L2, SLC30A8, T2D PRS | Correlate current glucose control with genetic diabetes susceptibility. Early warning system.
Vitamin D (25-OH) | VDR, GC, CYP2R1 variants | Explain why some people remain deficient despite supplementation. Personalise dosing.
Ferritin / Iron | HFE C282Y, H63D variants | Detect hereditary haemochromatosis carriers. Contextualise iron overload or deficiency.
TSH / T4 | FOXE1, TPO, TSHR variants | Genetic thyroid disease predisposition alongside current thyroid function.
Homocysteine | MTHFR C677T, A1298C | MTHFR status explains elevated homocysteine. Guides folate supplementation strategy.
CRP (hs-CRP) | IL6, CRP, TNF-α variants | Differentiate genetic inflammatory predisposition from acute/chronic inflammation.
Creatine Kinase | SLCO1B1 rs4149056 | If you carry the statin myopathy variant, elevated CK may indicate early muscle damage.
Lipid Panel | APOE ε2/ε3/ε4 | APOE genotype modulates lipid response to diet. ε4 carriers may need different strategies.
Complete Blood Count | HBB, HBA1/HBA2 variants | Identify thalassaemia or sickle cell trait carriers with unexplained microcytic anaemia.

Supported input formats

The correlation engine

This is not just two reports side by side. Our correlation engine performs gene–biomarker interaction analysis:

  1. Genetic risk contextualisation — A high PRS for Type 2 Diabetes combined with borderline HbA1c means something different than either alone. We calculate joint risk estimates.
  2. Pharmacogenomic monitoring — If you carry CYP2C19 poor metaboliser status and are on clopidogrel, we flag platelet function markers that your doctor should monitor.
  3. Nutrigenomic optimisation — MTHFR status + homocysteine levels = a precise, evidence-based folate supplementation recommendation, not a generic one.
  4. Longitudinal tracking — Upload blood tests over time. See how your biomarkers evolve relative to your genetic baseline. Detect drift before it becomes disease.
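To illustrate the first item in the list above, here is a deliberately simple, rule-based sketch of combining a PRS category with an HbA1c value. It is not a joint risk model and not the correlation engine itself: the HbA1c bands use widely cited cut-offs (5.7% and 6.5%), while the flag names and combination rules are hypothetical.

```python
def hba1c_band(hba1c_percent: float) -> str:
    """Band an HbA1c value using commonly cited cut-offs (assumed here)."""
    if hba1c_percent < 5.7:
        return "normal"
    if hba1c_percent < 6.5:
        return "borderline"
    return "diabetic range"


def joint_t2d_flag(prs_category: str, hba1c_percent: float) -> str:
    """Combine genetic and biomarker signals into one illustrative flag."""
    band = hba1c_band(hba1c_percent)
    genetic_risk = prs_category in {"elevated", "high"}
    if band == "diabetic range":
        return "discuss with clinician"
    if band == "borderline" and genetic_risk:
        return "priority follow-up"  # both signals point the same way
    if band == "borderline" or genetic_risk:
        return "monitor"  # only one signal is raised
    return "routine"
```

The same borderline HbA1c of 6.0% yields "priority follow-up" with a high PRS but only "monitor" with a low one, which is the sense in which the combination carries more information than either input alone.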

5. Validation & scientific rigour

We do not guess. Every finding in a DeepDNA report is graded and traceable.

Evidence grading

What we check

What we do not do

6. Privacy architecture

Your DNA is processed, not stored. The technical architecture enforces this:

Full details: Privacy Policy.

7. API access for developers & researchers

DeepDNA exposes a RESTful API for programmatic access to the analysis pipeline. Upload a genotype file, receive structured JSON with annotations, risk scores, and pharmacogenomic profiles. Designed for integration into clinical workflows, research pipelines, and third-party health platforms.

API endpoints:
POST /api/upload — Submit genotype file
POST /api/analyze/{id} — Run full analysis pipeline
GET  /api/report/{id} — Retrieve structured report (JSON)
POST /api/chat — Interactive genomic Q&A (SSE stream)
POST /api/biomarkers — Submit blood test data for correlation
GET  /api/correlation/{id} — Gene–biomarker correlation report

All API responses include full provenance: source database, evidence level, population frequency, and literature references for every annotation. Machine-readable by design.
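A minimal client-side sketch of how the endpoints listed above might be addressed from Python's standard library. The base URL is a placeholder, and the response shape used in `sources_used` (an `annotations` list whose items have a `source` field) is an assumption for illustration; the endpoint paths are the only parts taken from the list above. The functions build request objects without sending them.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.example.invalid"  # placeholder, not the real host


def analyze_request(upload_id: str) -> Request:
    """Build the request that starts the full analysis pipeline."""
    return Request(f"{BASE_URL}/api/analyze/{upload_id}", method="POST")


def report_request(upload_id: str) -> Request:
    """Build the request that retrieves the structured JSON report."""
    return Request(
        f"{BASE_URL}/api/report/{upload_id}",
        method="GET",
        headers={"Accept": "application/json"},
    )


def sources_used(report_json: str) -> set:
    """Collect the source databases named in a report (assumed schema)."""
    report = json.loads(report_json)
    return {a["source"] for a in report.get("annotations", [])}
```

Passing a built request to `urllib.request.urlopen` would send it; authentication and error handling are omitted because the source does not describe them.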

8. What makes this different

Most consumer genomics platforms give you a PDF and call it a day. DeepDNA is a living intelligence layer on top of your genome:
