In short: You upload a DNA file. Our engine parses ~800,000 genetic variants, annotates each one against 7+ clinical databases in parallel, calculates polygenic risk scores, runs pharmacogenomic profiling, and generates a plain-language report using frontier AI models. The entire process runs on European servers. Your file is deleted after analysis. No data is retained.

1. The analysis pipeline

When you upload your genotype file, it passes through a six-stage pipeline. Each stage is independent and fault-tolerant — if one annotation source is temporarily unavailable, the rest continue. Nothing blocks.

Upload & format detection

We accept raw data files from 23andMe (v3–v5), AncestryDNA (v1/v2), MyHeritage, FamilyTreeDNA, and standard VCF files. The parser auto-detects format, encoding, and genome build (GRCh37/GRCh38).
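
As an illustration of header-based detection, here is a minimal sketch. The marker strings, the function names detect_format and detect_build, and the 50-line scan window are assumptions made for this example, not DeepDNA's actual rules; a production parser would also need to validate columns and check known variant positions to confirm the genome build.

```python
from typing import Optional

# Marker strings are illustrative assumptions, checked in order.
SIGNATURES = [
    ("vcf", "##fileformat=vcf"),
    ("23andme", "23andme"),
    ("ancestrydna", "ancestrydna"),
    ("myheritage", "myheritage"),
    ("ftdna", "rsid,chromosome,position,result"),
]


def _head(text: str, max_lines: int) -> str:
    """Lower-cased first lines of the file, where vendors place their headers."""
    return "\n".join(text.splitlines()[:max_lines]).lower()


def detect_format(text: str, max_lines: int = 50) -> str:
    """Guess the vendor format from header text; 'unknown' if nothing matches."""
    head = _head(text, max_lines)
    for name, marker in SIGNATURES:
        if marker in head:
            return name
    return "unknown"


def detect_build(text: str, max_lines: int = 50) -> Optional[str]:
    """Guess the genome build from header text; None if the header is silent."""
    head = _head(text, max_lines)
    if any(tag in head for tag in ("build 37", "grch37", "hg19")):
        return "GRCh37"
    if any(tag in head for tag in ("build 38", "grch38", "hg38")):
        return "GRCh38"
    return None
```

Returning "unknown" or None instead of guessing lets a later stage reject the file or fall back to a slower content-based check.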

Normalisation & quality control

Genotypes are standardised to canonical form, no-call variants are removed, duplicates are dropped by rsID, and the full set is sorted by chromosome and position. Typical output: ~700,000–800,000 clean variants.
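
The four steps above can be sketched in a few lines. The Variant record, the set of no-call codes, and the choice to keep the first occurrence of a duplicated rsID are assumptions for illustration only.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    rsid: str
    chrom: str
    pos: int
    genotype: str


# Sort order: autosomes 1-22, then X, Y, MT. Unknown contigs sort last.
CHROM_ORDER = {str(i): i for i in range(1, 23)}
CHROM_ORDER.update({"X": 23, "Y": 24, "MT": 25})

# Assumed no-call encodings; real vendor files may use others.
NO_CALLS = {"", "--", "00", "NN"}


def normalise(variants: Iterable[Variant]) -> List[Variant]:
    """Canonicalise genotypes, drop no-calls and duplicate rsIDs, then sort."""
    seen = set()
    clean = []
    for v in variants:
        # Canonical form: upper case, alleles in alphabetical order ("GA" -> "AG").
        genotype = "".join(sorted(v.genotype.strip().upper()))
        if genotype in NO_CALLS or v.rsid in seen:
            continue
        seen.add(v.rsid)
        clean.append(v._replace(genotype=genotype))
    return sorted(clean, key=lambda v: (CHROM_ORDER.get(v.chrom, 99), v.pos))
```

Because no-calls are skipped before the rsID is recorded as seen, a later valid call for the same rsID is still kept.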

Concurrent annotation

Each variant is queried against multiple databases simultaneously using asynchronous I/O: ClinVar for clinical significance, PharmGKB for drug-gene interactions, GWAS Catalog for trait associations, gnomAD for population frequencies, and MyVariant.info for aggregated pathogenicity scores (CADD, SIFT, PolyPhen-2).
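
A minimal sketch of this fan-out pattern, using Python's asyncio. The two query functions are stand-ins that return placeholder payloads (one deliberately fails); they are not real client code for ClinVar or gnomAD, and the rsID is made up.

```python
import asyncio


async def query_clinvar(rsid: str) -> dict:
    """Stand-in for a real lookup; returns a placeholder payload."""
    await asyncio.sleep(0)
    return {"source": "clinvar", "rsid": rsid}


async def query_gnomad(rsid: str) -> dict:
    """Stand-in that simulates a temporarily unavailable source."""
    await asyncio.sleep(0)
    raise ConnectionError("source temporarily unavailable")


SOURCES = {"clinvar": query_clinvar, "gnomad": query_gnomad}


async def annotate(rsid: str) -> dict:
    """Query every source concurrently; a failed source yields None."""
    names = list(SOURCES)
    results = await asyncio.gather(
        *(SOURCES[name](rsid) for name in names),
        return_exceptions=True,  # one failure must not cancel the others
    )
    return {
        name: (None if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }


annotation = asyncio.run(annotate("rs123"))
```

With return_exceptions=True, asyncio.gather hands back the exception object in place of a result, so the remaining sources still complete and the failed one can be retried or marked as missing.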

Polygenic risk scoring

We calculate weighted polygenic risk scores (PRS) for multiple conditions: Type 2 Diabetes, Coronary Artery Disease, Alzheimer’s Disease, Breast Cancer, and BMI/Obesity. Each score is normalised to a population-calibrated percentile (0–100) and categorised as low, average, elevated, or high risk.
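
In its simplest form, a weighted PRS is the sum of each variant's effect weight multiplied by the number of effect alleles carried (0, 1, or 2). The sketch below assumes scores are roughly normally distributed in the reference population and uses hypothetical category cut-offs (20, 80, 95); neither the distributional assumption nor the cut-offs come from DeepDNA.

```python
import math
from typing import Dict


def polygenic_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele counts; missing variants count as 0 (an assumption)."""
    return sum(weight * dosages.get(rsid, 0) for rsid, weight in weights.items())


def percentile(score: float, pop_mean: float, pop_sd: float) -> float:
    """Percentile of the score under a normal reference distribution."""
    z = (score - pop_mean) / pop_sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def categorise(pct: float) -> str:
    """Map a percentile to a risk band using illustrative cut-offs."""
    if pct < 20:
        return "low"
    if pct < 80:
        return "average"
    if pct < 95:
        return "elevated"
    return "high"
```

Treating a missing variant as dosage 0 biases the score downwards; imputing the population mean dosage is a common alternative.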

Pharmacogenomic profiling

Your metaboliser or gene-function status is determined for clinically actionable genes: CYP2D6, CYP2C19, CYP1A2, VKORC1, SLCO1B1, MTHFR, and more. Each result maps to specific medications with FDA/EMA-recognised pharmacogenomic labels and PharmGKB evidence levels (1A, 2A).
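
To make the idea of metaboliser status concrete, here is a simplified sketch for CYP2C19. It covers only four star alleles and follows the usual CPIC-style convention of combining two allele functions into a phenotype. The table is a small illustrative subset, and calling star alleles reliably from array data is considerably harder than this lookup suggests.

```python
# Simplified allele-function table (subset, for illustration only).
ALLELE_FUNCTION = {
    "*1": "normal",
    "*2": "none",
    "*3": "none",
    "*17": "increased",
}

# Phenotype keyed by the alphabetically sorted pair of allele functions.
PHENOTYPE = {
    ("increased", "increased"): "ultrarapid metaboliser",
    ("increased", "normal"): "rapid metaboliser",
    ("normal", "normal"): "normal metaboliser",
    ("none", "normal"): "intermediate metaboliser",
    ("increased", "none"): "intermediate metaboliser",
    ("none", "none"): "poor metaboliser",
}


def cyp2c19_phenotype(allele_1: str, allele_2: str) -> str:
    """Translate a CYP2C19 diplotype such as ('*1', '*2') into a phenotype label."""
    functions = tuple(sorted((ALLELE_FUNCTION[allele_1], ALLELE_FUNCTION[allele_2])))
    return PHENOTYPE[functions]
```

An unknown allele raises KeyError, which is safer here than silently defaulting to a normal phenotype.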

AI-powered report generation

A frontier language model synthesises all annotation data into a structured, plain-language report. Findings are prioritised by clinical actionability. Every claim is traceable to a source database. The AI is instructed to never overstate, never diagnose, and always recommend professional consultation.

2. Clinical databases

We do not invent our own science. Every annotation in a DeepDNA report comes from established, peer-reviewed, and publicly auditable databases maintained by leading research institutions worldwide.

ClinVar

NIH/NCBI archive of genomic variant–condition relationships. Clinical significance, review status, associated conditions.

PharmGKB

Stanford University pharmacogenomics knowledge base. Drug-gene interactions, dosing guidelines, metaboliser phenotypes.

GWAS Catalog

EMBL-EBI/NHGRI catalogue of genome-wide association studies. Trait associations, effect sizes, p-values from published research.

gnomAD

Broad Institute genome aggregation database. Population allele frequencies across 8+ ancestry groups from 200,000+ individuals.

MyVariant.info

Scripps Research aggregated variant annotation. CADD pathogenicity, SIFT/PolyPhen functional predictions, COSMIC somatic data.

dbSNP

NCBI database of single nucleotide polymorphisms. Reference SNP identifiers and genomic coordinates for all known human variants.

3. AI models & providers

Genomics is not a single-model problem. Different tasks — variant interpretation, protein structure prediction, literature synthesis, risk communication — demand different architectures. DeepDNA integrates multiple frontier AI systems, each deployed where its strengths matter most.

Claude

Anthropic

Primary reasoning engine for report generation, variant interpretation, and interactive genomic chat. Constitutional AI ensures outputs never overstate clinical significance or make diagnostic claims.

Live · Report generation · Chat

AlphaFold / AlphaFold 3

Google DeepMind

Protein structure prediction for missense variants. When your DNA carries a variant that changes an amino acid, AlphaFold predicts the structural impact on the resulting protein — is the fold disrupted? Is the active site affected?

Integrating · Structure prediction · Missense analysis

Gemini

Google DeepMind

Multi-modal reasoning for cross-referencing genomic findings with published medical literature, clinical imaging context, and structured biomarker data.

Integrating · Literature synthesis · Multi-modal

ESM-2 / ESMFold

Meta AI (FAIR)

Protein language model trained on 250 million sequences. Used for variant effect prediction at the protein level — predicting whether a mutation is likely tolerated or deleterious based on evolutionary conservation.

Integrating · Variant effect · Protein language

GPT-4o

OpenAI

Secondary reasoning model for multi-agent review. Used in our quality assurance pipeline where multiple AI perspectives cross-validate findings before they reach your report.

Integrating · Quality assurance · Multi-agent review

Llama 3

Meta AI

Open-weight model deployed on-premises for privacy-sensitive tasks. Processes intermediate annotation data that never leaves our European infrastructure. No external API call required.

Integrating · On-premises · EU-only

Mistral Large

Mistral AI · France

European-built frontier model. Used for GDPR-compliant text generation where data must remain within EU jurisdiction at every layer, including the model provider.

Integrating · EU-native · GDPR pipeline

PubMedBERT / BioGPT

Microsoft Research

Domain-specific biomedical language models fine-tuned on PubMed and PMC corpora. Used for extracting structured findings from genomic literature and mapping variants to published evidence.

Integrating · Literature mining · Evidence grading

We are model-agnostic by design. As new models emerge — from Anthropic, DeepMind, or the open-source community — we evaluate them against our genomic benchmarks and integrate the ones that improve outcomes. No single provider lock-in. No black box.

4. Beyond DNA: blood tests & biomarkers

Your genome tells you what could happen. Your blood tells you what is happening. The most powerful health intelligence comes from combining both.

  Coming soon: Upload your blood test results alongside your DNA file. DeepDNA will correlate your genetic predispositions with your actual biomarker levels — turning static risk scores into dynamic, personalised health intelligence.

How it works

You upload a blood test report (PDF, photo, or structured data). Our system extracts biomarker values using multi-modal AI, then cross-references each value against your genetic profile:

Biomarker	Genetic context	Insight
LDL Cholesterol	PCSK9, LDLR, APOB variants	Distinguish lifestyle-driven vs. genetically-driven hyperlipidaemia. Flag familial hypercholesterolaemia risk.
HbA1c	TCF7L2, SLC30A8, T2D PRS	Correlate current glucose control with genetic diabetes susceptibility. Early warning system.
Vitamin D (25-OH)	VDR, GC, CYP2R1 variants	Explain why some people remain deficient despite supplementation. Personalise dosing.
Ferritin / Iron	HFE C282Y, H63D variants	Detect hereditary haemochromatosis carriers. Contextualise iron overload or deficiency.
TSH / T4	FOXE1, TPO, TSHR variants	Genetic thyroid disease predisposition alongside current thyroid function.
Homocysteine	MTHFR C677T, A1298C	MTHFR status can help explain elevated homocysteine. Guides folate supplementation strategy.
CRP (hs-CRP)	IL6, CRP, TNF-α variants	Differentiate genetic inflammatory predisposition from acute/chronic inflammation.
Creatine Kinase	SLCO1B1 rs4149056	If you carry the statin myopathy variant, elevated CK may indicate early muscle damage.
Lipid Panel	APOE ε2/ε3/ε4	APOE genotype modulates lipid response to diet. ε4 carriers may need different strategies.
Complete Blood Count	HBB, HBA1/HBA2 variants	Identify thalassaemia or sickle cell trait carriers with unexplained microcytic anaemia.

Supported input formats

PDF reports from any laboratory (multi-modal AI extraction)
Photos of printed results (camera or scan)
HL7 FHIR structured health data (direct integration with clinical systems)
Manual entry of individual biomarker values
Wearable sync — continuous biomarker data from devices (planned)

The correlation engine

This is not just two reports side by side. Our correlation engine performs gene–biomarker interaction analysis:

Genetic risk contextualisation — A high PRS for Type 2 Diabetes combined with borderline HbA1c means something different than either alone. We calculate joint risk estimates.
Pharmacogenomic monitoring — If you are a CYP2C19 poor metaboliser and are taking clopidogrel, we flag platelet function markers that your doctor should monitor.
Nutrigenomic optimisation — MTHFR status + homocysteine levels = a precise, evidence-based folate supplementation recommendation, not a generic one.
Longitudinal tracking — Upload blood tests over time. See how your biomarkers evolve relative to your genetic baseline. Detect drift before it becomes disease.
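
The first item in the list above, reading a PRS together with a biomarker, can be illustrated with a deliberately simple rule-based sketch. This is a stand-in, not a statistical joint risk model and not DeepDNA's engine. The HbA1c bands use the widely cited 5.7% and 6.5% cut-offs; the PRS threshold of the 80th percentile and the output labels are hypothetical.

```python
def hba1c_band(hba1c_percent: float) -> str:
    """Classify HbA1c using the commonly cited 5.7% and 6.5% cut-offs."""
    if hba1c_percent < 5.7:
        return "normal"
    if hba1c_percent < 6.5:
        return "borderline"
    return "diabetic range"


def t2d_context(prs_percentile: float, hba1c_percent: float) -> str:
    """Combine genetic and biomarker signals into one illustrative flag."""
    band = hba1c_band(hba1c_percent)
    high_prs = prs_percentile >= 80  # hypothetical threshold
    if band == "diabetic range":
        return "clinician review"
    if band == "borderline" and high_prs:
        return "priority follow-up"
    if band == "borderline" or high_prs:
        return "monitor"
    return "no flag"
```

The point of the sketch is that the combined flag differs from what either input gives alone: a borderline HbA1c yields "monitor" with an average PRS but "priority follow-up" with a high one.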

5. Validation & scientific rigour

We do not guess. Every finding in a DeepDNA report is graded and traceable.

Evidence grading

Level 1A — Pharmacogenomic associations with FDA/EMA-approved drug labels and CPIC guidelines
Level 2A — Associations replicated in multiple independent studies with significant effect sizes
Informational — Associations from single GWAS or preliminary research, clearly labelled as early-stage evidence
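
One way to use these grades when ordering a report is a simple rank lookup, sketched below. The Finding record, its field names, and the placeholder titles in the usage example are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Lower rank means stronger evidence, so it sorts first.
EVIDENCE_RANK = {"1A": 0, "2A": 1, "Informational": 2}


@dataclass
class Finding:
    title: str
    evidence_level: str
    source: str  # database the claim is traceable to


def prioritise(findings: List[Finding]) -> List[Finding]:
    """Order findings strongest evidence first; ties keep their input order."""
    return sorted(findings, key=lambda f: EVIDENCE_RANK[f.evidence_level])


findings = [
    Finding("placeholder trait association", "Informational", "GWAS Catalog"),
    Finding("placeholder drug-gene interaction", "1A", "PharmGKB"),
    Finding("placeholder replicated association", "2A", "GWAS Catalog"),
]
ordered = prioritise(findings)
```

Python's sorted is stable, so findings with the same grade stay in the order the upstream stages produced them.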

What we check

Every variant annotation is cross-referenced against at least two independent databases
Polygenic risk scores use published, validated SNP weight sets from peer-reviewed studies
AI-generated explanations are constrained by system-level instructions that prohibit diagnostic claims, require hedging language for uncertain findings, and mandate professional consultation recommendations
Pathogenicity predictions (CADD, SIFT, PolyPhen-2) use established computational thresholds, not custom-trained models

What we do not do

We do not diagnose disease. DeepDNA reports are educational and informational.
We do not replace genetic counsellors or physicians.
We do not claim clinical-grade accuracy for consumer genotyping data (which has inherent limitations vs. whole-genome sequencing).
We do not retain your data after analysis. There is no "DeepDNA database" of genomes.

6. Privacy architecture

Your DNA is processed, not stored. The technical architecture enforces this:

Processing servers: Hetzner, Helsinki, Finland. EU jurisdiction.
Encryption: TLS 1.3 in transit, AES-256 at rest during processing.
Data lifecycle: Upload → parse → annotate → generate report → delete source file.
No third-party sharing: Annotation queries use rsIDs (variant identifiers), never your full genotype file. No external service sees your complete genetic profile.
Zero cookies, zero trackers: Our website sets no cookies and uses no analytics trackers.
AI provider isolation: The AI model receives a summarised report for explanation, not raw variant data. Your genome never reaches the language model in full.
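
The data lifecycle above (upload, process, delete) can be enforced structurally instead of by convention. The sketch below is an assumption about how such a guarantee might look, not a description of DeepDNA's code: the uploaded bytes live in a temporary file that is removed in a finally block, so deletion happens even if analysis raises.

```python
import os
import tempfile
from typing import Callable


def process_upload(data: bytes, analyse: Callable[[str], dict]) -> dict:
    """Write the upload to a temp file, analyse it, and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return analyse(path)
    finally:
        # Runs on success and on error alike.
        if os.path.exists(path):
            os.remove(path)
```

Note that os.remove only unlinks the file; it does not overwrite the underlying disk blocks, so encrypted or memory-backed storage is still needed for a stronger guarantee.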

Full details: Privacy Policy.

7. API access for developers & researchers

DeepDNA exposes a RESTful API for programmatic access to the analysis pipeline. Upload a genotype file, receive structured JSON with annotations, risk scores, and pharmacogenomic profiles. Designed for integration into clinical workflows, research pipelines, and third-party health platforms.

  API endpoints:

  POST /api/upload — Submit genotype file

  POST /api/analyze/{id} — Run full analysis pipeline

  GET  /api/report/{id} — Retrieve structured report (JSON)

  POST /api/chat — Interactive genomic Q&A (SSE stream)

  POST /api/biomarkers — Submit blood test data for correlation

  GET  /api/correlation/{id} — Gene–biomarker correlation report

All API responses include full provenance: source database, evidence level, population frequency, and literature references for every annotation. Machine-readable by design.
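
As a rough sketch of how a client might drive two of these endpoints, the helper below builds requests with Python's standard library. The host api.example.com is a placeholder, and authentication, the upload encoding, and response field names are not documented here, so none are assumed; the requests are constructed but not sent.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "https://api.example.com"  # placeholder host, not the real service


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the listed endpoints."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


def analysis_requests(upload_id: str) -> List[urllib.request.Request]:
    """Requests to run the pipeline and then fetch the JSON report for one upload."""
    return [
        build_request("POST", f"/api/analyze/{upload_id}"),
        build_request("GET", f"/api/report/{upload_id}"),
    ]
```

Each request could then be sent with urllib.request.urlopen, once the real base URL and any required credentials are known.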

8. What makes this different

Most consumer genomics platforms give you a PDF and call it a day. DeepDNA is a living intelligence layer on top of your genome:

Multi-model AI — Not one model, but an ensemble of frontier systems from Anthropic, Google DeepMind, Meta AI, Mistral, and the open-source community. Each model handles what it does best.
Real-time databases — Your report draws from live, continuously updated sources. When ClinVar reclassifies a variant, your analysis reflects it.
DNA + blood integration — The first platform to systematically correlate genetic predisposition with actual biomarker data. Genotype meets phenotype.
Interactive chat — Ask questions about your results in natural language. The AI has full context of your report and can explain any finding in depth.
No data retention — We process and delete. Your genome is not our business model.
European sovereignty — Built in Europe, hosted in Europe, governed by European law. Your genetic data never leaves GDPR jurisdiction.
Open API — Researchers and developers can integrate the full pipeline into their own systems.
