Our commitment: Every finding in a DeepDNA report is grounded in peer-reviewed research, validated against clinical-grade databases, and graded by evidence level. We cite our sources, explain our confidence, and never overstate what the science says.

1. Overview

DeepDNA is an AI-powered genomic analysis platform that interprets existing DNA genotype files. We do not perform laboratory sequencing. Instead, we accept raw genotype data files exported from consumer DNA testing providers (23andMe, AncestryDNA, MyHeritage, and others) and apply a multi-layered analysis pipeline to generate a comprehensive health, pharmacogenomics, nutrigenomics, and ancestry report.

Our methodology is built on three principles:

2. Input data and file processing

Supported file formats

DeepDNA accepts raw genotype data files in the following formats:

File validation

Before analysis begins, each uploaded file undergoes automated validation:

Format detection

The system identifies the file format and source provider automatically. Headers, delimiters, and encoding are verified against known schemas for each provider.
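
As a rough illustration of header-based detection, the sketch below matches a provider marker in the comment header and then checks that the first data line splits on the expected delimiter. The `SIGNATURES` table, the marker strings, and the `detect_format` name are assumptions made for this example, not DeepDNA's actual schemas.

```python
# Hypothetical signatures: (provider, marker expected in the comment header, delimiter).
SIGNATURES = [
    ("23andMe", "23andme", "\t"),
    ("AncestryDNA", "ancestrydna", "\t"),
    ("MyHeritage", "myheritage", ","),
]


def detect_format(text):
    """Return (provider, delimiter) for a raw genotype file, or None if unrecognised."""
    head = text.splitlines()[:30]
    # Comment lines (starting with '#') usually name the provider.
    comments = " ".join(line.lower() for line in head if line.startswith("#"))
    data = [line for line in head if line and not line.startswith("#")]
    for provider, marker, delimiter in SIGNATURES:
        if marker in comments:
            # Verify the delimiter: a data line should yield at least four fields.
            if data and len(data[0].split(delimiter)) >= 4:
                return provider, delimiter
    return None


sample = (
    "# This data file generated by 23andMe\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
)
result = detect_format(sample)
```

A production detector would also need to verify character encoding and the exact column names, which this sketch omits.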

Integrity checks

The file is checked for completeness, duplicate entries, and data consistency. Minimum variant count thresholds ensure the file contains sufficient data for meaningful analysis.
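
A minimal sketch of such checks, assuming each record has already been parsed into an `(rsid, chromosome, position, genotype)` tuple. The `integrity_report` helper and the `MIN_VARIANTS` value are hypothetical; the real thresholds are not stated here.

```python
from collections import Counter

MIN_VARIANTS = 500_000  # hypothetical threshold, for illustration only


def integrity_report(rows, min_variants=MIN_VARIANTS):
    """Summarise duplicates, malformed positions, and variant count for parsed rows."""
    counts = Counter(rsid for rsid, *_ in rows)
    # An rsID that appears more than once is reported as a duplicate entry.
    duplicates = sorted(rsid for rsid, n in counts.items() if n > 1)
    # A position that is not a plain non-negative integer is counted as malformed.
    malformed = sum(1 for row in rows if not str(row[2]).isdigit())
    return {
        "n_variants": len(counts),
        "duplicates": duplicates,
        "malformed": malformed,
        "sufficient": len(counts) >= min_variants,
    }


rows = [
    ("rs1", "1", "100", "AA"),
    ("rs2", "1", "200", "AG"),
    ("rs1", "1", "100", "AA"),  # duplicate entry
    ("rs3", "2", "x", "--"),    # malformed position
]
report = integrity_report(rows, min_variants=3)
```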

Genome build alignment

Variant positions are mapped to the GRCh37/hg19 reference genome assembly using coordinate liftover where necessary. This ensures consistent annotation regardless of the original provider's build version.

Quality scoring

A file quality score is computed based on call rate, variant coverage across key genomic regions, and concordance with expected population allele frequencies. Files below quality thresholds are flagged for the user.
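
Of the three components, call rate is the simplest to show. The sketch below treats it as the fraction of variants with a usable genotype; the no-call encodings and the 0.97 threshold are assumptions for illustration, and coverage and allele-frequency concordance are left out.

```python
NO_CALLS = {"--", "00", ""}  # assumed no-call encodings; these vary by provider


def call_rate(genotypes):
    """Fraction of genotype strings that are actual calls rather than no-calls."""
    genotypes = list(genotypes)
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g not in NO_CALLS)
    return called / len(genotypes)


def is_flagged(genotypes, threshold=0.97):
    """Flag a file whose call rate falls below the (hypothetical) threshold."""
    return call_rate(genotypes) < threshold


rate = call_rate(["AA", "AG", "--", "CC"])
```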

3. Variant interpretation pipeline

The core of DeepDNA's analysis is a multi-layered variant interpretation pipeline that processes each genetic variant through several stages of annotation, classification, and synthesis.

Stage 1: Variant annotation

Each variant (SNP) in the genotype file is annotated with:

Stage 2: Clinical significance lookup

Annotated variants are cross-referenced against multiple curated databases to determine clinical relevance:

Database | Maintained by | Content | Update frequency
ClinVar | NCBI / NIH (United States) | Variant-disease associations submitted by clinical laboratories, research groups, and expert panels worldwide. Over 2.5 million submissions covering pathogenic, benign, and uncertain significance classifications. | Weekly
PharmGKB | Stanford University | Pharmacogenomic variant annotations: how genetic variants affect drug metabolism, efficacy, and adverse reactions. Includes CPIC (Clinical Pharmacogenetics Implementation Consortium) guidelines. | Continuous
gnomAD | Broad Institute / MIT | Population allele frequencies from 807,162 individuals across multiple ancestries. Essential for distinguishing rare pathogenic variants from common benign polymorphisms. | Major releases
OMIM | Johns Hopkins University | Comprehensive compendium of human genes and genetic phenotypes. Mendelian disorder associations, gene-phenotype relationships. | Daily
GWAS Catalog | NHGRI-EBI | Curated collection of published genome-wide association studies. Variant-trait associations with effect sizes and p-values from studies meeting strict quality thresholds. | Weekly
dbSNP | NCBI / NIH | Reference database for single nucleotide polymorphisms. Provides standardised rsID identifiers and variant descriptions. | Continuous
UniProt | EBI / SIB / PIR | Protein sequence and functional information. Used for assessing the impact of missense variants on protein function. | Monthly

Stage 3: Pharmacogenomic profiling

For pharmacogenomics analysis, DeepDNA follows established clinical guidelines:

Clinical note: DeepDNA pharmacogenomic results are for informational purposes only. Genotype arrays do not capture all pharmacogenomic variants (e.g., CYP2D6 gene deletions or duplications may not be detectable from microarray data). Always consult a healthcare provider before making medication changes.

Stage 4: Polygenic risk score computation

For multifactorial conditions (type 2 diabetes, coronary artery disease, breast cancer, Alzheimer's disease, and others), DeepDNA computes polygenic risk scores (PRS):
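
As a generic illustration (not DeepDNA's actual implementation), a PRS is commonly computed as a weighted sum: for each scored variant, the number of effect alleles carried (0, 1, or 2) is multiplied by a published effect weight, and the products are summed. The function names, variant IDs, and weights below are made up for the example.

```python
def dosage(genotype, effect_allele):
    """Count copies of the effect allele in a genotype string such as 'AG'."""
    return genotype.count(effect_allele)


def polygenic_score(genotypes, weights):
    """Weighted sum of effect-allele dosages.

    genotypes: dict mapping rsID -> genotype string
    weights:   dict mapping rsID -> (effect_allele, beta)
    Returns (score, number_of_variants_used); missing or no-call variants are skipped.
    """
    score = 0.0
    used = 0
    for rsid, (allele, beta) in weights.items():
        genotype = genotypes.get(rsid)
        if genotype is None or genotype == "--":
            continue
        score += beta * dosage(genotype, allele)
        used += 1
    return score, used


# Toy data: identifiers and weights are placeholders, not real associations.
genotypes = {"rs1": "AA", "rs2": "AG", "rs3": "--"}
weights = {"rs1": ("A", 0.5), "rs2": ("G", 0.25), "rs3": ("T", 1.0), "rs4": ("C", 0.1)}
score, used = polygenic_score(genotypes, weights)
```

Returning the number of variants actually used matters for array data, where many scored variants may be absent from the file; a raw score is usually compared against a reference distribution before it is reported as a percentile.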

Stage 5: Nutrigenomic analysis

Nutrigenomic findings are derived from variants in genes affecting nutrient metabolism, absorption, and utilisation:

Each nutrigenomic finding is linked to its source publication, with practical dietary recommendations based on the genotype result.

Stage 6: Ancestry inference

Ancestry analysis uses a principal component analysis (PCA) approach applied to ancestry-informative markers (AIMs):
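
As a generic illustration of the PCA step (the marker set and reference panel DeepDNA uses are not described here), the sketch below fits principal components on a small reference matrix of allele dosages and projects a new sample onto them. It assumes NumPy is available; `fit_pca`, `project`, and the toy data are hypothetical.

```python
import numpy as np


def fit_pca(reference, n_components=2):
    """Fit PCA on a (samples x markers) matrix of allele dosages (0, 1, or 2)."""
    X = np.asarray(reference, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(sample, mean, components):
    """Project one sample, or a matrix of samples, onto the fitted components."""
    return (np.asarray(sample, dtype=float) - mean) @ components.T


# Toy reference panel: two samples from each of two well-separated groups.
reference = [
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
]
mean, components = fit_pca(reference)
ref_coords = project(reference, mean, components)
new_coords = project([0, 0, 2, 2], mean, components)
```

In this toy example the new sample falls on the same side of the first component as the first reference group. Real ancestry inference would use many more markers and reference individuals.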

4. AI model architecture

DeepDNA's AI layer operates on top of the deterministic variant interpretation pipeline. The AI components serve two functions:

Variant effect prediction

For variants of uncertain significance (VUS), along with variants not yet classified in ClinVar or other curated databases, our AI models predict functional impact using:

Transparency principle: When a finding is based on AI prediction rather than established clinical classification, it is always clearly labelled in the report as "AI-predicted" with an associated confidence score. We never present AI predictions as equivalent to clinically validated findings.

Report synthesis and natural language generation

The second AI layer synthesises the structured variant data into readable, personalised report narratives:

5. Evidence grading system

Every finding in a DeepDNA report is assigned an evidence grade that reflects the strength of the underlying scientific evidence:

Grade | Label | Criteria
A | Established | Supported by multiple large-scale studies, meta-analyses, or clinical guidelines (CPIC Level A, ClinVar pathogenic with expert review). Actionable with high confidence.
B | Strong evidence | Supported by replicated GWAS findings (genome-wide significance in independent cohorts), CPIC Level B guidelines, or ClinVar pathogenic/likely pathogenic with multiple submitters.
C | Moderate evidence | Supported by published studies with consistent findings but limited replication, single large GWAS, or ClinVar likely pathogenic with single submitter.
D | Preliminary | Based on early-stage research, small sample sizes, or AI-predicted functional impact. Included for informational context but should not drive health decisions without further validation.

The evidence grade is displayed alongside every finding in the report. We encourage users to share Grade A and B findings with their healthcare providers and to treat Grade D findings as exploratory.
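
One way to picture how a grade travels with a finding is sketched below. The `Finding` class and helper functions are hypothetical; only the grade labels and the A/B sharing guidance come from the text above.

```python
from dataclasses import dataclass

GRADE_LABELS = {
    "A": "Established",
    "B": "Strong evidence",
    "C": "Moderate evidence",
    "D": "Preliminary",
}


@dataclass
class Finding:
    name: str
    grade: str                 # one of "A", "B", "C", "D"
    ai_predicted: bool = False


def describe(finding):
    """Render a finding with its evidence grade and, if relevant, an AI label."""
    label = GRADE_LABELS[finding.grade]
    suffix = " (AI-predicted)" if finding.ai_predicted else ""
    return f"{finding.name}: Grade {finding.grade}, {label}{suffix}"


def shareable(findings):
    """Keep the Grade A and B findings, the ones suggested for sharing with a provider."""
    return [f for f in findings if f.grade in ("A", "B")]


findings = [
    Finding("finding-1", "A"),
    Finding("finding-2", "C"),
    Finding("finding-3", "D", ai_predicted=True),
]
summary = [describe(f) for f in findings]
to_share = [f.name for f in shareable(findings)]
```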

6. Quality standards and validation

Internal validation

Our variant interpretation pipeline is validated through:

Database currency

Clinical databases are updated on a regular schedule:

Limitations and known constraints

We are transparent about the limitations of our approach:

7. Ethical standards

8. References and further reading

Key publications and resources underpinning our methodology:

9. Contact

For scientific questions, methodology feedback, or requests for additional technical documentation:

For general enquiries:
