System Architecture · v1.0 · Research Prototype

A virtual molecular tumor board.

Give it a full oncology case (pathology, radiology, genomics, labs and clinical notes) and 55+ specialist agents deliberate in parallel to produce a fully cited, guideline-concordant treatment recommendation with documented dissent.

Decision support, not decision making · 100% free / synthetic / de-identified data
56 specialist agents · 7 orchestration layers · 40+ agents concurrent at peak · 100% of claims cited
High-level architecture

Six layers from raw data to a signed-off plan

A medallion data pipeline feeds a hybrid knowledge layer; a LangGraph-style supervisor orchestrates 55+ agents across seven tiers; a synthesis-and-safety round challenges the consensus before any human sees it.

① Data sources — 100% free
Synthea · MIMIC-IV · TCGA · TCIA · CIViC · ClinVar · DGIdb · PharmGKB · openFDA · DailyMed · RxNorm · PubMed/PMC · NCI PDQ · ClinicalTrials.gov
② Medallion pipeline
Bronze · raw
Silver · FHIR-normalised
Gold · Patient Case
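A minimal sketch of the bronze → silver → gold flow, assuming hypothetical record shapes: bronze keeps each source payload untouched, silver maps it onto a simplified FHIR-like resource, and gold assembles one patient case. The field names and the `to_silver` / `to_gold` helpers are illustrative, not the project's actual schema; a real silver layer would emit full FHIR resources.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BronzeRecord:
    """Raw payload exactly as received, tagged with its source (labels are illustrative)."""
    source: str
    payload: dict[str, Any]


def to_silver(rec: BronzeRecord) -> dict[str, Any]:
    """Normalise one raw record into a simplified, hypothetical FHIR-like resource."""
    p = rec.payload
    meta = {"source": rec.source}  # keep lineage back to bronze
    if "variant" in p:
        return {"resourceType": "Observation", "category": "genomics",
                "code": p["variant"], "subject": p["patient_id"], "meta": meta}
    if "lab" in p:
        return {"resourceType": "Observation", "category": "laboratory",
                "code": p["lab"], "value": p.get("value"),
                "subject": p["patient_id"], "meta": meta}
    return {"resourceType": "DocumentReference", "subject": p["patient_id"],
            "text": p.get("text", ""), "meta": meta}


@dataclass
class PatientCase:
    """Gold layer: everything the board needs about one patient."""
    patient_id: str
    genomics: list[dict[str, Any]] = field(default_factory=list)
    labs: list[dict[str, Any]] = field(default_factory=list)
    documents: list[dict[str, Any]] = field(default_factory=list)


def to_gold(patient_id: str, resources: list[dict[str, Any]]) -> PatientCase:
    """Collect the silver resources belonging to one patient into a PatientCase."""
    case = PatientCase(patient_id)
    for r in resources:
        if r["subject"] != patient_id:
            continue  # belongs to another patient
        if r["resourceType"] == "DocumentReference":
            case.documents.append(r)
        elif r["category"] == "genomics":
            case.genomics.append(r)
        else:
            case.labs.append(r)
    return case


bronze = [
    BronzeRecord("tcga", {"patient_id": "P1", "variant": "GENE_A variant"}),
    BronzeRecord("synthea", {"patient_id": "P1", "lab": "creatinine", "value": 0.9}),
    BronzeRecord("synthea", {"patient_id": "P1", "text": "Free-text clinic note"}),
    BronzeRecord("synthea", {"patient_id": "P2", "lab": "ALT", "value": 30}),
]
case = to_gold("P1", [to_silver(b) for b in bronze])
print(len(case.genomics), len(case.labs), len(case.documents))  # 1 1 1
```

Keeping the untouched bronze payload means any gold-layer value can be traced back through `meta.source` to what the source actually sent.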
③ Knowledge layer — hybrid retrieval
Vector store
Chunked literature, drug labels & guideline text · medical embeddings
Knowledge graph
gene→drug, drug↔drug, variant→therapy relationships
Provenance store: every claim → retrieved chunk → source document. The spine of the audit view.
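The hybrid layer can be pictured as two lookups whose results both carry provenance. The sketch below assumes toy in-memory stores: a cosine-similarity search over pre-computed chunk embeddings (2-D here, standing in for medical embeddings) and a dictionary of graph edges, each edge pointing at the chunk that supports it. The vectors, entity names, IDs and relation labels are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    source_doc: str  # provenance: the document this chunk was cut from
    text: str
    embedding: tuple[float, ...]


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vector_search(query_vec, chunks, k=2):
    """Top-k chunks by cosine similarity to the query embedding."""
    return sorted(chunks, key=lambda c: cosine(query_vec, c.embedding), reverse=True)[:k]


# Toy knowledge graph: (subject, relation) -> [(object, supporting chunk_id)].
GRAPH = {
    ("GENE_A", "targeted_by"): [("DRUG_X", "c1")],
    ("DRUG_X", "interacts_with"): [("DRUG_Y", "c3")],
}


def hybrid_retrieve(query_vec, entity, relation, chunks, k=2):
    """Union of vector hits and graph facts, each traced to a chunk and a document."""
    by_id = {c.chunk_id: c for c in chunks}
    results = [{"via": "vector", "chunk_id": c.chunk_id, "source_doc": c.source_doc}
               for c in vector_search(query_vec, chunks, k)]
    for obj, chunk_id in GRAPH.get((entity, relation), []):
        results.append({"via": "graph", "fact": (entity, relation, obj),
                        "chunk_id": chunk_id, "source_doc": by_id[chunk_id].source_doc})
    return results


chunks = [
    Chunk("c1", "drug-label-001", "Drug X label text", (1.0, 0.0)),
    Chunk("c2", "review-002", "Review text", (0.0, 1.0)),
    Chunk("c3", "interaction-003", "Interaction text", (0.7, 0.7)),
]
hits = hybrid_retrieve((1.0, 0.1), "GENE_A", "targeted_by", chunks, k=1)
for h in hits:
    print(h["via"], h["chunk_id"], h["source_doc"])
```

Because both retrieval paths return the same `chunk_id` / `source_doc` pair, the audit view can follow any claim back to a document regardless of which path found it.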
④ Agent orchestration — 56 agents, 7 layers
L0
Orchestration
Board Chair decomposes the case and drives consensus
Agents: Orchestration 1
L1
Intake & Normalisation
Classify, assemble, de-identify, build timeline
Agents: Intake 5
L2
Diagnostic Specialists
Pathology · Radiology · Molecular · Labs (concurrent)
Agents: Pathology 4 · Radiology 6 · Molecular 7 + fan-out · Labs 3
L3
Therapeutic Specialists
Decision-making oncology + Pharmacy safety
Agents: Decision 6 · Pharmacy 5
L4
Evidence · Trials · Guidelines
Literature, trial matching, graded evidence (fan-out)
Agents: Trials 4 + fan-out · Evidence 5 + fan-out
L5
Supportive & Holistic
Palliative · Nutrition · Psycho-oncology · Geriatric
Agents: Supportive 4
L6
Synthesis · Safety · Reporting
Consensus, dissent, contradiction & citation checks
Agents: Synthesis 6
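One way to read the layer diagram: layers run in order, agents inside a layer run concurrently over a shared case state, and a fan-out agent spawns one sub-task per item (for example, one per variant). The sketch below assumes plain `asyncio` in place of a LangGraph-style supervisor, shows three layers instead of seven, and uses hypothetical agent functions that only record a finding.

```python
import asyncio
from typing import Any, Awaitable, Callable

Agent = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


def make_agent(name: str) -> Agent:
    """Hypothetical specialist: returns one finding keyed by its own name."""
    async def run(state: dict[str, Any]) -> dict[str, Any]:
        await asyncio.sleep(0)  # stand-in for an LLM or tool call
        return {name: f"{name} reviewed case {state['case_id']}"}
    return run


def make_fan_out(name: str, key: str) -> Agent:
    """Spawn one concurrent sub-task per item in state[key], e.g. per variant."""
    async def run(state: dict[str, Any]) -> dict[str, Any]:
        async def one(item: str) -> tuple[str, str]:
            await asyncio.sleep(0)
            return item, f"{name} annotated {item}"
        pairs = await asyncio.gather(*(one(i) for i in state.get(key, [])))
        return {name: dict(pairs)}
    return run


async def convene(state: dict[str, Any], layers: list[list[Agent]]) -> dict[str, Any]:
    """Run layers sequentially; agents within a layer run concurrently."""
    for layer in layers:
        outputs = await asyncio.gather(*(agent(state) for agent in layer))
        for out in outputs:
            state.update(out)  # merge findings into the shared case state
    return state


layers = [
    [make_agent("board_chair")],
    [make_agent("pathology"), make_agent("radiology"),
     make_fan_out("molecular", "variants")],
    [make_agent("synthesis")],
]
final = asyncio.run(convene({"case_id": "demo-1", "variants": ["V1", "V2"]}, layers))
print(sorted(final["molecular"]))  # ['V1', 'V2']
```

Running a layer only after the previous one has merged its findings is what lets later tiers (evidence, synthesis) see everything the diagnostic tier produced.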
Design principles

Why a risk-conscious clinical leader leans forward

Multimodal by construction

Imaging, genomics and text are first-class inputs fused into a shared case state — not bolted on.

Human-in-the-loop

The graph pauses at defined checkpoints for clinician sign-off. CADUCEUS never decides.
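A checkpoint can be modelled as a gate that halts the run and keeps the pending state until a named clinician approves it. The sketch below assumes a hypothetical `BoardRun` with an in-memory checkpoint store; LangGraph provides its own interrupt and checkpointer mechanisms, which are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


class AwaitingSignOff(Exception):
    """Raised when the run reaches a checkpoint that has not been approved."""


@dataclass
class Checkpoint:
    name: str
    approved_by: Optional[str] = None
    note: str = ""


@dataclass
class BoardRun:
    state: dict[str, Any]
    checkpoints: dict[str, Checkpoint] = field(default_factory=dict)

    def gate(self, name: str) -> None:
        """Pause here unless a clinician has signed off on this checkpoint."""
        cp = self.checkpoints.setdefault(name, Checkpoint(name))
        if cp.approved_by is None:
            raise AwaitingSignOff(name)

    def sign_off(self, name: str, clinician: str, note: str = "") -> None:
        cp = self.checkpoints.setdefault(name, Checkpoint(name))
        cp.approved_by, cp.note = clinician, note

    def run(self) -> str:
        # Agents' work (stubbed); kept in state so a resumed run does not redo it.
        self.state.setdefault("draft", "draft recommendation")
        self.gate("review_draft")  # human checkpoint
        self.state["final"] = self.state["draft"] + " (clinician-approved)"
        return self.state["final"]


board = BoardRun(state={})
try:
    board.run()
except AwaitingSignOff as stop:
    print("paused at", stop)  # paused at review_draft
board.sign_off("review_draft", clinician="Dr. Example")
print(board.run())  # draft recommendation (clinician-approved)
```

The approval record (`approved_by`, `note`) stays attached to the checkpoint, so sign-off is auditable alongside the recommendation it released.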

Provenance or it didn't happen

A Citation Validator rejects any recommendation lacking a traceable source. No source, no statement.
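The "no source, no statement" rule reduces to a filter over claims. The sketch assumes each claim lists the chunk IDs it relies on and that the provenance store is a plain mapping from chunk ID to source document; a claim with no citation, or one citing a chunk the store cannot resolve, is rejected. `Claim` and `validate_citations` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    cited_chunks: list[str]


def validate_citations(claims, provenance):
    """Split claims into (accepted, rejected).

    `provenance` maps chunk_id -> source document ID. A claim is accepted only
    if it cites at least one chunk and every cited chunk resolves to a source.
    """
    accepted, rejected = [], []
    for claim in claims:
        traceable = bool(claim.cited_chunks) and all(
            provenance.get(cid) for cid in claim.cited_chunks)
        (accepted if traceable else rejected).append(claim)
    return accepted, rejected


provenance = {"c1": "drug-label-001", "c2": "guideline-002"}
claims = [
    Claim("Recommendation supported by guideline", ["c2"]),
    Claim("Unsupported assertion", []),
    Claim("Cites a chunk that was never retrieved", ["c9"]),
]
ok, bad = validate_citations(claims, provenance)
print([c.text for c in ok])  # ['Recommendation supported by guideline']
```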

Adversarial self-check

A Dissent / Red-Team agent and a Contradiction Checker attack the consensus before a human sees it.
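A contradiction checker can start from something as simple as opposing stances on the same intervention. The sketch assumes each agent's output has already been reduced to `(agent, intervention, stance)` triples with stance `"for"` or `"against"`; real outputs would need an extraction step first, and the agent and drug names are placeholders.

```python
from collections import defaultdict


def find_contradictions(positions):
    """Return interventions on which at least one agent is 'for' and one 'against'.

    `positions` is a list of (agent, intervention, stance) tuples.
    """
    stances = defaultdict(lambda: {"for": set(), "against": set()})
    for agent, intervention, stance in positions:
        if stance not in ("for", "against"):
            raise ValueError(f"unknown stance: {stance!r}")
        stances[intervention][stance].add(agent)
    return {
        intervention: {"for": sorted(s["for"]), "against": sorted(s["against"])}
        for intervention, s in stances.items()
        if s["for"] and s["against"]
    }


positions = [
    ("targeted_therapy_agent", "DRUG_X", "for"),
    ("pharmacy_safety_agent", "DRUG_X", "against"),  # e.g. an interaction concern
    ("guideline_agent", "DRUG_Z", "for"),
]
conflicts = find_contradictions(positions)
print(conflicts)
# {'DRUG_X': {'for': ['targeted_therapy_agent'], 'against': ['pharmacy_safety_agent']}}
```

Surfacing the conflict with the names of the agents on each side, rather than averaging it away, is what allows dissent to be documented next to the consensus.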

Privacy by design

Synthetic data for the demo; real data only in de-identified form and under credentialed access. A De-identification Verifier guards against PHI leakage.
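As a rough illustration of a PHI leakage check, the sketch below scans outgoing text for a few identifier shapes: dates, phone numbers, email addresses and an MRN-style label. The patterns are assumptions chosen for illustration and catch only obvious leaks; a production verifier would cover many more identifier types.

```python
import re

# Illustrative patterns only; a real verifier covers far more identifier types.
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b|\b\d{4}-\d{2}-\d{2}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "mrn": re.compile(r"\bMRN[:#\s]*\d{5,}\b", re.IGNORECASE),
}


def find_phi(text: str) -> list[tuple[str, str]]:
    """Return (kind, matched_text) pairs for every suspected identifier."""
    hits = []
    for kind, pattern in PHI_PATTERNS.items():
        hits.extend((kind, m.group(0)) for m in pattern.finditer(text))
    return hits


def verify_deidentified(text: str) -> bool:
    """True only when no pattern fires; callers should block output otherwise."""
    return not find_phi(text)


clean = "Stage IIIA adenocarcinoma; driver variant detected."
leaky = "Seen 03/14/2024, MRN: 1234567, call 555-123-4567."
print(verify_deidentified(clean))  # True
print(find_phi(leaky))
# [('date', '03/14/2024'), ('phone', '555-123-4567'), ('mrn', 'MRN: 1234567')]
```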

Graded evidence

Recommendations carry levels of evidence (GRADE / NCI PDQ), not bare assertions.
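One way to make "no bare assertions" structural is to refuse to construct a recommendation without an evidence grade and a source. The sketch uses the four GRADE certainty levels (high, moderate, low, very low); the `Recommendation` type and its fields are hypothetical, and NCI PDQ levels of evidence are not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class Certainty(Enum):
    """GRADE certainty-of-evidence levels; larger value = stronger evidence."""
    HIGH = 4
    MODERATE = 3
    LOW = 2
    VERY_LOW = 1


@dataclass(frozen=True)
class Recommendation:
    statement: str
    certainty: Certainty
    sources: tuple[str, ...]

    def __post_init__(self):
        # A recommendation without a grade or a source is a bare assertion.
        if not isinstance(self.certainty, Certainty):
            raise TypeError("certainty must be a GRADE Certainty level")
        if not self.sources:
            raise ValueError("recommendation needs at least one source")


def rank(recs):
    """Order recommendations by certainty of evidence, strongest first."""
    return sorted(recs, key=lambda r: r.certainty.value, reverse=True)


recs = [
    Recommendation("Option B", Certainty.LOW, ("trial-002",)),
    Recommendation("Option A", Certainty.HIGH, ("guideline-001",)),
]
print([r.statement for r in rank(recs)])  # ['Option A', 'Option B']
```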

Patient cases

Select a case to convene the board

Three synthetic / de-identified oncology presentations, each exercising a distinct reasoning path — oncogene-addicted targeted therapy, HER2 antibody therapy, and MSI-high immunotherapy.

Real patients · live

Or run the board on a real patient

These cases pull real, de-identified cancer patients from the open-access tier of the NCI Genomic Data Commons (TCGA) in real time, with no API key and no credentialing. Variants are annotated live against CIViC; drug safety comes from openFDA, trials from ClinicalTrials.gov and literature from Europe PMC. Data gaps are marked honestly, never fabricated.

NCI GDC / TCGA · CIViC · openFDA + FAERS · ClinicalTrials.gov · Europe PMC
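For a sense of what a live pull involves, the sketch below only builds a request URL for the GDC `cases` endpoint and does not touch the network. The `filters` / `fields` / `format` / `size` parameters follow the GDC API's JSON-filter query style, but the specific field names and the project ID are assumptions to check against the GDC data dictionary, and `build_case_query` is a hypothetical helper.

```python
import json
from urllib.parse import urlencode

GDC_CASES = "https://api.gdc.cancer.gov/cases"

# Assumed field names; verify against the GDC data dictionary before use.
CASE_FIELDS = [
    "submitter_id",
    "demographic.gender",
    "diagnoses.primary_diagnosis",
    "diagnoses.ajcc_pathologic_stage",
]


def build_case_query(project_id: str, size: int = 5) -> str:
    """Return a GET URL for open-access cases in one TCGA project."""
    filters = {
        "op": "in",
        "content": {"field": "project.project_id", "value": [project_id]},
    }
    params = {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": ",".join(CASE_FIELDS),
        "format": "json",
        "size": size,
    }
    return f"{GDC_CASES}?{urlencode(params)}"


url = build_case_query("TCGA-LUAD")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) and parsing the JSON body would be the next step; somatic variants would need a separate query.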
The open-access tier provides real demographics, diagnosis, AJCC stage and somatic variants, but no free-text pathology or radiology reports, no ECOG performance status, no PD-L1, TMB or MSI status, and no labs or medications. CADUCEUS marks these as “not available” and never fabricates them. Live assembly calls several public APIs, so the first load can take a few seconds. Real ICU notes and chest X-rays (MIMIC) require PhysioNet credentialing plus a CITI course (~1–2 weeks), which puts them out of scope for this instant demo.
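The "mark, never fabricate" rule can be enforced at case-assembly time: every expected field is either copied from the source or set to an explicit marker, and the gaps are listed. The sketch assumes a hypothetical list of expected fields mirroring the gaps named above; `NOT_AVAILABLE` and `assemble_case` are illustrative names.

```python
from typing import Any

NOT_AVAILABLE = "not available"

# Fields the board expects; several are absent from the GDC open-access tier.
EXPECTED_FIELDS = [
    "demographics", "diagnosis", "ajcc_stage", "somatic_variants",
    "pathology_report", "radiology_report", "ecog",
    "pd_l1", "tmb", "msi", "labs", "medications",
]


def assemble_case(source: dict[str, Any]) -> tuple[dict[str, Any], list[str]]:
    """Copy fields that exist; mark every missing one instead of inventing it."""
    case, gaps = {}, []
    for name in EXPECTED_FIELDS:
        value = source.get(name)
        # Empty containers count as missing; legitimate zeros (e.g. ECOG 0) are kept.
        if value in (None, "", [], {}):
            case[name] = NOT_AVAILABLE
            gaps.append(name)
        else:
            case[name] = value
    return case, gaps


gdc_like = {
    "demographics": {"sex": "female"},
    "diagnosis": "adenocarcinoma",
    "ajcc_stage": "Stage IIA",
    "somatic_variants": ["GENE_A variant"],
}
case, gaps = assemble_case(gdc_like)
print(case["ecog"])  # not available
print(len(gaps))     # 8
```

Returning the gap list alongside the case lets downstream agents see explicitly which inputs they are reasoning without.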