Habeas Protocol — Empirical Coding of DIFC + ADGM + SICC Judgments (v0.2, n=188)

Start here

What this is, in plain words

When two businesses in different countries fall out — a Dubai fintech and a UK supplier, an Abu Dhabi DAO and a Singapore exchange — they take it to a specialist commercial court: DIFC (Dubai), ADGM (Abu Dhabi), or SICC (Singapore). Those courts publish rules and judgments, but only as PDFs. There is no way for software to ask "what would this court do with this case?" This dashboard is a working answer.

188 real judgments are scored against six properties any digital tribunal must satisfy. Twelve real legal rules from those judgments are encoded in Catala, an open-source language for law, with pure-Python reference evaluators and conformance tests cross-checking the two. Seven worked examples reproduce the courts' arithmetic to the cent (three of them flag clerical or methodological gaps the courts published, surfaced as structured records). A 30-instrument falsification set across non-court instruments (sealed awards, on-chain DAOs, regulator notices, platform adjudicators, UDRP panels) shows the rubric measures procedural form, not pedigree. You can run the rules yourself, draft new ones, and route cross-border cases — no legalese required. The bet is the opposite of most "Web3 court" proposals: the courts already exist; what's missing is the computational layer.

Atlas (below)

See the 188 fingerprints ↓

One sigil per judgment, generated from its primitive scores. Same scores → same shape. Click any cell for the full record.

Playground

Run a rule →

Pick one of 12 encoded rules, fill in a fact pattern, see the verdict and the trace of why it reached that conclusion.

Simulator

Run a full dispute →

Pick a claim type and tribunal. Every applicable rule fires against the same facts; the simulator aggregates a predicted disposition.

Cross-border / API / more

Route a multi-jurisdiction dispute →

Plus authoring, ingestion, and an 18-endpoint API with first-party Python + TypeScript clients.

At a glance

188

Judgments coded

39

First-pass set (LLM-graded)

32

DIFC

76

ADGM

80

SICC

1.85

SICC mean / 2.00

1.91

ADGM mean / 2.00

1.72

DIFC mean / 2.00

7

Working traces

Empirical findings

All three tribunals score near the ceiling on every per-ruling primitive and 2/2 on both architectural system properties. The pattern is robust to two stress tests, and seven executable traces from the corpus demonstrate the protocol's methodological coverage: formula, deferred conditional, bounded discretion, arithmetic and Boolean composition, statutory partial refusal, and a third-party-jurisdiction gate.

Inferential layer. Bootstrap 95% coder-resampling intervals (10,000 resamples; these describe coding-procedure variance over the n=188 corpus, not population variance): ADGM 1.91 [1.89, 1.94], SICC 1.85 [1.80, 1.90], DIFC 1.72 [1.62, 1.81]. All three pairwise differences exclude zero at α=0.05, so the ranking is statistically supported rather than point-estimate noise. Raw values: data/bootstrap_ci.json; computation: scripts/compute_bootstrap_ci.py.
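
A percentile bootstrap of this general kind can be sketched in a few lines. This is a minimal illustration only, not the contents of scripts/compute_bootstrap_ci.py: the function name, the toy scores, and the choice to resample per-judgment means are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `scores`.

    Hypothetical helper: resamples the per-judgment means with replacement.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(scores), lo, hi


# Toy per-judgment means on the 0-2 scale (invented, not corpus data).
toy_scores = [2.0, 1.83, 1.67, 2.0, 1.5, 1.83, 2.0, 1.67]
point, lo, hi = bootstrap_ci(toy_scores)
print(f"{point:.2f} [{lo:.2f}, {hi:.2f}]")
```

Because the seed is fixed, rerunning the sketch yields the same interval, which is the property a published data/bootstrap_ci.json would need in order to be checkable.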

Construct validity (external correlate, run 2026-05-07). The per-judgment v0.2 mean correlates with appeal status at Spearman ρ = +0.32 and with subsequent-citation count at ρ = +0.12 across n=186 (data/robustness/external_correlate.json). The pre-registered H8 stop rule (|ρ| ≥ 0.10 in the predicted direction on any of three external metrics) passes. Higher-scoring judgments are more likely to be referenced by appellate-court output.
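
For readers who want to see what the correlate and the stop rule amount to, here is a rough pure-Python sketch. It assumes the predicted direction is positive and uses average ranks for ties (needed because appeal status is binary); the function names are hypothetical and the project's own script may differ.

```python
from statistics import fmean


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = fmean(rx), fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def h8_passes(rhos, threshold=0.10):
    """Stop rule as described above, assuming the predicted direction is positive."""
    return any(r >= threshold for r in rhos)


# The two correlations reported in the text.
print(h8_passes([0.32, 0.12]))
```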

Of the 188 judgments scored, 39 form an LLM-graded first-pass set (32 DIFC + 7 ADGM, scored by Claude Sonnet 4.5); the remaining 149 entries (69 ADGM + 80 SICC) are scored by deterministic regex heuristics — no LLM in the loop for these 149. Per-entry grader-type and provenance are recorded in coding.grader_type, coding.coder, and grader-type-specific fields (model + prompt SHA for LLM entries; producing-script path for regex entries).
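
A provenance check over such records might look like the sketch below. Only coding.grader_type and coding.coder are named in the text; the grader-type values ("llm", "regex") and the keys model, prompt_sha, and script_path are hypothetical stand-ins for the grader-type-specific fields.

```python
# Hypothetical key names for the grader-type-specific provenance fields.
REQUIRED_BY_GRADER = {
    "llm": ("model", "prompt_sha"),
    "regex": ("script_path",),
}


def missing_provenance(entry):
    """Return the provenance keys that an entry's `coding` block lacks."""
    coding = entry.get("coding", {})
    missing = [k for k in ("grader_type", "coder") if k not in coding]
    specific = REQUIRED_BY_GRADER.get(coding.get("grader_type"), ())
    missing += [k for k in specific if k not in coding]
    return missing


# Invented example entries, not records from the corpus.
llm_entry = {"coding": {"grader_type": "llm", "coder": "example-coder",
                        "model": "example-model", "prompt_sha": "abc123"}}
regex_entry = {"coding": {"grader_type": "regex", "coder": "example-coder"}}

print(missing_provenance(llm_entry))    # []
print(missing_provenance(regex_entry))  # ['script_path']
```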

Grader-type stability (ADGM, the only tribunal with both grader types). The LLM grader (n=7) scores ADGM at 1.93; the regex heuristic-triage (n=16) scores 1.93; the regex heuristic-graded (n=53) scores 1.91. The two grader types agree to within 0.02 on the overall mean, which is within-corpus evidence that the saturation finding is a property of the tribunal rather than of the grading instrument.
SICC PR4 heuristic limitation, corrected. The regex grader produces PR4 = 1.55 for SICC because the four-marker triplet test fails on narrative grounds-of-decision documents. The corrected PR4 (Claude re-grades PR4 only, using a prompt that explicitly instructs it to read narrative form) is what enters the headline SICC mean of 1.85. The regex result is preserved as the known-flawed measurement.
Falsification cross-check. A 30-instrument falsification set across five non-court instrument classes (sealed awards, on-chain DAOs, regulator notices, platform adjudicators, UDRP panels) confirms the rubric separates real commercial courts from non-courts cleanly and does NOT mark down a positive control (UDRP, gap +0.05). The rubric measures procedural form, not pedigree.
Cross-family replication. The protocol crosses legal-family boundaries — Singapore common law via the IAA, vs DIFC's own statutes and ADGM's English-law-via-statute — and translates to a civil-law foil under the peer-court comparison set. The protocol is not court-specific.
Architectural system properties. All three tribunals score 2/2 on separation of powers (SP1) and enforceability under the New York Convention (SP2). These are the structural pre-conditions for plugging software into the bench.
Methodological coverage. Seven executable traces from the corpus show the protocol covers (i) static-rule arithmetic, (ii) deferred conditionals, (iii) rule-bounded human judgment, (iv) arithmetic composition over substantive findings, (v) Boolean composition over contractual interpretation, (vi) NY-Convention partial refusal under Singapore IAA s 31, and (vii) a third-party-jurisdiction gate under Norwich Pharmacal + Bankers Trust + RDC 28.52.

The tribunals already exist; what is missing is the computational layer.

The Atlas — 188 fingerprints

One sigil per judgment. Each fingerprint below is generated deterministically from the primitive scores of a single ruling in the coded corpus. Same scores → same shape; different scores → different shape. Six concentric rings encode PR1–PR6, two outer arcs encode SP1–SP2, and a hash-seeded rosette gives every case ID its own face.

Habeas Atlas · vol. I

Read the rings, and you can read the court.

DIFC Courts · 32 judgments

ADGM Courts · 76 judgments

SICC · 80 judgments

How to read a fingerprint

Six concentric rings encode the per-ruling primitives, innermost to outermost: PR1 rule source · PR2 typed evidence · PR3 machine-readable order · PR4 procedural state · PR5 reasoning trace · PR6 replayability.

A full ring with eight tick-marks means a perfect score (2). A half-arc with four ticks means a partial score (1). A faint dashed circle means absent (0).

Two outer arcs encode the system properties: SP1 separation of powers (top), SP2 appeal path (bottom). The central rosette is a hash-seeded ornament unique to the case ID, so two judgments with identical scores still wear different faces. A small dot at the upper right marks one of the seven cases that became an executable trace.
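
The legend above can be read as a small deterministic mapping. The sketch below builds a description of one sigil rather than drawing it; the function name, the dictionary layout, the zero tick count for an absent ring, and the use of SHA-256 for the rosette seed are assumptions, since the dashboard's actual hashing scheme is not stated.

```python
import hashlib

PRIMITIVES = ("PR1", "PR2", "PR3", "PR4", "PR5", "PR6")  # innermost to outermost

# Ring style per score, following the legend.
# The tick count for score 0 is assumed; the legend does not give one.
RING_STYLE = {
    2: {"shape": "full ring", "ticks": 8},
    1: {"shape": "half arc", "ticks": 4},
    0: {"shape": "dashed circle", "ticks": 0},
}


def fingerprint(case_id, scores):
    """Build a deterministic description of one sigil (no drawing)."""
    rings = [{"primitive": p, **RING_STYLE[scores[p]]} for p in PRIMITIVES]
    # The seed depends only on the case ID, so identical scores still get
    # different ornaments. SHA-256 is an illustrative choice.
    digest = hashlib.sha256(case_id.encode("utf-8")).digest()
    rosette_seed = int.from_bytes(digest[:4], "big")
    return {"rings": rings, "rosette_seed": rosette_seed}


# Invented case IDs and scores.
scores = {"PR1": 2, "PR2": 2, "PR3": 1, "PR4": 2, "PR5": 2, "PR6": 0}
a = fingerprint("CASE-A", scores)
b = fingerprint("CASE-B", scores)
print(a["rings"] == b["rings"], a["rosette_seed"] == b["rosette_seed"])
```

Two cases with the same scores share their rings but not their rosette seed, which is the behaviour the legend describes.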

The six per-ruling primitives

Properties of any individual ruling, scored 0 (absent) / 1 (partial) / 2 (fully implemented). v0.2 of the framework. Definitions live in data/primitives.json.
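
As an illustration of how a per-judgment mean on the 0 to 2 scale could be derived from these scores, here is a minimal sketch. It assumes an unweighted mean over PR1 to PR6; the function name and the sample scores are invented, and the weighting actually used by the project is not stated here.

```python
from statistics import fmean

PRIMITIVES = ("PR1", "PR2", "PR3", "PR4", "PR5", "PR6")
VALID_SCORES = (0, 1, 2)  # absent / partial / fully implemented


def judgment_mean(scores):
    """Unweighted mean of the six per-ruling primitive scores (assumed)."""
    for p in PRIMITIVES:
        if scores[p] not in VALID_SCORES:
            raise ValueError(f"{p} must be 0, 1 or 2, got {scores[p]!r}")
    return fmean(scores[p] for p in PRIMITIVES)


# Invented scores: five primitives fully implemented, one partial.
example = {"PR1": 2, "PR2": 2, "PR3": 2, "PR4": 1, "PR5": 2, "PR6": 2}
print(f"{judgment_mean(example):.2f} / 2.00")
```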

ID	Name	What it tests

System properties

Architectural facts about the tribunal as a whole, not properties of individual rulings. Scored once per institution. The score-0 row is what you avoid by anchoring at DIFC or ADGM rather than at an ad-hoc Web3 arbitration project.

Tribunal	SP1 Separation of powers	SP2 Appeal path

Seven working traces

Each trace lifts a real rule from the corpus into Catala source plus a Python evaluator and runs it against the case's event log. The seven span the full methodological spectrum: formula, deferred conditional, bounded discretion, arithmetic composition, Boolean composition, partial statutory refusal under Singapore IAA s 31, and a third-party-jurisdiction gate (Norwich Pharmacal + Bankers Trust). Three tribunals, three legal families, one engine. Trace #3 is the honest one — it shows what the rules cannot fully decide.

Trace viewer

Pick a trace. The left column is the rule as Catala source — the formal specification. The middle column is the event log — what happened, with the facts the human judge had to determine. The right column is the output — what the predicate produces when run against those facts, with assertions checked against the court's ruling.

Rule rule.catala_en

Events events.json

Output predicate ⟶ court
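
The three-column layout (rule, events, output) can be mimicked with a small evaluator. The sketch below uses a hypothetical formula-type rule, simple interest on a judgment debt, with invented facts and figures; it is not one of the seven traces and does not reproduce any Catala source.

```python
from decimal import Decimal, ROUND_HALF_UP


def simple_interest(facts):
    """Hypothetical formula-type rule: principal * annual rate * days / 365."""
    trace = []
    principal = Decimal(facts["principal"])
    rate = Decimal(facts["annual_rate"])
    days = Decimal(facts["days"])
    trace.append(f"principal={principal}, rate={rate}, days={days}")
    interest = (principal * rate * days / Decimal(365)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP  # round to the cent
    )
    trace.append(f"interest={interest}")
    return interest, trace


def run_trace(events, court_figure):
    """Fold the event log into facts, evaluate, and compare with the court."""
    facts = {}
    for event in events:
        facts.update(event["facts"])
    computed, trace = simple_interest(facts)
    court = Decimal(court_figure)
    return {"computed": computed, "court": court,
            "matches": computed == court, "trace": trace}


# Invented event log: each event carries facts the judge determined.
events = [
    {"event": "debt_found", "facts": {"principal": "100000.00"}},
    {"event": "rate_determined", "facts": {"annual_rate": "0.09"}},
    {"event": "period_determined", "facts": {"days": 73}},
]
result = run_trace(events, court_figure="1800.00")
print(result["computed"], result["matches"])
```

Using Decimal rather than float keeps the comparison exact to the cent; a court figure that differs by even one cent makes matches false, so the discrepancy can be recorded rather than silently absorbed.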

All judgments

Case	Tribunal	Date	Judge	Mean score	PR1	PR2	PR3	PR4	PR5	PR6