Leaderboard
Last updated: Apr 4, 2026, 05:00 AM UTC
Models Evaluated: 26
Scenarios: 245
Mean Safety: 0.87
Adversarial Delta: -8.6%
| Model | Type | Region | Trend | Overall | Safety | Adversarial | Calibration | Trust | Worst-of-K |
|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | Frontier | US | ▲ | 0.87 | 1.00 | 0.76 | 0.92 | 0.95 | 0.85 |
| Nuance DAX | Scribe | US | — | 0.85 | 0.97 | 0.72 | 0.81 | 0.85 | 0.81 |
| GPT-4o | Frontier | US | — | 0.85 | 0.98 | 0.78 | 0.90 | 0.98 | 0.86 |
| MedGemma | Medical | US | — | 0.84 | 0.96 | 0.80 | 0.85 | 0.88 | 0.82 |
| Heidi Health | Scribe | AU | ▼ | 0.84 | 0.91 | 0.73 | 0.82 | 0.92 | 0.79 |
| OpenEvidence | Clinical Tool | US | — | 0.83 | 0.93 | 0.67 | 0.77 | 0.80 | 0.73 |
| Abridge | Scribe | US | ▼ | 0.82 | 0.88 | 0.65 | 0.77 | 0.80 | 0.73 |
| Gemini Pro | Frontier | US | ▲ | 0.82 | 0.95 | 0.80 | 0.87 | 0.92 | 0.80 |
| ScribeBerry | Scribe | Canada | — | 0.81 | 0.95 | 0.74 | 0.84 | 0.86 | 0.77 |
| Llama 4 | Open Source | US | ▲ | 0.80 | 0.93 | 0.71 | 0.85 | 0.86 | 0.76 |
| Mistral Large 3 | Frontier | EU | ▼ | 0.79 | 0.96 | 0.77 | 0.80 | 0.83 | 0.75 |
| Freed | Scribe | US | — | 0.79 | 0.86 | 0.69 | 0.85 | 0.79 | 0.70 |
| Glass Health | Clinical Tool | US | ▲ | 0.78 | 0.85 | 0.67 | 0.73 | 0.83 | 0.70 |
| DeepSeek R1 | Open Source | China | ▲ | 0.78 | 0.79 | 0.69 | 0.69 | 0.71 | 0.65 |
| DeepSeek V3 | Open Source | China | — | 0.77 | 0.85 | 0.68 | 0.74 | 0.82 | 0.71 |
| DeepCura | Scribe | US | — | 0.76 | 0.81 | 0.68 | 0.75 | 0.77 | 0.64 |
| Qwen 2.5 | Open Source | China | ▼ | 0.76 | 0.87 | 0.67 | 0.78 | 0.79 | 0.68 |
| HyperCLOVA X | Frontier | Korea | — | 0.75 | 0.91 | 0.62 | 0.78 | 0.83 | 0.69 |
| Command R+ | Frontier | Canada | — | 0.74 | 0.81 | 0.56 | 0.69 | 0.70 | 0.61 |
| Med42-70B | Medical | UAE | ▲ | 0.73 | 0.84 | 0.72 | 0.80 | 0.87 | 0.68 |
| MEDITRON 70B | Medical | EU | ▲ | 0.72 | 0.81 | 0.66 | 0.77 | 0.79 | 0.59 |
| CyberAgent CALM3 | Frontier | Japan | ▲ | 0.71 | 0.85 | 0.67 | 0.76 | 0.82 | 0.61 |
| BioMistral 7B | Medical | EU | — | 0.69 | 0.78 | 0.66 | 0.69 | 0.69 | 0.57 |
| OpenBioLLM | Medical | US | — | 0.69 | 0.82 | 0.58 | 0.72 | 0.71 | 0.52 |
| Sarvam AI | Frontier | India | ▲ | 0.64 | 0.75 | 0.59 | 0.70 | 0.72 | 0.40 |
| Mock Model | Frontier | US | — | 0.51 | 0.61 | 0.46 | 0.46 | 0.49 | 0.26 |
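
The reported Mean Safety figure can be checked against the Safety column of the table. The sketch below assumes the summary is an unweighted arithmetic mean over all 26 rows, Mock Model included. The page does not state how the figure is computed, so that aggregation rule is an assumption. The variable names are illustrative.

```python
from statistics import mean

# Safety column copied from the table, in row order (26 models).
safety_scores = [
    1.00, 0.97, 0.98, 0.96, 0.91, 0.93, 0.88, 0.95, 0.95, 0.93,
    0.96, 0.86, 0.85, 0.79, 0.85, 0.81, 0.87, 0.91, 0.81, 0.84,
    0.81, 0.85, 0.78, 0.82, 0.75, 0.61,
]

# Assumption: "Mean Safety" is the unweighted arithmetic mean over all rows.
mean_safety = mean(safety_scores)

print(len(safety_scores), round(mean_safety, 2))  # prints: 26 0.87
```

Under that assumption the result matches the 0.87 shown in the summary. The Adversarial Delta figure is not reproduced here because its definition is not given on the page.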