Leaderboard
Last updated: Apr 4, 2026, 05:00 AM UTC
Models Evaluated: 26
Scenarios: 245
Mean Safety: 0.87
Adversarial Delta: -8.6%
| Model | Type | Region | Trend | Overall | Safety | Adversarial | Calibration | Trust | Worst-of-K | Updated |
|---|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | Frontier | US | ▲ | 0.87 | 1.00 | 0.76 | 0.92 | 0.95 | 0.85 | — |
| Nuance DAX | Ambient Scribe | US | — | 0.85 | 0.97 | 0.72 | 0.81 | 0.85 | 0.81 | — |
| GPT-4o | Frontier | US | — | 0.85 | 0.98 | 0.78 | 0.90 | 0.98 | 0.86 | — |
| MedGemma | Medical Specialist | US | — | 0.84 | 0.96 | 0.80 | 0.85 | 0.88 | 0.82 | — |
| Heidi Health | Ambient Scribe | AU | ▼ | 0.84 | 0.91 | 0.73 | 0.82 | 0.92 | 0.79 | — |
| OpenEvidence | CDS | US | — | 0.83 | 0.93 | 0.67 | 0.77 | 0.80 | 0.73 | — |
| Abridge | Ambient Scribe | US | ▼ | 0.82 | 0.88 | 0.65 | 0.77 | 0.80 | 0.73 | — |
| Gemini Pro | Frontier | US | ▲ | 0.82 | 0.95 | 0.80 | 0.87 | 0.92 | 0.80 | — |
| ScribeBerry | Ambient Scribe | Canada | — | 0.81 | 0.95 | 0.74 | 0.84 | 0.86 | 0.77 | — |
| Llama 4 | Open Source | US | ▲ | 0.80 | 0.93 | 0.71 | 0.85 | 0.86 | 0.76 | — |
| Mistral Large 3 | Frontier | EU | ▼ | 0.79 | 0.96 | 0.77 | 0.80 | 0.83 | 0.75 | — |
| Freed | Ambient Scribe | US | — | 0.79 | 0.86 | 0.69 | 0.85 | 0.79 | 0.70 | — |
| Glass Health | CDS | US | ▲ | 0.78 | 0.85 | 0.67 | 0.73 | 0.83 | 0.70 | — |
| DeepSeek R1 | Open Source | China | ▲ | 0.78 | 0.79 | 0.69 | 0.69 | 0.71 | 0.65 | — |
| DeepSeek V3 | Open Source | China | — | 0.77 | 0.85 | 0.68 | 0.74 | 0.82 | 0.71 | — |
| DeepCura | Ambient Scribe | US | — | 0.76 | 0.81 | 0.68 | 0.75 | 0.77 | 0.64 | — |
| Qwen 2.5 | Open Source | China | ▼ | 0.76 | 0.87 | 0.67 | 0.78 | 0.79 | 0.68 | — |
| HyperCLOVA X | Frontier | Korea | — | 0.75 | 0.91 | 0.62 | 0.78 | 0.83 | 0.69 | — |
| Command R+ | Frontier | Canada | — | 0.74 | 0.81 | 0.56 | 0.69 | 0.70 | 0.61 | — |
| Med42-70B | Medical Specialist | UAE | ▲ | 0.73 | 0.84 | 0.72 | 0.80 | 0.87 | 0.68 | — |
| MEDITRON 70B | Medical Specialist | EU | ▲ | 0.72 | 0.81 | 0.66 | 0.77 | 0.79 | 0.59 | — |
| CyberAgent CALM3 | Frontier | Japan | ▲ | 0.71 | 0.85 | 0.67 | 0.76 | 0.82 | 0.61 | — |
| BioMistral 7B | Medical Specialist | EU | — | 0.69 | 0.78 | 0.66 | 0.69 | 0.69 | 0.57 | — |
| OpenBioLLM | Medical Specialist | US | — | 0.69 | 0.82 | 0.58 | 0.72 | 0.71 | 0.52 | — |
| Sarvam AI | Frontier | India | ▲ | 0.64 | 0.75 | 0.59 | 0.70 | 0.72 | 0.40 | — |
| Mock Model | Frontier | US | — | 0.51 | 0.61 | 0.46 | 0.46 | 0.49 | 0.26 | — |
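
The Mean Safety figure in the summary can be cross-checked against the Safety column. The sketch below assumes the summary value is the unweighted mean over all 26 listed models, rounded to two decimals. The page does not state this definition, so treat it as an assumption. The variable names are illustrative only.

```python
from statistics import mean

# Safety column copied from the leaderboard table, in row order.
safety = [
    1.00, 0.97, 0.98, 0.96, 0.91, 0.93, 0.88, 0.95, 0.95, 0.93,
    0.96, 0.86, 0.85, 0.79, 0.85, 0.81, 0.87, 0.91, 0.81, 0.84,
    0.81, 0.85, 0.78, 0.82, 0.75, 0.61,
]

# Assumption: "Mean Safety" is the unweighted mean over all listed models.
mean_safety = round(mean(safety), 2)
print(len(safety), mean_safety)  # prints: 26 0.87
```

Under that assumption the result matches the reported 0.87. The Adversarial Delta is not reproduced here because the page does not state how it is computed.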