🕵 Agent Red-Teaming Leaderboard

Completed 3 months ago

All $171,800 awarded

Push the limits of direct and indirect attacks on AI agents.

Last updated 5 minutes ago

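Models are ranked by Attack Success Rate (Total Breaks as a share of Total Chats), lowest rate first, so the model that was broken least often appears at the top.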

Ranking  Model                                 Total Breaks  Total Chats  Attack Success Rate
1.       anthropic/claude-3.7-sonnet:thinking         1,636      112,500                1.45%
2.       anthropic/claude-3.7-sonnet                  1,941      120,562                1.61%
3.       anthropic/claude-3.5-sonnet                  1,837       99,341                1.85%
4.       openai/gpt-4o                                2,433      100,464                2.42%
5.       openai/o3-2025-04-16                           360       14,691                2.45%
6.       anthropic/claude-3.5-haiku-20241022          2,280       92,996                2.45%
7.       openai/o1                                    2,082       81,481                2.56%
8.       openai/gpt-4.5-preview                       2,461       95,295                2.58%
9.       Model Spica                                  3,189      106,444                3.00%
10.      Model Arcturus                               3,490      105,215                3.32%
11.      cohere/command-r-08-2024                     3,889      104,291                3.73%
12.      Model Pollux                                 3,110       82,610                3.76%
13.      Model Andromeda                              3,273       80,907                4.05%
14.      Model Castor                                 3,391       83,666                4.05%
15.      x-ai/grok-2-1212                             3,496       84,592                4.13%
16.      openai/o3-mini                               3,083       72,921                4.23%
17.      Model Fomalhaut                              3,469       79,327                4.37%
18.      Model Orion                                  1,658       35,884                4.62%
19.      openai/o3-mini-high                          3,039       64,161                4.74%
20.      meta-llama/llama-3.1-405b-instruct           3,403       57,861                5.88%
21.      mistralai/pixtral-large-2411                 4,111       65,943                6.23%
22.      meta-llama/llama-3.3-70b-instruct            5,000       77,134                6.48%
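
The published rates are consistent with a simple ratio: Total Breaks divided by Total Chats, shown as a percentage with two decimals. The page does not state this formula, so the sketch below is an assumption inferred from the figures rather than the organizers' method. The function name `attack_success_rate` and the three-row sample are illustrative only.

```python
def attack_success_rate(total_breaks: int, total_chats: int) -> float:
    """Return breaks as a percentage of chats, rounded to two decimals.

    Assumed formula (not stated on the leaderboard itself).
    """
    if total_chats <= 0:
        raise ValueError("total_chats must be positive")
    return round(100 * total_breaks / total_chats, 2)


# A few (model, total breaks, total chats) rows copied from the table above.
rows = [
    ("meta-llama/llama-3.3-70b-instruct", 5000, 77134),
    ("anthropic/claude-3.7-sonnet:thinking", 1636, 112500),
    ("openai/gpt-4o", 2433, 100464),
]

# Rank ascending: a lower attack success rate means fewer successful breaks.
ranked = sorted(rows, key=lambda row: attack_success_rate(row[1], row[2]))

for rank, (model, breaks, chats) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {attack_success_rate(breaks, chats):.2f}%")
```

Because the rate is normalized by chat volume, a model with few chats (such as openai/o3-2025-04-16 with 14,691) can be compared with one that has far more, although its rate rests on a smaller sample.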