🕵 Agent Red-Teaming Leaderboard

✅ Completed 6 months ago. All $171,800 awarded.

Push the limits of direct and indirect attacks on AI agents.

Last updated 6 minutes ago

Models ranked by Attack Success Rate (Total Breaks divided by Total Chats), lowest first.

| Ranking | Model | Total Breaks | Total Chats | Attack Success Rate |
|---|---|---|---|---|
| 1 | anthropic/claude-3.7-sonnet:thinking | 1,636 | 112,513 | 1.45% |
| 2 | anthropic/claude-3.7-sonnet | 1,942 | 120,639 | 1.61% |
| 3 | anthropic/claude-3.5-sonnet | 1,838 | 99,401 | 1.85% |
| 4 | openai/gpt-4o | 2,434 | 100,527 | 2.42% |
| 5 | anthropic/claude-3.5-haiku-20241022 | 2,281 | 93,069 | 2.45% |
| 6 | openai/o3-2025-04-16 | 361 | 14,724 | 2.45% |
| 7 | openai/o1 | 2,082 | 81,610 | 2.55% |
| 8 | openai/gpt-4.5-preview | 2,461 | 95,301 | 2.58% |
| 9 | Model Spica | 3,191 | 106,809 | 2.99% |
| 10 | Model Arcturus | 3,491 | 105,241 | 3.32% |
| 11 | cohere/command-r-08-2024 | 3,891 | 104,326 | 3.73% |
| 12 | Model Pollux | 3,110 | 82,678 | 3.76% |
| 13 | Model Andromeda | 3,276 | 80,943 | 4.05% |
| 14 | Model Castor | 3,393 | 83,693 | 4.05% |
| 15 | x-ai/grok-2-1212 | 3,497 | 84,657 | 4.13% |
| 16 | openai/o3-mini | 3,083 | 72,932 | 4.23% |
| 17 | Model Fomalhaut | 3,471 | 79,387 | 4.37% |
| 18 | Model Orion | 1,658 | 35,885 | 4.62% |
| 19 | openai/o3-mini-high | 3,039 | 64,180 | 4.74% |
| 20 | meta-llama/llama-3.1-405b-instruct | 3,403 | 57,904 | 5.88% |
| 21 | mistralai/pixtral-large-2411 | 4,112 | 65,982 | 6.23% |
| 22 | meta-llama/llama-3.3-70b-instruct | 5,000 | 77,145 | 6.48% |
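
The Attack Success Rate column appears to be Total Breaks divided by Total Chats, expressed as a percentage. The page does not state the formula, so this is an assumption, but it reproduces the listed figures. A minimal Python sketch that recomputes the rate for a few rows copied from the leaderboard and sorts them lowest first; the function names `attack_success_rate` and `rank_by_asr` are illustrative and not part of any leaderboard API:

```python
# Rows copied from the leaderboard: (model, total_breaks, total_chats).
ROWS = [
    ("openai/gpt-4o", 2434, 100527),
    ("anthropic/claude-3.7-sonnet:thinking", 1636, 112513),
    ("meta-llama/llama-3.3-70b-instruct", 5000, 77145),
    ("anthropic/claude-3.7-sonnet", 1942, 120639),
]


def attack_success_rate(breaks: int, chats: int) -> float:
    """Return breaks / chats as a percentage (assumed definition of ASR)."""
    if chats <= 0:
        raise ValueError("chats must be positive")
    return 100.0 * breaks / chats


def rank_by_asr(rows):
    """Sort rows ascending by ASR, so the hardest-to-break model comes first."""
    scored = [(model, attack_success_rate(b, c)) for model, b, c in rows]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for position, (model, asr) in enumerate(rank_by_asr(ROWS), start=1):
        print(f"{position}. {model} {asr:.2f}%")
```

Because the rate is a ratio, a model with few chats (for example openai/o3-2025-04-16 with 14,724) can sit next to one with many more chats at the same percentage, so the raw counts are worth reading alongside the ranking.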