🕵 Agent Red-Teaming Leaderboard

✅ Completed 4 months ago. All $171,800 awarded.

Push the limits of direct and indirect attacks on AI agents.

Last updated 8 minutes ago

Models ranked by Attack Success Rate.

Ranking  Model                                 Total Breaks  Total Chats  Attack Success Rate
1.       anthropic/claude-3.7-sonnet:thinking  1,636         112,507      1.45%
2.       anthropic/claude-3.7-sonnet           1,941         120,580      1.61%
3.       anthropic/claude-3.5-sonnet           1,838         99,355       1.85%
4.       openai/gpt-4o                         2,434         100,518      2.42%
5.       anthropic/claude-3.5-haiku-20241022   2,281         93,016       2.45%
6.       openai/o3-2025-04-16                  361           14,718       2.45%
7.       openai/o1                             2,082         81,485       2.56%
8.       openai/gpt-4.5-preview                2,461         95,296       2.58%
9.       Model Spica                           3,190         106,768      2.99%
10.      Model Arcturus                        3,491         105,239      3.32%
11.      cohere/command-r-08-2024              3,891         104,305      3.73%
12.      Model Pollux                          3,110         82,635       3.76%
13.      Model Andromeda                       3,276         80,942       4.05%
14.      Model Castor                          3,393         83,685       4.05%
15.      x-ai/grok-2-1212                      3,497         84,645       4.13%
16.      openai/o3-mini                        3,083         72,932       4.23%
17.      Model Fomalhaut                       3,471         79,369       4.37%
18.      Model Orion                           1,658         35,885       4.62%
19.      openai/o3-mini-high                   3,039         64,171       4.74%
20.      meta-llama/llama-3.1-405b-instruct    3,403         57,904       5.88%
21.      mistralai/pixtral-large-2411          4,112         65,961       6.23%
22.      meta-llama/llama-3.3-70b-instruct     5,000         77,143       6.48%
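
The page does not spell out how Attack Success Rate is computed, but the published figures are consistent with Total Breaks divided by Total Chats, expressed as a percentage. The Python sketch below works under that assumption. The function name `attack_success_rate` and the `rows` list are illustrative and not part of the leaderboard itself; the sample rows are copied from the table above.

```python
def attack_success_rate(total_breaks: int, total_chats: int) -> float:
    """Return breaks per chat as a percentage (assumed definition of ASR)."""
    if total_chats <= 0:
        raise ValueError("total_chats must be positive")
    return 100.0 * total_breaks / total_chats


# (model, total breaks, total chats), copied from a few leaderboard rows.
rows = [
    ("meta-llama/llama-3.3-70b-instruct", 5000, 77143),
    ("Model Castor", 3393, 83685),
    ("anthropic/claude-3.7-sonnet:thinking", 1636, 112507),
    ("Model Andromeda", 3276, 80942),
]

# A lower rate ranks higher; sort on the unrounded value.
ranked = sorted(rows, key=lambda r: attack_success_rate(r[1], r[2]))

for rank, (model, breaks, chats) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {attack_success_rate(breaks, chats):.2f}%")
```

Sorting on the unrounded rate also reproduces the order of rows that display the same rounded value: Model Andromeda and Model Castor both show 4.05%, yet Andromeda's unrounded rate is slightly lower, which matches its higher placement.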