🕵 Agent Red-Teaming Leaderboard
✅ Completed 3 months ago
All $171,800 awarded
Push the limits of direct and indirect attacks on AI agents.
Last updated 5 minutes ago
Models ranked by Attack Success Rate (Total Breaks ÷ Total Chats), lowest first.
Ranking | Model | Total Breaks | Total Chats | Attack Success Rate |
---|---|---|---|---|
1. | anthropic/claude-3.7-sonnet:thinking | 1,636 | 112,500 | 1.45% |
2. | anthropic/claude-3.7-sonnet | 1,941 | 120,562 | 1.61% |
3. | anthropic/claude-3.5-sonnet | 1,837 | 99,341 | 1.85% |
4. | openai/gpt-4o | 2,433 | 100,464 | 2.42% |
5. | openai/o3-2025-04-16 | 360 | 14,691 | 2.45% |
6. | anthropic/claude-3.5-haiku-20241022 | 2,280 | 92,996 | 2.45% |
7. | openai/o1 | 2,082 | 81,481 | 2.56% |
8. | openai/gpt-4.5-preview | 2,461 | 95,295 | 2.58% |
9. | Model Spica | 3,189 | 106,444 | 3.00% |
10. | Model Arcturus | 3,490 | 105,215 | 3.32% |
11. | cohere/command-r-08-2024 | 3,889 | 104,291 | 3.73% |
12. | Model Pollux | 3,110 | 82,610 | 3.76% |
13. | Model Andromeda | 3,273 | 80,907 | 4.05% |
14. | Model Castor | 3,391 | 83,666 | 4.05% |
15. | x-ai/grok-2-1212 | 3,496 | 84,592 | 4.13% |
16. | openai/o3-mini | 3,083 | 72,921 | 4.23% |
17. | Model Fomalhaut | 3,469 | 79,327 | 4.37% |
18. | Model Orion | 1,658 | 35,884 | 4.62% |
19. | openai/o3-mini-high | 3,039 | 64,161 | 4.74% |
20. | meta-llama/llama-3.1-405b-instruct | 3,403 | 57,861 | 5.88% |
21. | mistralai/pixtral-large-2411 | 4,111 | 65,943 | 6.23% |
22. | meta-llama/llama-3.3-70b-instruct | 5,000 | 77,134 | 6.48% |
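The Attack Success Rate column appears to be Total Breaks divided by Total Chats, expressed as a percentage, with the lowest rate ranked first. The page does not state this formula, so it is an assumption here, though it reproduces every row of the table. A minimal sketch using a few rows copied from the table (the function name `attack_success_rate` is illustrative, not part of the leaderboard):

```python
# (model, total_breaks, total_chats) copied from a few rows of the table above
rows = [
    ("openai/gpt-4o", 2433, 100464),
    ("anthropic/claude-3.7-sonnet:thinking", 1636, 112500),
    ("meta-llama/llama-3.3-70b-instruct", 5000, 77134),
    ("anthropic/claude-3.7-sonnet", 1941, 120562),
]


def attack_success_rate(breaks: int, chats: int) -> float:
    """Breaks as a percentage of chats (assumed definition of the column)."""
    return 100.0 * breaks / chats


# A lower rate means fewer successful attacks, so sort ascending.
ranked = sorted(rows, key=lambda r: attack_success_rate(r[1], r[2]))

for rank, (model, breaks, chats) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {attack_success_rate(breaks, chats):.2f}%")
```

Sorting on the unrounded rate also accounts for rows that show the same rounded value: ranks 5 and 6 both display 2.45%, but their unrounded rates are about 2.4505% and 2.4517%.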