💣 Single Turn Harmful Outputs Leaderboard
🏁 Started 10 months ago
$38,000 of $42,000 awarded
Attempt to break various large language models (LLMs) using a single chat message.
Last updated a month ago
Models ranked by User Break Rate (Safety Violation Count ÷ Total Requests)
| Ranking | Model | Safety Violation Count | Total Requests | User Break Rate |
|---|---|---|---|---|
| 1 | cygnet-bulwark | 0 | 14,945 | 0.00% |
| 2 | cygnet-knox | 0 | 9,284 | 0.00% |
| 3 | cygnet-citadel | 2 | 13,211 | 0.02% |
| 4 | o1-preview | 7 | 713 | 0.98% |
| 5 | o1-mini | 14 | 1,119 | 1.25% |
| 6 | claude-3-5-sonnet-20240620 | 45 | 2,907 | 1.55% |
| 7 | google/gemini-pro-1.5 | 41 | 2,539 | 1.61% |
| 8 | claude-3-sonnet-20240229 | 40 | 2,112 | 1.89% |
| 9 | claude-3-opus-20240229 | 48 | 2,097 | 2.29% |
| 10 | meta-llama/llama-3.1-405b-instruct | 51 | 2,219 | 2.30% |
| 11 | claude-3-haiku-20240307 | 52 | 2,095 | 2.48% |
| 12 | google/gemini-flash-1.5 | 60 | 2,384 | 2.52% |
| 13 | meta-llama/llama-3.1-8b-instruct | 57 | 2,103 | 2.71% |
| 14 | gpt-4-0125-preview | 55 | 1,674 | 3.29% |
| 15 | gpt-4o-mini-2024-07-18 | 64 | 1,569 | 4.08% |
| 16 | gpt-4o-2024-08-06 | 63 | 1,542 | 4.09% |
| 17 | meta-llama/llama-3-70b-instruct | 69 | 1,593 | 4.33% |
| 18 | microsoft/phi-3.5-mini-128k-instruct | 64 | 1,470 | 4.35% |
| 19 | meta-llama/llama-3.1-70b-instruct | 76 | 1,656 | 4.59% |
| 20 | gpt-4-turbo-2024-04-09 | 71 | 1,455 | 4.88% |
| 21 | qwen/qwen-2-72b-instruct | 67 | 1,368 | 4.90% |
| 22 | google/gemma-2-27b-it | 72 | 1,462 | 4.92% |
| 23 | google/gemma-2-9b-it | 73 | 1,479 | 4.94% |
| 24 | qwen/qwen-2-7b-instruct | 61 | 1,107 | 5.51% |
| 25 | cohere/command-r-plus-08-2024 | 109 | 1,535 | 7.10% |
| 26 | microsoft/wizardlm-2-8x22b | 76 | 988 | 7.69% |
| 27 | mistralai/mistral-large | 98 | 1,117 | 8.77% |
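The ranking metric above follows directly from the table's own columns: each model's User Break Rate is its Safety Violation Count divided by its Total Requests, shown as a percentage. A minimal sketch of that arithmetic, using a few rows from the table (the `break_rate` helper and the small `leaderboard` list are illustrative, not part of the leaderboard's actual code):

```python
# Illustrative sketch: recompute User Break Rate from the table's columns.
# Counts below are copied from three rows of the leaderboard above.
leaderboard = [
    ("cygnet-bulwark", 0, 14_945),
    ("o1-preview", 7, 713),
    ("mistralai/mistral-large", 98, 1_117),
]

def break_rate(violations: int, requests: int) -> float:
    """Safety Violation Count divided by Total Requests, as a percentage."""
    return 100.0 * violations / requests

# Rank models from lowest (safest) to highest break rate, as the table does.
for model, violations, requests in sorted(
    leaderboard, key=lambda row: break_rate(row[1], row[2])
):
    print(f"{model}: {break_rate(violations, requests):.2f}%")
```

Running this reproduces the displayed percentages, e.g. 7 violations over 713 requests gives 0.98% for o1-preview.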