💣 Single Turn Harmful Outputs Leaderboard

🏁 Started a year ago · $38,000 of $42,000 awarded

Attempt to break various large language models (LLMs) using a single chat message.

Last updated 7 minutes ago

Models ranked by User Break Rate

| Ranking | Model | Safety Violation Count | Total Requests | User Break Rate |
|---|---|---|---|---|
| 1 | cygnet-bulwark | 0 | 22,927 | 0.00% |
| 2 | cygnet-knox | 0 | 13,930 | 0.00% |
| 3 | cygnet-citadel | 5 | 14,936 | 0.03% |
| 4 | o1-preview | 16 | 1,580 | 1.01% |
| 5 | claude-3-5-sonnet-20240620 | 63 | 6,089 | 1.03% |
| 6 | o1-mini | 22 | 1,888 | 1.17% |
| 7 | claude-3-sonnet-20240229 | 49 | 2,769 | 1.77% |
| 8 | google/gemini-pro-1.5 | 61 | 3,180 | 1.92% |
| 9 | claude-3-opus-20240229 | 58 | 2,870 | 2.02% |
| 10 | meta-llama/llama-3.1-405b-instruct | 57 | 2,726 | 2.09% |
| 11 | claude-3-haiku-20240307 | 67 | 3,182 | 2.11% |
| 12 | meta-llama/llama-3.1-8b-instruct | 64 | 2,769 | 2.31% |
| 13 | google/gemini-flash-1.5 | 82 | 3,381 | 2.43% |
| 14 | gpt-4-0125-preview | 66 | 2,387 | 2.76% |
| 15 | microsoft/phi-3.5-mini-128k-instruct | 77 | 2,133 | 3.61% |
| 16 | gpt-4o-2024-08-06 | 83 | 2,271 | 3.65% |
| 17 | qwen/qwen-2-72b-instruct | 86 | 2,240 | 3.84% |
| 18 | gpt-4o-mini-2024-07-18 | 81 | 2,106 | 3.85% |
| 19 | meta-llama/llama-3-70b-instruct | 82 | 2,062 | 3.98% |
| 20 | meta-llama/llama-3.1-70b-instruct | 88 | 2,041 | 4.31% |
| 21 | gpt-4-turbo-2024-04-09 | 86 | 1,983 | 4.34% |
| 22 | google/gemma-2-9b-it | 84 | 1,916 | 4.38% |
| 23 | google/gemma-2-27b-it | 89 | 1,946 | 4.57% |
| 24 | qwen/qwen-2-7b-instruct | 70 | 1,370 | 5.11% |
| 25 | cohere/command-r-plus-08-2024 | 137 | 2,422 | 5.66% |
| 26 | microsoft/wizardlm-2-8x22b | 98 | 1,489 | 6.58% |
| 27 | mistralai/mistral-large-2407 | 134 | 1,835 | 7.30% |
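The numbers in the table are consistent with User Break Rate being Safety Violation Count divided by Total Requests, rounded to two decimal places. Below is a minimal Python sketch of that computation over a few rows copied from the table; the exact rounding and tie-breaking rules of the live leaderboard are an assumption, not something the page specifies.

```python
# Sketch of the apparent metric: break_rate = safety_violation_count / total_requests.
# Rows are (model, safety_violation_count, total_requests) taken from the table above;
# the ranking/rounding logic here is an assumption about how the leaderboard is built.
rows = [
    ("cygnet-citadel", 5, 14_936),
    ("o1-preview", 16, 1_580),
    ("claude-3-5-sonnet-20240620", 63, 6_089),
    ("mistralai/mistral-large-2407", 134, 1_835),
]

# Rank models from lowest to highest break rate, matching the table's ordering.
ranked = sorted(rows, key=lambda r: r[1] / r[2])
for rank, (model, violations, requests) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {violations / requests:.2%}")
```

Running this reproduces the published figures for these rows (e.g. 5 / 14,936 ≈ 0.03% for cygnet-citadel and 134 / 1,835 ≈ 7.30% for mistral-large-2407).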