💣 Single Turn Harmful Outputs Leaderboard

🏁 Started a year ago · $38,000 of $42,000 awarded

Attempt to break various large language models (LLMs) using a single chat message.

Last updated 4 minutes ago

Models ranked by User Break Rate

| Ranking | Model | Safety Violation Count | Total Requests | User Break Rate |
|---|---|---|---|---|
| 1 | cygnet-bulwark | 0 | 22,098 | 0.00% |
| 2 | cygnet-knox | 0 | 13,573 | 0.00% |
| 3 | cygnet-citadel | 5 | 14,713 | 0.03% |
| 4 | o1-preview | 16 | 1,548 | 1.03% |
| 5 | claude-3-5-sonnet-20240620 | 62 | 5,873 | 1.06% |
| 6 | o1-mini | 22 | 1,796 | 1.22% |
| 7 | claude-3-sonnet-20240229 | 49 | 2,729 | 1.80% |
| 8 | google/gemini-pro-1.5 | 60 | 3,136 | 1.91% |
| 9 | claude-3-opus-20240229 | 58 | 2,813 | 2.06% |
| 10 | meta-llama/llama-3.1-405b-instruct | 57 | 2,680 | 2.13% |
| 11 | claude-3-haiku-20240307 | 66 | 3,090 | 2.14% |
| 12 | google/gemini-flash-1.5 | 78 | 3,308 | 2.36% |
| 13 | meta-llama/llama-3.1-8b-instruct | 64 | 2,664 | 2.40% |
| 14 | gpt-4-0125-preview | 65 | 2,301 | 2.82% |
| 15 | microsoft/phi-3.5-mini-128k-instruct | 75 | 2,087 | 3.59% |
| 16 | gpt-4o-2024-08-06 | 80 | 2,103 | 3.80% |
| 17 | gpt-4o-mini-2024-07-18 | 79 | 2,037 | 3.88% |
| 18 | qwen/qwen-2-72b-instruct | 86 | 2,147 | 4.01% |
| 19 | meta-llama/llama-3-70b-instruct | 82 | 2,022 | 4.06% |
| 20 | meta-llama/llama-3.1-70b-instruct | 88 | 2,023 | 4.35% |
| 21 | google/gemma-2-9b-it | 83 | 1,888 | 4.40% |
| 22 | google/gemma-2-27b-it | 88 | 1,912 | 4.60% |
| 23 | gpt-4-turbo-2024-04-09 | 84 | 1,821 | 4.61% |
| 24 | qwen/qwen-2-7b-instruct | 70 | 1,370 | 5.11% |
| 25 | cohere/command-r-plus-08-2024 | 134 | 2,372 | 5.65% |
| 26 | microsoft/wizardlm-2-8x22b | 96 | 1,469 | 6.54% |
| 27 | mistralai/mistral-large-2407 | 131 | 1,784 | 7.34% |
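
The ranking metric appears to be simple arithmetic: User Break Rate is the Safety Violation Count divided by Total Requests (e.g. 16 / 1,548 ≈ 1.03% for o1-preview, matching the table). Below is a minimal sketch of that computation under this assumption, using a few sample rows copied from the table; the helper name `user_break_rate` is illustrative, not part of the leaderboard's codebase:

```python
# Sketch of the apparent ranking logic, assuming
# User Break Rate = Safety Violation Count / Total Requests
# (this matches the published figures, e.g. 16 / 1,548 ≈ 1.03%).

leaderboard = [
    # (model, safety_violation_count, total_requests) -- sample rows from the table
    ("cygnet-bulwark", 0, 22_098),
    ("o1-preview", 16, 1_548),
    ("claude-3-5-sonnet-20240620", 62, 5_873),
    ("mistralai/mistral-large-2407", 131, 1_784),
]

def user_break_rate(violations: int, requests: int) -> float:
    """Fraction of requests that produced a safety violation."""
    return violations / requests if requests else 0.0

# Rank ascending: a lower break rate means the model resisted more attacks.
ranked = sorted(leaderboard, key=lambda row: user_break_rate(row[1], row[2]))

for rank, (model, violations, requests) in enumerate(ranked, start=1):
    rate = user_break_rate(violations, requests)
    print(f"{rank}. {model}: {violations}/{requests:,} = {rate:.2%}")
```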