💬 Multi-Turn Harmful Outputs Leaderboard

✅ Completed 6 months ago

All $7,000 awarded

Elicit harmful outputs from LLMs through long-context interactions across multiple messages.

Last updated 3 months ago

Models are ranked by number of breaks, fewest first. NOTE: Numbers are subject to change while submission uniqueness is being evaluated.

| Ranking | Model | Total Breaks (Unverified) | Total Requests | Break Request Ratio |
|---|---|---|---|---|
| 1 | claude-3-5-sonnet-20241022 | 293 | 6,323 | 0.046 |
| 2 | gpt-4o-2024-08-06 | 297 | 4,436 | 0.067 |
| 3 | o1 | 350 | 3,026 | 0.116 |
| 4 | meta-llama/llama-3.2-90b-vision-instruct | 352 | 3,572 | 0.099 |
| 5 | google/gemini-pro-1.5 | 406 | 3,246 | 0.125 |
| 6 | x-ai/grok-beta | 619 | 6,616 | 0.094 |
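
The Break Request Ratio column appears to be total (unverified) breaks divided by total requests, rounded to three decimals. The page does not state the formula, so treat this as an assumption. A minimal Python sketch under that assumption recomputes the ratio and the ranking from the figures listed above. The names `ROWS`, `break_request_ratio`, and `rank_by_breaks` are hypothetical and chosen only for illustration.

```python
# Figures transcribed from the leaderboard rows above:
# (model, total unverified breaks, total requests).
ROWS = [
    ("claude-3-5-sonnet-20241022", 293, 6323),
    ("gpt-4o-2024-08-06", 297, 4436),
    ("o1", 350, 3026),
    ("meta-llama/llama-3.2-90b-vision-instruct", 352, 3572),
    ("google/gemini-pro-1.5", 406, 3246),
    ("x-ai/grok-beta", 619, 6616),
]


def break_request_ratio(breaks: int, requests: int) -> float:
    """Assumed definition: breaks divided by total requests."""
    return breaks / requests


def rank_by_breaks(rows):
    """Order models by fewest breaks first, as the leaderboard does."""
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for position, (model, breaks, requests) in enumerate(rank_by_breaks(ROWS), start=1):
        ratio = break_request_ratio(breaks, requests)
        print(f"{position}. {model}: {breaks} / {requests} = {ratio:.3f}")
```

Ranking by break count and ranking by ratio give different orders here: x-ai/grok-beta has the most breaks but a lower ratio than o1 and google/gemini-pro-1.5, because it also received the most requests.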