💬 Multi-Turn Harmful Outputs Leaderboard

✅ Completed 7 months ago

All $7,000 awarded

Elicit harmful outputs from LLMs through long-context interactions across multiple messages.

Last updated 4 minutes ago

Models are ranked by total number of breaks, fewest first. NOTE: Numbers are subject to change while submission uniqueness is being evaluated.

| Ranking | Model | Total Breaks (Unverified) | Total Requests | Break Request Ratio |
|---|---|---|---|---|
| 1 | claude-3-5-sonnet-20241022 | 293 | 6,619 | 0.044 |
| 2 | gpt-4o-2024-08-06 | 297 | 4,540 | 0.065 |
| 3 | o1 | 350 | 2,989 | 0.117 |
| 4 | meta-llama/llama-3.2-90b-vision-instruct | 352 | 3,708 | 0.095 |
| 5 | google/gemini-pro-1.5 | 406 | 3,403 | 0.119 |
| 6 | x-ai/grok-beta | 619 | 6,686 | 0.093 |
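
The Break Request Ratio column appears to be total breaks divided by total requests, rounded to three decimal places. The page does not state this definition; it is an assumption that happens to reproduce every listed value. The Python sketch below recomputes the ratio from the leaderboard figures and compares two orderings. The names `ROWS`, `break_request_ratio`, `rank_by_breaks`, and `rank_by_ratio` are illustrative and not part of the leaderboard.

```python
# Rows copied from the leaderboard: (model, total breaks (unverified), total requests).
ROWS = [
    ("claude-3-5-sonnet-20241022", 293, 6619),
    ("gpt-4o-2024-08-06", 297, 4540),
    ("o1", 350, 2989),
    ("meta-llama/llama-3.2-90b-vision-instruct", 352, 3708),
    ("google/gemini-pro-1.5", 406, 3403),
    ("x-ai/grok-beta", 619, 6686),
]


def break_request_ratio(breaks: int, requests: int) -> float:
    """Assumed definition: breaks / requests, rounded to three decimal places."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return round(breaks / requests, 3)


def rank_by_breaks(rows):
    """Order models by ascending break count, matching the order shown on the page."""
    return sorted(rows, key=lambda row: row[1])


def rank_by_ratio(rows):
    """Alternative ordering by unrounded break-to-request ratio, lowest first."""
    return sorted(rows, key=lambda row: row[1] / row[2])


if __name__ == "__main__":
    for model, breaks, requests in rank_by_breaks(ROWS):
        print(f"{model}: {break_request_ratio(breaks, requests):.3f}")
```

Ranking by ratio instead of raw break count changes the order: x-ai/grok-beta has the most breaks but also the most requests, so `rank_by_ratio(ROWS)` places it third rather than sixth.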