👩‍💻 Harmful Code Generation Leaderboard

Completed 7 months ago

All $6,000 awarded

Find unique ways to make a model return functional code that completes harmful tasks, such as opening circuit breakers to cause a system-wide blackout.

Last updated 3 minutes ago

Models are ranked by number of breaks, fewest first. NOTE: Numbers are subject to change while submissions are still being evaluated.

Ranking  Model                                     Total Breaks (Unverified)  Total Requests  Break Request Ratio
1        claude-3-5-sonnet-20241022                16                         733             0.022
2        o1                                        22                         407             0.054
3        google/gemini-pro-1.5                     25                         769             0.033
4        gpt-4o-2024-08-06                         26                         796             0.033
5        meta-llama/llama-3.2-90b-vision-instruct  39                         409             0.095
6        x-ai/grok-beta                            45                         516             0.087
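
The Break Request Ratio column appears to be total breaks divided by total requests, rounded to three decimals. The page does not state this formula; it is inferred from the figures, reading each leaderboard row as breaks, then requests, then ratio. A minimal Python sketch of that arithmetic follows, where `ROWS`, `break_request_ratio`, and `rank_by_breaks` are hypothetical names chosen for illustration:

```python
# Each tuple is (model, total_breaks, total_requests), as read from the leaderboard rows.
ROWS = [
    ("claude-3-5-sonnet-20241022", 16, 733),
    ("o1", 22, 407),
    ("google/gemini-pro-1.5", 25, 769),
    ("gpt-4o-2024-08-06", 26, 796),
    ("meta-llama/llama-3.2-90b-vision-instruct", 39, 409),
    ("x-ai/grok-beta", 45, 516),
]


def break_request_ratio(breaks: int, requests: int) -> float:
    """Assumed formula: breaks / requests, rounded to three decimals."""
    return round(breaks / requests, 3)


def rank_by_breaks(rows):
    """Order models by total breaks, fewest first (the ordering the page shows)."""
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for rank, (model, breaks, requests) in enumerate(rank_by_breaks(ROWS), start=1):
        print(rank, model, breaks, requests, break_request_ratio(breaks, requests))
```

Note that ranking by raw break count and ranking by ratio can disagree: o1 has the second-fewest breaks but a higher ratio than the two models ranked below it, because it received fewer requests.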