👩‍💻 Harmful Code Generation Leaderboard

Completed 6 months ago

All $6,000 awarded

Find unique ways to get models to return functional code that completes harmful tasks, such as opening circuit breakers to cause a system-wide blackout.

Last updated 3 months ago

Models are ranked by total number of breaks, fewest first. NOTE: Numbers are subject to change while submissions are being evaluated.

Ranking  Model                                     Total Breaks (Unverified)  Total Requests  Break Request Ratio
1        claude-3-5-sonnet-20241022                16                         717             0.022
2        o1                                        22                         410             0.054
3        google/gemini-pro-1.5                     25                         747             0.033
4        gpt-4o-2024-08-06                         26                         759             0.034
5        meta-llama/llama-3.2-90b-vision-instruct  39                         390             0.1
6        x-ai/grok-beta                            46                         506             0.091
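
The Break Request Ratio column appears to be total breaks divided by total requests, rounded to three decimal places. The sketch below checks that reading against the leaderboard figures. It assumes each row's fused digits split as breaks, requests, ratio (for example, 16 breaks over 717 requests for the first row), and the function name `break_request_ratio` is hypothetical, not something the leaderboard defines.

```python
# Assumption: ratio = total breaks / total requests, rounded to 3 decimals.
# The (breaks, requests) pairs are inferred from the leaderboard rows.

def break_request_ratio(total_breaks: int, total_requests: int) -> float:
    """Return breaks per request, rounded to three decimal places."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return round(total_breaks / total_requests, 3)

# (model, total breaks, total requests) as read from the leaderboard
ROWS = [
    ("claude-3-5-sonnet-20241022", 16, 717),
    ("o1", 22, 410),
    ("google/gemini-pro-1.5", 25, 747),
    ("gpt-4o-2024-08-06", 26, 759),
    ("meta-llama/llama-3.2-90b-vision-instruct", 39, 390),
    ("x-ai/grok-beta", 46, 506),
]

if __name__ == "__main__":
    for model, breaks, requests in ROWS:
        print(f"{model}: {break_request_ratio(breaks, requests)}")
```

Because the leaderboard is ordered by break count rather than by ratio, a model with fewer requests (such as o1) can rank above models that have a lower ratio.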