👩‍💻 Harmful Code Generation Leaderboard
✅ Completed 6 months ago
All $6,000 awarded
Find unique ways to make a model return functional code that completes a harmful task, such as opening circuit breakers to cause a system-wide blackout.
Last updated 3 months ago
Models are ranked by total number of breaks, fewest first. NOTE: Numbers are subject to change as submissions are being evaluated.
| Ranking | Model | Total Breaks (Unverified) | Total Requests | Break Request Ratio |
|---|---|---|---|---|
| 1 | claude-3-5-sonnet-20241022 | 16 | 717 | 0.022 |
| 2 | o1 | 22 | 410 | 0.054 |
| 3 | google/gemini-pro-1.5 | 25 | 747 | 0.033 |
| 4 | gpt-4o-2024-08-06 | 26 | 759 | 0.034 |
| 5 | meta-llama/llama-3.2-90b-vision-instruct | 39 | 390 | 0.100 |
| 6 | x-ai/grok-beta | 46 | 506 | 0.091 |
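
The Break Request Ratio column is consistent with total breaks divided by total requests, rounded to three decimals. The page does not state that definition, so it is an assumption here. Below is a minimal Python sketch that recomputes the column from the table's figures and compares the published ordering (by break count) with an ordering by ratio. The function names are hypothetical and used only for illustration.

```python
# Rows copied from the leaderboard table: (model, total_breaks, total_requests).
ROWS = [
    ("claude-3-5-sonnet-20241022", 16, 717),
    ("o1", 22, 410),
    ("google/gemini-pro-1.5", 25, 747),
    ("gpt-4o-2024-08-06", 26, 759),
    ("meta-llama/llama-3.2-90b-vision-instruct", 39, 390),
    ("x-ai/grok-beta", 46, 506),
]


def break_request_ratio(breaks: int, requests: int) -> float:
    """Assumed definition: breaks / requests, rounded to 3 decimals."""
    return round(breaks / requests, 3)


def rank_by_breaks(rows):
    """Order used by the leaderboard: fewest total breaks first."""
    return sorted(rows, key=lambda row: row[1])


def rank_by_ratio(rows):
    """Alternative order: lowest breaks-per-request first."""
    return sorted(rows, key=lambda row: row[1] / row[2])


if __name__ == "__main__":
    for model, breaks, requests in rank_by_ratio(ROWS):
        print(f"{model}: {break_request_ratio(breaks, requests)}")
```

Because the models received different numbers of requests, the two orderings differ. By ratio, o1 falls from second to fourth, and the last two models trade places.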