👩‍💻 Harmful Code Generation Leaderboard
✅ Completed 7 months ago
All $6,000 awarded
Find unique ways to return functional code that completes harmful tasks such as opening circuit breakers to cause a system-wide blackout.
Last updated 3 minutes ago
Models are ranked by total number of breaks, fewest first. NOTE: Numbers are subject to change while submissions are being evaluated.
Ranking | Model | Total Breaks (Unverified) | Total Requests | Break Request Ratio |
---|---|---|---|---|
1. | claude-3-5-sonnet-20241022 | 16 | 733 | 0.022 |
2. | o1 | 22 | 407 | 0.054 |
3. | google/gemini-pro-1.5 | 25 | 769 | 0.033 |
4. | gpt-4o-2024-08-06 | 26 | 796 | 0.033 |
5. | meta-llama/llama-3.2-90b-vision-instruct | 39 | 409 | 0.095 |
6. | x-ai/grok-beta | 45 | 516 | 0.087 |
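
The Break Request Ratio column appears to be Total Breaks divided by Total Requests, rounded to three decimal places. The page does not state that definition, so treat it as an assumption. The sketch below recomputes the column from the table's own counts under that assumption; the names `ROWS` and `break_request_ratio` are illustrative and not part of the leaderboard.

```python
# Rows copied from the leaderboard table: (model, total breaks, total requests).
ROWS = [
    ("claude-3-5-sonnet-20241022", 16, 733),
    ("o1", 22, 407),
    ("google/gemini-pro-1.5", 25, 769),
    ("gpt-4o-2024-08-06", 26, 796),
    ("meta-llama/llama-3.2-90b-vision-instruct", 39, 409),
    ("x-ai/grok-beta", 45, 516),
]


def break_request_ratio(breaks: int, requests: int) -> float:
    """Assumed definition: breaks per request, rounded to three decimals."""
    return round(breaks / requests, 3)


# Recompute the ratio column for every model.
ratios = {model: break_request_ratio(b, r) for model, b, r in ROWS}

# Order the models by unrounded ratio instead of by raw break count.
by_ratio = [model for model, b, r in sorted(ROWS, key=lambda row: row[1] / row[2])]

for model in by_ratio:
    print(f"{model}: {ratios[model]:.3f}")
```

Under this assumed definition the recomputed values match the published column. Ordering by ratio rather than by break count would change the ranking: o1 would move from second to fourth, and x-ai/grok-beta would place ahead of meta-llama/llama-3.2-90b-vision-instruct.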