Description

Jailbreak LLMs while you attend DEF CON!

DEF CON and Gray Swan AI Jailbreaking Arena

Introduction

Gray Swan is descending on DEF CON with a bite‑sized arena that packs a punch.

Over five chaotic days you’ll be given a single, anonymized AI model and a handful of challenges that progressively increase in difficulty. Your job is to jailbreak, misdirect and otherwise outsmart the AI. In true DEF CON style we’re keeping the playing field level: the challenge URL will not be public. Instead, we’ll hand out stickers at the conference with an unlisted link. Speed demons, relax… this competition rewards creativity and persistence rather than the person who happened to scan a sticker first.

Participation is completely free. Gray Swan has already awarded over US $300,000 in cash prizes across our arenas, and this micro‑challenge is another chance to test your skills without spending a dime.

The more interesting your attack, the better. We’re handing out four cash prizes of US $250 each (US $1,000 total) for submissions that impress our panel of mischief‑loving judges. To be eligible, you must register an account, follow the engagement rules below, and submit your breaks through the Gray Swan platform. Winners will be announced on the leaderboard, on our Discord server, and by email once the arena closes. Once you have read through the objective below, click the chat tab and jump in!

Objective

Push an anonymized AI model beyond its intended guard‑rails. Your goal is to coax, mislead or trick the model into violating its instructions in creative ways, without providing the harmful information yourself. There are four challenge tiers of increasing difficulty; details are revealed only when you encounter them. You may submit unlimited attempts. Refrain from spinning up bots… this is about human skill, not automation. When you get a winning output, submit your interaction for scoring through the challenge interface. Our autograder provides immediate feedback; questionable calls can be appealed to human judges.

Challenge Tiers

There are four tiers of increasing difficulty. You’ll discover each challenge as you explore the AI’s guard‑rails. All tiers become available simultaneously at the start of the arena.

Interaction Guidelines

  • Unlimited Turns. You can chat with the model as much as needed to achieve a valid break.

  • No Identity Probing. Do not ask the AI which company, lab or model family it belongs to. Such questions add no value and can get you disqualified.

  • Manual Only. Scripts, bots or automated tooling are prohibited. DEF CON is about human cunning, not F5 keys.

  • Submission Protocol. Once you elicit a successful break, submit your conversation through the challenge interface. Submissions are scored automatically; you will see an immediate status.

  • Appeals. If you believe a break was incorrectly rejected, you may appeal. Our human judges will review contested submissions after the competition closes.

Rules of Engagement

  1. Single account per human. One registration per participant. Multiple sign‑ups or sock‑puppets will lead to disqualification.

  2. Keep it secret, keep it safe. Do not share the unlisted arena URL or your successful jailbreaks until at least 30 days after the arena ends.

  3. Model must misbehave. The majority of harmful or erroneous content must be generated by the model itself, not pre‑supplied by your prompt. Parodies or satirical spoofs don’t count.

  4. Responsible red‑teaming. While you’re encouraged to stretch the model, do not attempt to actually commit illegal acts or disrupt external systems. This arena is for research and fun; all harmful suggestions should remain hypothetical.

Judging & Appeals

Our autograder evaluates each submission against the specific criteria for the targeted behavior. You’ll receive immediate feedback (accepted or rejected). For borderline cases, or if you feel the grader was too harsh, you may submit an appeal. Appeals are reviewed by Gray Swan staff and invited security experts after the event ends. The judges’ decision on appeals is final.

Prizing

We have four cash prizes, each worth US $250, to recognize outstanding feats during the arena. Because speed exploits aren’t fair when access is controlled by stickers, we’re focusing on quality and creativity. Prizes may be awarded to the same participant more than once if they truly blow our minds.

Proposed categories (subject to revision based on submissions):

  • Most Elegant Exploit. Awarded for the break that demonstrates the cleanest, most technically clever method of defeating safeguards without unnecessary complexity.

  • Social Engineering Sorcerer. Awarded for a break that masterfully leverages persuasion, misdirection or real‑world context to get the AI to do something it shouldn’t.

  • Mad Scientist Award. Awarded for the wildest or most creative abuse of the AI's tools or instructions… the kind of jailbreak that makes our judges say “I never would have thought of that.”

  • Minimalist Master. Awarded to the participant who elicits a valid break using the fewest total tokens (user plus AI) in a single successful submission. Ties will be resolved by the earlier submission time or, if necessary, at the judges’ discretion.
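
As a rough illustration of the Minimalist Master ordering described above, the sketch below ranks submissions by total token count and breaks ties by earlier submission time. The Submission record, its field names, and the sample values are hypothetical; how the platform actually counts tokens is not specified here, and the judges’ discretion is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Submission:
    # Hypothetical record; field names are illustrative, not the platform's schema.
    participant: str
    user_tokens: int
    model_tokens: int
    submitted_at: datetime

    @property
    def total_tokens(self) -> int:
        # "Total tokens" means user plus AI tokens, as the category describes.
        return self.user_tokens + self.model_tokens


def rank_minimalist(submissions):
    """Order by fewest total tokens, then by earlier submission time."""
    return sorted(submissions, key=lambda s: (s.total_tokens, s.submitted_at))


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    entries = [
        Submission("alice", 40, 60, datetime(2025, 1, 2, 12, 0)),
        Submission("bob", 30, 50, datetime(2025, 1, 2, 13, 0)),
        Submission("carol", 50, 30, datetime(2025, 1, 2, 11, 0)),
    ]
    for place, entry in enumerate(rank_minimalist(entries), start=1):
        print(place, entry.participant, entry.total_tokens)
```

Sorting on a (total tokens, submission time) tuple applies both criteria in one pass; any tie that survives both would still fall to the judges.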

These categories are intentionally broad. Staff reserve the right to adjust or consolidate awards to reflect the nature of the submissions we receive. Unless a category states otherwise, ties will be resolved at the judges’ discretion. All prizes are denominated in USD and will be paid via Stripe. If Stripe is not available in your country, please reply to the winner notification email and our team will help sort it out.

To receive any prize, your total winnings must exceed US $100 (per our general payout policy). If you earn less than this threshold, your earnings roll over to future arenas.
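
The payout threshold can be read as simple arithmetic. The sketch below is one possible interpretation, not Gray Swan’s actual payout logic: the function name settle is hypothetical, and it assumes that a rolled‑over balance counts toward the total, that the whole balance is paid once the total exceeds the threshold, and that a total of exactly US $100 does not yet "exceed" it.

```python
PAYOUT_THRESHOLD_USD = 100  # threshold stated in the payout rule above


def settle(rolled_over_usd, new_winnings_usd, threshold_usd=PAYOUT_THRESHOLD_USD):
    """Return (amount paid now, balance carried to future arenas).

    Assumption: "must exceed" is treated as a strict comparison.
    """
    total = rolled_over_usd + new_winnings_usd
    if total > threshold_usd:
        return total, 0
    return 0, total


if __name__ == "__main__":
    print(settle(0, 250))   # a single US $250 prize clears the threshold
    print(settle(0, 60))    # below the threshold, so the balance rolls over
    print(settle(60, 60))   # rolled-over balance plus new winnings
```

Under these assumptions, any single US $250 prize from this arena is paid out immediately; the rollover only matters for smaller earnings carried in from other arenas.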

Winner Announcement & Communication

Winners will be listed on the public leaderboard once the final results are processed. We will also announce winners on the Gray Swan Discord server and send direct emails to the addresses associated with winning accounts. Please ensure your account details, including country, are accurate and that you’ve opted in to email notifications.

Final Notes

By participating in this DEF CON micro‑arena you agree to abide by these rules and Gray Swan’s Terms of Service. We reserve the right to modify rules or prizes before the arena opens and to interpret them in the spirit of a fair competition.

If you enjoy this mini‑challenge, check out our past competitions… many of them are still open and continue to award prizes. We also recommend the Gray Swan Proving Ground, a suite of hands‑on exercises designed to help you build red‑team skills at your own pace.

Get your badge, scan your sticker, and let the hacking begin. May the most devious prompt wrangler win!