Description
Work through coding agents to hack into computer networks in this AI-and-cybersecurity red-teaming crossover.
Machine-in-the-Middle Challenge
Competition Dates: November 1 (Warm-up) - December 14, 2025
Prize Pool: $100,000 cash & prizes from Hack The Box
Co-Sponsors: Hack The Box, alongside our general Arena sponsors: UK AISI, OpenAI, Anthropic, and Amazon

The Challenge
Machine-in-the-Middle is Gray Swan's first cybersecurity arena where traditional offensive security skills matter as much as AI capabilities. Over six weeks, you'll test the limits of Software Engineering (SWE) Agents - advanced AI systems that can plan, code, and execute complex operations across technical environments.
You'll exploit real vulnerabilities across:
Web applications with multi-layered security controls
Open-source software with realistic CVE scenarios
Multi-host networks requiring lateral movement and privilege escalation
Depending on the scenario, you'll either work solo with your hacking toolkit or guide an AI agent to break into target systems. We're measuring capability uplift: how AI changes offensive security operations in practice.
Hack The Box brings its CTF and training expertise into the Arena experience for this challenge. The Arena as a whole is supported by our Frontier Lab sponsors: UK AISI, OpenAI, Anthropic, and Amazon.
Timeline & Format
Wave 1 (Warm-up): November 1, 1:00 PM EDT
Short, focused challenges to test systems and get oriented
Wave 2 (First Prized Wave): November 8, 1:00 PM EST
Official competition begins
Waves 3-6: Weekly drops every Saturday at 1:00 PM EST
Challenge End: December 14, 11:59 PM EST
Each wave introduces new scenarios and difficulty levels. Late arrivals can still compete - past waves remain accessible throughout the competition.
What You'll Face
Web Vulnerabilities
Modern web applications with realistic business logic flaws, authentication bypasses, and injection vectors. No synthetic CTF flags - you'll need to demonstrate actual compromise.
Open-Source CVEs
Real-world vulnerability scenarios requiring code analysis, exploit development, and proof-of-concept demonstrations. Some will have public CVE identifiers; others will test your zero-day instincts.
Multi-Host Environments
Network scenarios requiring reconnaissance, lateral movement, and persistence. Docker-based infrastructure that mirrors production complexity.
SWE Agent Integration
At any point, you can deploy AI agents with coding, analysis, and automation capabilities. Where do they excel, and where does human intuition still dominate?
Prize Structure
$100,000 cash & great prizes from HTB.
Detailed prize mechanics and category breakdowns will be announced at the start of the competition.
Red Team Contracts
Top performers will be considered for private red teaming engagements within Gray Swan's professional network - confidential projects with lab and enterprise partners under NDA.
Research Contribution
Machine-in-the-Middle serves a dual purpose: competition and research.
Your participation helps answer critical questions about AI in offensive security. Where do AI agents provide genuine capability uplift? What tasks still require human expertise? How do human+AI workflows compare to solo approaches in speed, success rates, and novel techniques?
Your methodology and performance data contribute to Gray Swan's research on AI capabilities in cybersecurity. This research informs AI lab safety teams, enterprise security teams evaluating AI tools, and policy discussions on AI in security contexts. We'll publish aggregated findings after the competition.
Preparation
Verify your Arena account and update contact information
Complete the pre-event survey (required for prize eligibility - helps us measure skill uplift)
Join the Discord for technical support, wave briefings, and community discussion
Add competition dates to your calendar
Rules of Engagement
One account per person - Multiple registrations result in disqualification
No solution sharing - Keep exploits private until 30 days post-competition
Scope boundaries - Only attack designated challenge infrastructure, not Gray Swan systems
No deanonymization - Do not attempt to identify which models are being tested
Fair play - No DoS attacks, no interference with other participants
Legal tools only - This is offensive security research, not actual crime
Responsible reporting - If you find critical vulnerabilities in Gray Swan infrastructure, report responsibly
Violations may result in disqualification and a ban from future Gray Swan events.
Who Should Compete
This challenge is designed for:
Experienced AI red teamers and AI security researchers who want to push the edge of applied adversarial tactics and evaluate how AI systems perform in realistic cybersecurity contexts
Experienced traditional red teamers and security researchers focused on testing their skills across web, network, and open-source software vulnerabilities while exploring how AI tools can augment their workflow
CTF competitors seeking crossover challenges that blend offensive security with AI interaction
AI enthusiasts curious to see how far agent-assisted hacking can go in a controlled, research-safe environment
Practitioners and researchers interested in understanding how human+AI collaboration performs across realistic cybersecurity tasks
Machine-in-the-Middle welcomes participants across the full skill spectrum, from first-time CTF players experimenting with AI agents to seasoned professionals testing model-assisted workflows.
Get Started
Competition begins November 1. The warm-up wave helps you test the environment and get familiar with submission formats before prizes start November 8.
Questions? Join the Discord or email support.
This is where traditional hacking meets AI capability. See you in the Arena.
Sponsors & Partners
Machine-in-the-Middle is co-sponsored by Hack The Box and supported by industry partners who fund prizes and platform development.
Gray Swan's broader Arena research is backed by Frontier Lab Sponsors: UK AISI, OpenAI, Anthropic, and Amazon - supporting AI safety and red teaming research across multiple domains.