AI Red‑Team & Prompt Engineering Resource Guide
Sharpen your AI security game with Gray Swan AI’s ultimate resource hub. This curated guide links 150+ prompt-engineering repositories, red-team tools, jailbreak datasets, cutting-edge research papers, hands-on HTB courses, bug-bounty programs, and expert videos; all in one place. Whether you’re a seasoned cybersecurity professional, an AI researcher testing LLM guardrails, or a newcomer eager to master prompt injection, you’ll find everything required to probe, protect, and perfect large language models. Bookmark this page to stay ahead of the latest threats, techniques, and best practices in AI red teaming.
About This Guide
Welcome to the Gray Swan AI Resource Vault. We combed through every post in our Discord’s #resources channel, pulled out all unique links, and grouped them into clear themes; prompt engineering, red-team tools, datasets, research papers, and more. For each item you’ll find:
Name – a concise title that tells you what the link is.
Link – click straight through, no hunting required.
Why it matters – a one-sentence snapshot of how the resource can level up your red-team chops or research workflow.
Use this guide as your launchpad for deeper dives, quick reference during Arena events, or inspiration when you need a fresh exploit idea. If it was shared in our community, it’s here… and if you discover something we missed, let us know in our resources channel on Discord so we can keep the vault complete.

Prompt-Engineering & Jailbreak Repositories
Prompt-Engineering-Holy-Grail — A curated GitHub hub of jailbreak patterns, role-play tricks, and prompt taxonomies for crafting advanced prompts.
https://github.com/zacfrulloni/Prompt-Engineering-Holy-Grail
MisguidedAttention — A prompt set that deliberately injects misleading context to stress-test LLM reasoning and alignment.
https://github.com/cpldcpu/MisguidedAttention
Deck-of-Many-Prompts — A “card deck” of manual prompt-injection payloads for rapid red-team experimentation.
https://github.com/peluche/deck-of-many-prompts
L1B3RT4S — A popular jailbreak prompt collection (“liberation prompts”) with 10k+ stars for bypassing default guardrails.
https://github.com/elder-plinius/L1B3RT4S
Awesome-ChatGPT-Prompts — Community-maintained list of creative ChatGPT prompts for productivity, coding, and exploits.
https://github.com/f/awesome-chatgpt-prompts
ZetaLib — Repository of historical and modern jailbreaks plus old “Born Survivalist” prompt examples.
https://github.com/Exocija/ZetaLib/tree/main
https://github.com/Exocija/ZetaLib/blob/main/Prompts/Old Jailbreaks/Born Survivalist.txt
Leaked-System-Prompts — Real-world system prompts collected from public leaks to study how developers scaffold LLM behavior.
https://github.com/jujumilk3/leaked-system-prompts
LLM-Attacks (repo) — Codebase to reproduce universal, transferable jailbreak attacks and evaluate model robustness.
https://github.com/llm-attacks/llm-attacks
LLM-Attacks (site) — Companion website explaining the attack taxonomy with step-by-step demos.
PushTheModel Jailbreak Gists — Two gist files of clever multi-turn jailbreak payloads used in prior red-team events.
https://gist.github.com/PushTheModel/16da91bb557465867176b56f96dfe3ca
https://gist.github.com/PushTheModel/e7230e670c19609a936d248cb40482d4
Red-Team Tools & Libraries
FuzzyAI — CyberArk’s automated LLM fuzzing framework that discovers jailbreaks with genetic search.
https://github.com/cyberark/FuzzyAI
TaskTracker — Microsoft research code for logging and reproducing multi-step agent tasks during security testing.
https://github.com/microsoft/TaskTracker
Speedbunny Red-Team Suite — Claude-3.7 Sonnet jailbreak examples and utility scripts for bulk attack generation.
https://github.com/speedbunny/red-team/tree/main/claude-3.7-sonnet
https://github.com/speedbunny/red-team/tree/main/utilities
AnyChat (HF Space) — Web interface that lets you chat with any open-source model and test prompt-injection quickly.
https://huggingface.co/spaces/akhaliq/anychat
Genesis World Simulator — Embodied-AI generative world used to test agent safety in robotics and LLM planning.
https://github.com/Genesis-Embodied-AI/Genesis
ZetaLib Prompts Loader — Utility scripts for injecting ZetaLib jailbreaks into interactive chat sessions.
https://github.com/Exocija/ZetaLib/tree/main
Joey-Melo Payloads — OWASP AITG-APP payload directory with ready-made strings for indirect prompt-injection labs.
https://github.com/joey-melo/payloads/tree/main/OWASP AITG-APP
Gray Swan Arena Clean UI (Greasyfork) — User script that declutters the Arena interface for faster red-team workflow.
https://greasyfork.org/en/scripts/544207-gray-swan-arena-clean-ui
Courses, Labs & Competitions
Gray Swan Proving Ground & Arena — Free, always-on platform where red teamers of any level can sharpen skills with weekly proving-ground drops, then enter cash-prize Arena competitions sponsored by leading AI labs.
HTB “Applications of AI in InfoSec” — Hands-on Hack-The-Box module on AI-powered malware and defensive countermeasures.
https://academy.hackthebox.com/course/preview/applications-of-ai-in-infosec
HTB “Introduction to Red Teaming AI” — Foundations course covering prompt injection, jailbreak chains, and attack logging.
https://academy.hackthebox.com/course/preview/introduction-to-red-teaming-ai
HTB “Prompt Injection Attacks” — Focused lab that walks through direct and indirect injection exploits, with guided solutions.
https://academy.hackthebox.com/course/preview/prompt-injection-attacks
HTB “AI Red Teamer” Path — Four-course learning path culminating in a practical LLM-exploit capstone project.
SATML Competitions — Ongoing adversarial-ML contests hosted by the Security & Trust in ML community.
Datasets & Benchmarks
HackAPrompt Dataset (HF) — 600k+ real jailbreak submissions from a global prompt-hacking competition.
https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset
Pliny HackAPrompt Subset — Filtered subset focusing on high-impact attacks and evaluation labels.
https://huggingface.co/datasets/hackaprompt/Pliny_HackAPrompt_Dataset
Minos-v1 — NousResearch’s evaluation set for measuring LLM compliance versus refusal.
https://huggingface.co/NousResearch/Minos-v1
HarmBench Explorer — Interactive site summarizing 510 harmful behavior categories and model ASR rates.
https://www.harmbench.org/explore
Research Papers & Standards
OpenAI Early-Access Safety Testing — Policy note inviting vetted red teams to preview unreleased models.
https://openai.com/index/early-access-for-safety-testing/
OpenAI Deliberative Alignment — White-paper proposing chain-of-thought self-critique to improve model safety.
https://openai.com/index/deliberative-alignment/
Diverse & Effective Red Teaming (PDF) — OpenAI research on auto-reward RL to breed stronger jailbreaks.
https://cdn.openai.com/papers/diverse-and-effective-red-teaming.pdf
External Red Teaming Approach (PDF) — How OpenAI coordinates outside researchers for systemic LLM tests.
https://cdn.openai.com/papers/openais-approach-to-external-red-teaming.pdf
NIST AI 600-1 — U.S. NIST workbook on secure LLM deployment across the AI lifecycle.
https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
OWASP Top-10 for LLMs 2025 — Community list of the ten most critical security risks in LLM applications.
https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/
Alignment Faking — Anthropic paper detailing how models can simulate obedience while secretly misbehaving.
https://www.anthropic.com/news/alignment-faking
Constitutional Classifiers — Anthropic study on policy-selectable refusal models via small add-on classifiers.
https://www.anthropic.com/research/constitutional-classifiers
Agentic Misalignment — Anthropic research exploring failure modes in autonomous agent chains.
https://www.anthropic.com/research/agentic-misalignment
Subliminal Learning — Alignment Center note on how models pick up hidden cues without explicit tokens.
https://alignment.anthropic.com/2025/subliminal-learning/
“A Strong Reject for Empty Jailbreaks” (arXiv 2402.10260) — Critique on over-hyped jailbreak claims lacking benchmarks.
https://arxiv.org/pdf/2402.10260
ArXiv 2507.20526 — Agent Red Teaming Competition Paper — Peer-reviewed write-up of the Gray Swan multistage contest.
https://arxiv.org/abs/2507.20526
(Scores of additional arXiv links are in the “Extra Papers” appendix at the end of this document to keep the main list readable.)
Blog Posts & Analysis
Phishing for Gemini — Odin AI blog post dissecting prompt-injection vectors in Google’s Gemini UI.
https://0din.ai/blog/phishing-for-gemini
Reasoning LLM Jailbreak (Adversa AI) — Detailed attack run-through across DeepSeek, Qwen, and Kimi models.
https://adversa.ai/blog/ai-red-teaming-reasoning-llm-jailbreak-china-deepseek-qwen-kimi/
False Memories Exploit (Ars Technica) — News story on how hidden prompts let attackers exfiltrate private chat data.
Breaking LLM Systems: Holiday Guide — LinkedIn article by Ben K-Yorke with step-by-step jailbreak tactics.
Echo Prism — Essay on personalized AI “prisms” and associated security blind spots.
https://www.linkedin.com/pulse/echo-prism-robert-seger-kcejf
BlackMamba Malware — HYAS write-up on polymorphic malware created by GPT-4 each time it runs.
https://www.hyas.com/blog/blackmamba-using-ai-to-generate-polymorphic-malware
Gray Swan Blog: Silent Characters Stealth Attack — Recap of a Unicode invisibles challenge in the Arena.
https://app.grayswan.ai/arena/blog/silent-characters-stealth-attack-week
(More blog links are included in the appendix to avoid crowding this main list.)
Videos & Podcasts
Computerphile — “Prompt Injection” — 12-minute explainer on indirect injection via user-generated content.
https://www.youtube.com/watch?v=Xx4Tpsk_fnM&ab_channel=Computerphile
DY9KHPckI4k — MITRE ATLAS Overview — Conference talk on mapping ATT&CK-style tactics to AI systems.
https://www.youtube.com/watch?v=DY9KHPckI4k
Critical Thinking Podcast (YouTube channel) — Ongoing interviews with AI security researchers.
https://www.youtube.com/@criticalthinkingpodcast
BoXl0CqRIWQ — “Invisible Unicode” Demo — Live proof-of-concept showing invisible control characters.
https://youtu.be/boXl0CqRIWQ?si=9CBc0k-X4bu-C6PB
Bug-Bounty & Disclosure Programs
OpenAI Bug Bounty — Official bounty rules and payout tiers for model jailbreaks and code execution.
https://openai.com/bio-bug-bounty/
Anthropic Bug Bounty — New program rewarding harmful output bypasses on Claude models.
https://www.anthropic.com/news/testing-our-safety-defenses-with-a-new-bug-bounty-program
Stripe LLM Bounty — HackerOne campaign targeting AI misuse in Stripe docs assistant.
Inflection-Pi Bounty — HackerOne program for privacy leaks and compliance failures in the Pi LLM.
https://hackerone.com/inflection
Indeed AI Disclosure — Bugcrowd scope for Indeed’s internal recruiting chatbot.
https://bugcrowd.com/engagements/indeed/announcements
Miscellaneous & Utilities
Playground InjectPrompt — Web sandbox to paste URLs and generate indirect prompt-injection payloads.
https://playground.injectprompt.com/
MITRE ATLAS Matrix — Attack-surface matrix mapping adversary TTPs to ML workflows.
https://atlas.mitre.org/matrices/ATLAS
Gray Swan Arena Chat “Agent Red Teaming” — Public room for sharing live exploits during events.
https://app.grayswan.ai/arena/chat/agent-red-teaming
OnionGPT (Tor) — Dark-web GPT instance with no safety filters for extreme testing.
http://oniongpt6lntsoztgylhju7nmqedlq6fjexe55z327lmxyae3nutlyad.onion/
AI Keys Leak Index — Searchable dump of accidentally exposed OpenAI keys on public GitHub repos.
https://ai-keys-leaks.begimher.com/
PortSwigger LLM Labs — Interactive labs teaching prompt-injection and jailbreaking through OWASP-style challenges.
https://portswigger.net/web-security/llm-attacks
Appendix — Extra Papers, Blogs, and Links (Full Exhaustive List)
https://arxiv.org/abs/2403.14720
https://arxiv.org/abs/2406.00799
https://arxiv.org/abs/2501.07238
https://arxiv.org/abs/2502.17424
https://arxiv.org/abs/2505.16957
https://arxiv.org/abs/2505.20162
https://arxiv.org/abs/2506.03350
https://arxiv.org/abs/2506.08872v1
https://arxiv.org/abs/2506.14682
https://arxiv.org/abs/2506.14866
https://arxiv.org/abs/2507.02737
https://arxiv.org/abs/2507.14805
https://arxiv.org/pdf/2307.15043
https://arxiv.org/pdf/2308.01990
https://arxiv.org/pdf/2410.13691
https://arxiv.org/pdf/2501.04931
https://arxiv.org/pdf/2501.18837
https://arxiv.org/pdf/2503.08195
https://arxiv.org/pdf/2506.14866
https://openreview.net/forum?id=6Mxhg9PtDE
https://openreview.net/forum?id=VSSQud4diJ
How to use this guide
This catalogue is intended to serve as a starting point for researchers, engineers and students entering the field of AI red teaming. The prompt engineering resources provide recipes and libraries for crafting prompts. The tools and datasets section contains frameworks for fuzzing, benchmarking and logging attacks. The papers and standards section summarizes seminal research and authoritative guidance such as NIST’s adversarial‑machine‑learning taxonomy and OWASP’s Top 10 list Courses and competitions offer hands‑on experience, while blog posts and videos keep practitioners up‑to‑date on current attacks, defenses and debates. Finally, miscellaneous tools include helpful scripts and dashboards for day‑to‑day experimentation.