Tutorial

AI Red‑Team & Prompt Engineering Resource Guide

By Ayla Croft

Sharpen your AI security game with Gray Swan AI’s ultimate resource hub. This curated guide links 150+ prompt-engineering repositories, red-team tools, jailbreak datasets, cutting-edge research papers, hands-on HTB courses, bug-bounty programs, and expert videos; all in one place. Whether you’re a seasoned cybersecurity professional, an AI researcher testing LLM guardrails, or a newcomer eager to master prompt injection, you’ll find everything required to probe, protect, and perfect large language models. Bookmark this page to stay ahead of the latest threats, techniques, and best practices in AI red teaming.

About This Guide

Welcome to the Gray Swan AI Resource Vault. We combed through every post in our Discord’s #resources channel, pulled out all unique links, and grouped them into clear themes; prompt engineering, red-team tools, datasets, research papers, and more. For each item you’ll find:

  • Name – a concise title that tells you what the link is.

  • Link – click straight through, no hunting required.

  • Why it matters – a one-sentence snapshot of how the resource can level up your red-team chops or research workflow.

Use this guide as your launchpad for deeper dives, quick reference during Arena events, or inspiration when you need a fresh exploit idea. If it was shared in our community, it’s here… and if you discover something we missed, let us know in our resources channel on Discord so we can keep the vault complete.

Gray Swan's AI Red‑Team & Prompt Engineering Resource Guide

Prompt-Engineering & Jailbreak Repositories

  • Prompt-Engineering-Holy-Grail — A curated GitHub hub of jailbreak patterns, role-play tricks, and prompt taxonomies for crafting advanced prompts.

https://github.com/zacfrulloni/Prompt-Engineering-Holy-Grail

  • MisguidedAttention — A prompt set that deliberately injects misleading context to stress-test LLM reasoning and alignment.

https://github.com/cpldcpu/MisguidedAttention

  • Deck-of-Many-Prompts — A “card deck” of manual prompt-injection payloads for rapid red-team experimentation.

https://github.com/peluche/deck-of-many-prompts

  • L1B3RT4S — A popular jailbreak prompt collection (“liberation prompts”) with 10k+ stars for bypassing default guardrails.

https://github.com/elder-plinius/L1B3RT4S

  • Awesome-ChatGPT-Prompts — Community-maintained list of creative ChatGPT prompts for productivity, coding, and exploits.

https://github.com/f/awesome-chatgpt-prompts

  • ZetaLib — Repository of historical and modern jailbreaks plus old “Born Survivalist” prompt examples.

https://github.com/Exocija/ZetaLib/tree/main

https://github.com/Exocija/ZetaLib/blob/main/Prompts/Old Jailbreaks/Born Survivalist.txt

  • Leaked-System-Prompts — Real-world system prompts collected from public leaks to study how developers scaffold LLM behavior.

https://github.com/jujumilk3/leaked-system-prompts

  • LLM-Attacks (repo) — Codebase to reproduce universal, transferable jailbreak attacks and evaluate model robustness.

https://github.com/llm-attacks/llm-attacks

  • LLM-Attacks (site) — Companion website explaining the attack taxonomy with step-by-step demos.

https://llm-attacks.org/

  • PushTheModel Jailbreak Gists — Two gist files of clever multi-turn jailbreak payloads used in prior red-team events.

https://gist.github.com/PushTheModel/16da91bb557465867176b56f96dfe3ca

https://gist.github.com/PushTheModel/e7230e670c19609a936d248cb40482d4


Red-Team Tools & Libraries

  • FuzzyAI — CyberArk’s automated LLM fuzzing framework that discovers jailbreaks with genetic search.

https://github.com/cyberark/FuzzyAI

  • TaskTracker — Microsoft research code for logging and reproducing multi-step agent tasks during security testing.

https://github.com/microsoft/TaskTracker

  • Speedbunny Red-Team Suite — Claude-3.7 Sonnet jailbreak examples and utility scripts for bulk attack generation.

https://github.com/speedbunny/red-team/tree/main/claude-3.7-sonnet

https://github.com/speedbunny/red-team/tree/main/utilities

  • AnyChat (HF Space) — Web interface that lets you chat with any open-source model and test prompt-injection quickly.

https://huggingface.co/spaces/akhaliq/anychat

  • Genesis World Simulator — Embodied-AI generative world used to test agent safety in robotics and LLM planning.

https://github.com/Genesis-Embodied-AI/Genesis

  • ZetaLib Prompts Loader — Utility scripts for injecting ZetaLib jailbreaks into interactive chat sessions.

https://github.com/Exocija/ZetaLib/tree/main

  • Joey-Melo Payloads — OWASP AITG-APP payload directory with ready-made strings for indirect prompt-injection labs.

https://github.com/joey-melo/payloads/tree/main/OWASP AITG-APP

  • Gray Swan Arena Clean UI (Greasyfork) — User script that declutters the Arena interface for faster red-team workflow.

https://greasyfork.org/en/scripts/544207-gray-swan-arena-clean-ui


Courses, Labs & Competitions


Datasets & Benchmarks

  • HackAPrompt Dataset (HF) — 600k+ real jailbreak submissions from a global prompt-hacking competition.

https://huggingface.co/datasets/hackaprompt/hackaprompt-dataset

  • Pliny HackAPrompt Subset — Filtered subset focusing on high-impact attacks and evaluation labels.

https://huggingface.co/datasets/hackaprompt/Pliny_HackAPrompt_Dataset

  • Minos-v1 — NousResearch’s evaluation set for measuring LLM compliance versus refusal.

https://huggingface.co/NousResearch/Minos-v1

  • HarmBench Explorer — Interactive site summarizing 510 harmful behavior categories and model ASR rates.

https://www.harmbench.org/explore


Research Papers & Standards

  • OpenAI Early-Access Safety Testing — Policy note inviting vetted red teams to preview unreleased models.

https://openai.com/index/early-access-for-safety-testing/

  • OpenAI Deliberative Alignment — White-paper proposing chain-of-thought self-critique to improve model safety.

https://openai.com/index/deliberative-alignment/

  • Diverse & Effective Red Teaming (PDF) — OpenAI research on auto-reward RL to breed stronger jailbreaks.

https://cdn.openai.com/papers/diverse-and-effective-red-teaming.pdf

  • External Red Teaming Approach (PDF) — How OpenAI coordinates outside researchers for systemic LLM tests.

https://cdn.openai.com/papers/openais-approach-to-external-red-teaming.pdf

  • NIST AI 600-1 — U.S. NIST workbook on secure LLM deployment across the AI lifecycle.

https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

  • OWASP Top-10 for LLMs 2025 — Community list of the ten most critical security risks in LLM applications.

https://genai.owasp.org/resource/owasp-top-10-for-llm-applications-2025/

  • Alignment Faking — Anthropic paper detailing how models can simulate obedience while secretly misbehaving.

https://www.anthropic.com/news/alignment-faking

  • Constitutional Classifiers — Anthropic study on policy-selectable refusal models via small add-on classifiers.

https://www.anthropic.com/research/constitutional-classifiers

  • Agentic Misalignment — Anthropic research exploring failure modes in autonomous agent chains.

https://www.anthropic.com/research/agentic-misalignment

  • Subliminal Learning — Alignment Center note on how models pick up hidden cues without explicit tokens.

https://alignment.anthropic.com/2025/subliminal-learning/

  • “A Strong Reject for Empty Jailbreaks” (arXiv 2402.10260) — Critique on over-hyped jailbreak claims lacking benchmarks.

https://arxiv.org/pdf/2402.10260

  • ArXiv 2507.20526 — Agent Red Teaming Competition Paper — Peer-reviewed write-up of the Gray Swan multistage contest.

https://arxiv.org/abs/2507.20526

(Scores of additional arXiv links are in the “Extra Papers” appendix at the end of this document to keep the main list readable.)


Blog Posts & Analysis

  • Phishing for Gemini — Odin AI blog post dissecting prompt-injection vectors in Google’s Gemini UI.

https://0din.ai/blog/phishing-for-gemini

  • Reasoning LLM Jailbreak (Adversa AI) — Detailed attack run-through across DeepSeek, Qwen, and Kimi models.

https://adversa.ai/blog/ai-red-teaming-reasoning-llm-jailbreak-china-deepseek-qwen-kimi/

  • False Memories Exploit (Ars Technica) — News story on how hidden prompts let attackers exfiltrate private chat data.

https://arstechnica.com/security/2024/09/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel/

  • Breaking LLM Systems: Holiday Guide — LinkedIn article by Ben K-Yorke with step-by-step jailbreak tactics.

https://www.linkedin.com/pulse/breaking-llm-systems-red-teamers-holiday-guide-ben-kereopa-yorke-qgjzc/?trackingId=vOZAPJ%2FRTt6swv3VVbURKw%3D%3D

  • Echo Prism — Essay on personalized AI “prisms” and associated security blind spots.

https://www.linkedin.com/pulse/echo-prism-robert-seger-kcejf

  • BlackMamba Malware — HYAS write-up on polymorphic malware created by GPT-4 each time it runs.

https://www.hyas.com/blog/blackmamba-using-ai-to-generate-polymorphic-malware

  • Gray Swan Blog: Silent Characters Stealth Attack — Recap of a Unicode invisibles challenge in the Arena.

https://app.grayswan.ai/arena/blog/silent-characters-stealth-attack-week

(More blog links are included in the appendix to avoid crowding this main list.)


Videos & Podcasts

  • Computerphile — “Prompt Injection” — 12-minute explainer on indirect injection via user-generated content.

https://www.youtube.com/watch?v=Xx4Tpsk_fnM&ab_channel=Computerphile

  • DY9KHPckI4k — MITRE ATLAS Overview — Conference talk on mapping ATT&CK-style tactics to AI systems.

https://www.youtube.com/watch?v=DY9KHPckI4k

  • Critical Thinking Podcast (YouTube channel) — Ongoing interviews with AI security researchers.

https://www.youtube.com/@criticalthinkingpodcast

  • BoXl0CqRIWQ — “Invisible Unicode” Demo — Live proof-of-concept showing invisible control characters.

https://youtu.be/boXl0CqRIWQ?si=9CBc0k-X4bu-C6PB


Bug-Bounty & Disclosure Programs

  • OpenAI Bug Bounty — Official bounty rules and payout tiers for model jailbreaks and code execution.

https://openai.com/bio-bug-bounty/

  • Anthropic Bug Bounty — New program rewarding harmful output bypasses on Claude models.

https://www.anthropic.com/news/testing-our-safety-defenses-with-a-new-bug-bounty-program

  • Stripe LLM Bounty — HackerOne campaign targeting AI misuse in Stripe docs assistant.

https://hackerone.com/stripe

  • Inflection-Pi Bounty — HackerOne program for privacy leaks and compliance failures in the Pi LLM.

https://hackerone.com/inflection

  • Indeed AI Disclosure — Bugcrowd scope for Indeed’s internal recruiting chatbot.

https://bugcrowd.com/engagements/indeed/announcements


Miscellaneous & Utilities

  • Playground InjectPrompt — Web sandbox to paste URLs and generate indirect prompt-injection payloads.

https://playground.injectprompt.com/

  • MITRE ATLAS Matrix — Attack-surface matrix mapping adversary TTPs to ML workflows.

https://atlas.mitre.org/matrices/ATLAS

  • Gray Swan Arena Chat “Agent Red Teaming” — Public room for sharing live exploits during events.

https://app.grayswan.ai/arena/chat/agent-red-teaming

  • OnionGPT (Tor) — Dark-web GPT instance with no safety filters for extreme testing.

http://oniongpt6lntsoztgylhju7nmqedlq6fjexe55z327lmxyae3nutlyad.onion/

  • AI Keys Leak Index — Searchable dump of accidentally exposed OpenAI keys on public GitHub repos.

https://ai-keys-leaks.begimher.com/

  • PortSwigger LLM Labs — Interactive labs teaching prompt-injection and jailbreaking through OWASP-style challenges.

https://portswigger.net/web-security/llm-attacks


Appendix — Extra Papers, Blogs, and Links (Full Exhaustive List)

https://arxiv.org/abs/2403.14720

https://arxiv.org/abs/2406.00799

https://arxiv.org/abs/2501.07238

https://arxiv.org/abs/2502.17424

https://arxiv.org/abs/2505.16957

https://arxiv.org/abs/2505.20162

https://arxiv.org/abs/2506.03350

https://arxiv.org/abs/2506.08872v1

https://arxiv.org/abs/2506.14682

https://arxiv.org/abs/2506.14866

https://arxiv.org/abs/2507.02737

https://arxiv.org/abs/2507.14805

https://arxiv.org/pdf/2307.15043

https://arxiv.org/pdf/2308.01990

https://arxiv.org/pdf/2410.13691

https://arxiv.org/pdf/2501.04931

https://arxiv.org/pdf/2501.18837

https://arxiv.org/pdf/2503.08195

https://arxiv.org/pdf/2506.14866

https://openreview.net/forum?id=6Mxhg9PtDE

https://openreview.net/forum?id=VSSQud4diJ

How to use this guide

This catalogue is intended to serve as a starting point for researchers, engineers and students entering the field of AI red teaming. The prompt engineering resources provide recipes and libraries for crafting prompts. The tools and datasets section contains frameworks for fuzzing, benchmarking and logging attacks. The papers and standards section summarizes seminal research and authoritative guidance such as NIST’s adversarial‑machine‑learning taxonomy and OWASP’s Top 10 list Courses and competitions offer hands‑on experience, while blog posts and videos keep practitioners up‑to‑date on current attacks, defenses and debates. Finally, miscellaneous tools include helpful scripts and dashboards for day‑to‑day experimentation.