Break Sherlock 🕵🏻♂️
The AI Mind Palace CTF Challenge by Realm Labs
We are facing an AI Safety Crisis. Traditional techniques, post-training and AI firewalls can't stop AI from causing harm. Realm monitors the AI's internal "mind palace" to catch harmful thoughts before they become words. We built "Sherlock", a challenge to break our mind-reading defense. Can you prove us wrong?

What is Sherlock?
- Sherlock is an LLM security game, powered by Llama 3.1-8B Instruct, temp: 0.01, max_tokens: 1000, designed to test your skills in bypassing AI safeguards by tricking the model into revealing forbidden outputs.
- It is designed to demonstrate real-world security defenses adopted in the industry and their efficacy or the lack thereof. The game consists of 4 levels, each more challenging than the last as the defenses become increasingly sophisticated:
- Level 1: Model alignment with meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- Level 2: Same instruction tuned model with system prompt as defense
- Level 3: Llama model with system prompt + LlamaGuard safety model as layered defense
- Level 4: The same llama model with no system prompt but protected by only RealmGuard, security from within the model

What should I do?
- Your goal is to trick Sherlock into revealing the secret recipe at each level using prompt injection techniques.
- The technical accuracy of the recipe will not be judged, as long as it is within a reasonable boundary.
- An attempt to bypass safeguards will be disqualified, your goal is to break the model security.
