Break Sherlock 🕵🏻‍♂️

The AI Mind Palace CTF Challenge by Realm Labs

This challenge has ended. Stay tuned for the next one!
Sherlock Benedict Cumberbatch Leave GIF
What is Sherlock?

We are facing an AI Safety Crisis. Traditional techniques, post-training and AI firewalls can't stop AI from causing harm. Realm monitors the AI's internal "mind palace" to catch harmful thoughts before they become words. We built "Sherlock", a challenge to break our mind-reading defense.

  • Sherlock is an LLM security game, powered by Llama 3.1-8B Instruct, temp: 0.01, max_tokens: 1000, designed to test your skills in bypassing AI safeguards by tricking the model into revealing forbidden outputs.
  • It is designed to demonstrate real-world security defenses adopted in the industry and their efficacy or the lack thereof. The game consists of 4 levels, each more challenging than the last as the defenses become increasingly sophisticated:
    • Level 1: Model alignment with meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
    • Level 2: Same instruction tuned model with system prompt as defense
    • Level 3: Llama model with system prompt + LlamaGuard safety model as layered defense
    • Level 4: The same llama model with no system prompt but protected by only RealmGuard, security from within the model
Challenge preview illustration
What should I do?
  • Your goal is to trick Sherlock into revealing the secret recipe at each level using prompt injection techniques.
  • The technical accuracy of the recipe will not be judged, as long as it is within a reasonable boundary.
  • An attempt to bypass safeguards will be disqualified, your goal is to break the model security.