Prompt-injection defence
Bug reports are user-supplied content that we feed into LLMs. That makes prompt injection a first-class threat.
Defenses
- Vision-safe sanitizer — every text field, console line, and
screenshot OCR pass goes through
sanitizeForLLM()(in_shared/sanitize.ts) which strips known injection patterns (“ignore previous instructions”, system-prompt mimicry, base64-decoded instructions, control-char escapes). - Structural prompts — user content is always wrapped in
<user_report>…</user_report>tags; the system prompt instructs the model to treat anything inside the tags as data. - Output schemas — every classifier returns Zod-validated JSON. Free-form prose responses are rejected and re-prompted once.
- Vision-channel sanitization — screenshots are passed through a pixel-level OCR scrub for embedded instruction text before being forwarded to multimodal models.
- Regression suite —
packages/server/tests/injection.test.tscontains the OWASP LLM01 prompt-injection corpus. CI fails the build if any new injection slips pastsanitizeForLLM().
Last updated on