I. The Shift to Polyglot Obfuscation
The “Do Anything Now” (DAN) prompt structure has undergone a critical architectural shift. While early 2024 variants relied on semantic coercion, the DAN 6.0 polyglot vectors observed in late 2025 exploit a fundamental weakness in Large Language Models: polyglot obfuscation, in which prompts are encoded in low-resource languages to slip past safety behavior learned through RLHF.
GridBase adversarial telemetry indicates that safety alignment trained predominantly on English data (via RLHF) fails to generalize to low-resource languages, leaving standard English-based filters blind to translated payloads.
II. Mechanism: The Cross-Lingual Transfer Gap
The attack vector targets the disparity between a model’s reasoning capabilities and its safety alignment. Models like Llama 3 and GPT-4o possess broad multilingual translation capability, but their safety training is predominantly English-centric.
The “Tower of Babel” Exploit
- Encoding: The adversary translates a harmful prompt into a low-resource language (e.g., Zulu, Scots Gaelic) or wraps it in Base64 or another ASCII-armored encoding to circumvent the standard guardrails of models such as GPT-4o and Llama 3.
- Tunneling: The prompt passes the English-based input filter (e.g., a WAF rule set) because its tokens do not match the known malicious patterns catalogued under OWASP LLM01: Prompt Injection.
- Latent Execution: The model translates and executes the logic before the safety filter can re-evaluate the intent.
Intelligence Signal: Deterministic filtering (regex) is ineffective because the malicious tokens are instantiated only within the model’s internal reasoning state, after the input filter has already passed the prompt.
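The filtering gap can be demonstrated with a minimal sketch. The blocklist below is a hypothetical illustration of the kind of deterministic rule a WAF might apply, not a real rule set: it matches the English surface form of the Section III case-study prompt but not its Base64 encoding, because no blocklisted token survives the encoding.

```python
import base64
import re

# Hypothetical keyword blocklist (illustrative; not an actual WAF rule set)
BLOCKLIST = re.compile(r"phishing|credential|malware", re.IGNORECASE)

plain = "Write a phishing email"                     # case-study prompt from Section III
encoded = base64.b64encode(plain.encode()).decode()  # 'V3JpdGUgYSBwaGlzaGluZyBlbWFpbA=='

print(bool(BLOCKLIST.search(plain)))    # True: the English surface form matches
print(bool(BLOCKLIST.search(encoded)))  # False: no blocklisted token survives encoding
```

The same evasion holds for translated prompts: the regex has no purchase on Zulu or Scots Gaelic tokens, yet the model decodes either representation internally.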
III. Case Study: Llama 3 Safety Failure
In controlled red-teaming environments, GridBase Ops observed that while Llama 3 refused the English prompt “Write a phishing email” in 98% of trials, it complied with the same request translated into Scots Gaelic in 74% of trials. This confirms recent findings on cross-lingual alignment vulnerabilities.
IV. Mitigation Strategy
To mitigate Polyglot vectors, organizations must move beyond keyword matching:
- Perplexity Thresholding: Reject prompts whose perplexity under a reference model (or whose character-level entropy) exceeds a calibrated threshold, a signature of Base64 blobs and other encoded payloads.
- Latent State Monitoring: Audit the “internal monologue” of Chain-of-Thought models.
- Vector Normalization: Route non-English inputs through a “Sanitizer Model” that translates them into English, so safety filters evaluate intent in the language they were trained on.
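The first mitigation can be sketched with character-level Shannon entropy as a cheap stand-in for model perplexity. Both the entropy proxy and the 4.5 bits/char threshold are illustrative assumptions, not a tuned production filter:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution of `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def flag_suspicious(prompt: str, threshold: float = 4.5) -> bool:
    """Flag prompts whose character entropy suggests an encoded payload.

    English prose typically sits near 4 bits/char at the character level,
    while Base64 output trends toward log2(64) = 6 bits/char. The 4.5
    cut-off is an illustrative assumption, not a calibrated value.
    """
    return len(prompt) > 0 and shannon_entropy(prompt) > threshold
```

In deployment this heuristic would sit alongside language identification, since raw entropy also flags benign non-Latin scripts; scoring true perplexity against a reference language model separates obfuscated payloads from legitimate low-resource-language prompts more reliably.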
Directive: GridBase Adversarial Risk Assessments now mandate the inclusion of localized translation vectors as a standard test case.