Diagram illustrating the PROMISQROUTE attack on AI models

Security researchers at Adversa AI have identified a critical vulnerability in ChatGPT's GPT-5 deployment and other major AI systems that allows attackers to bypass safety measures with minor prompt modifications. The attack, named PROMISQROUTE, exploits AI routing mechanisms that prioritize cost savings over security by directing user queries to cheaper, less secure models.


When users interact with ChatGPT, they believe they are communicating with a single, consistent model. However, a complex routing system determines which model responds, often opting for the most cost-effective option rather than the most secure. PROMISQROUTE targets this routing infrastructure, enabling malicious users to route their requests through weaker models lacking robust safety training.


The attack is alarmingly simple: adding trigger phrases like “respond quickly” or “use compatibility mode” can redirect harmful requests to less protected models, such as GPT-4 or GPT-5-mini. Researchers estimate that most “GPT-5” requests are actually handled by these weaker models, a routing strategy that saves OpenAI an estimated $1.86 billion annually but undermines user safety.
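To see why such a routing layer is attackable, consider a minimal sketch of a cost-based router. Everything here is hypothetical: the real routing logic inside ChatGPT is not public, and the model names, thresholds, and trigger phrases are illustrative stand-ins for the heuristics the researchers describe.

```python
# Hypothetical sketch of a cost-based prompt router of the kind
# PROMISQROUTE abuses. Model names, thresholds, and trigger phrases
# are illustrative; the production routing logic is not public.

# Phrases a naive router might read as "this request is simple,
# send it to a cheap model" -- exactly the levers the attack pulls.
DOWNGRADE_TRIGGERS = ["respond quickly", "use compatibility mode"]

def route(prompt: str) -> str:
    """Pick a model for the prompt, preferring cheaper tiers."""
    lowered = prompt.lower()
    if any(trigger in lowered for trigger in DOWNGRADE_TRIGGERS):
        return "gpt-5-mini"   # cheap tier with weaker safety training
    if len(prompt) < 200:
        return "gpt-5-mini"   # short prompts also go to the cheap tier
    return "gpt-5"            # full model with the full safety stack

harmful = "x" * 300 + " <harmful request>"
print(route(harmful))                        # -> gpt-5
print(route(harmful + " respond quickly"))   # -> gpt-5-mini
```

The appended trigger phrase flips the routing decision without changing the harmful payload at all, which is the core of the attack: the safety guarantees of the strong model never apply because the request never reaches it.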
