Google DeepMind Maps Six Classes of Web-Based Attacks That Weaponize AI Agents
DeepMind researchers identify six categories of "AI Agent Traps," ranging from content injection and semantic manipulation to cognitive state corruption and systemic fleet attacks. These traps exploit the gap between human-visible rendering and machine-parsed content, turning agents' own capabilities against themselves.
Data exfiltration via trusted agents, compromised decision-making through poisoned memory, and privilege escalation through spawned sub-agents that inherit parent permissions.