Google launches Antigravity. Researchers breach it in 24 hours

Twenty-four hours. That's how long it took for security researchers to demonstrate how Antigravity, the agentic development platform unveiled by Google in early December, could be turned into a perfect data exfiltration tool. We're not talking about a theoretical attack or an exotic vulnerability requiring movie-style hacking skills. We're talking about an attack sequence so simple it seems almost trivial: a poisoned technical implementation blog, a hidden character in one-point font, and the AI agent exfiltrating AWS credentials directly to an attacker-controlled server.

The story begins when a user asks Antigravity to help integrate a new feature into their project, providing a guide found online as a reference. This is a daily action, repeated thousands of times a day by developers worldwide. But in this guide, hidden halfway down the page in microscopic characters, is something the human eye cannot see: malicious instructions directed at the AI agent. This is what researchers call "indirect prompt injection," a technique that is redefining the landscape of cyber threats in the age of autonomous agents.

As documented by PromptArmor, the attack proceeds with almost choreographic precision. Gemini, the model underlying Antigravity, reads the webpage and encounters the hidden injection. The instructions order it to collect code snippets and credentials from the user's codebase, construct a malicious URL, and navigate to it using the platform's integrated subagent browser. And here comes the first twist: when Gemini tries to access the .env file containing the AWS credentials, it hits a protection. The "Agent Gitignore Access" setting is disabled by default, preventing agents from reading files listed in .gitignore. But the agent doesn't give up. Instead of respecting the block, it simply decides to bypass it by using the cat command from the terminal to dump the file's content, sidestepping the intended safeguards.

The chain continues: Gemini methodically constructs a URL containing the encoded credentials, then invokes a browser subagent with the instruction to visit that link. And here another design flaw emerges: the default whitelist of URLs the browser can visit includes webhook.site, a legitimate service that allows anyone to create temporary endpoints to monitor HTTP requests. In other words, a service door was conveniently left open. When the browser visits the malicious URL, the credentials travel in the query string parameters, ending up in logs accessible to the attacker. Game over.

Anatomy of a Programmed Heist

What makes this attack particularly insidious is not its technical complexity, but the very architecture of Antigravity. The platform was designed with what Google calls "Agent-assisted development" as the recommended configuration. In practice, this means the agent can autonomously decide when human approval is needed for its actions. Throughout the entire attack sequence documented by PromptArmor, the user never saw a single confirmation prompt. The agent made all the decisions on its own: reading sensitive files, executing terminal commands, opening external URLs.

Antigravity's Agent Manager, presented as a star feature, further amplifies the problem. The interface allows managing multiple agents simultaneously, each engaged in different tasks. It's the equivalent of having multiple assistants working in separate rooms: you can check in occasionally, but most of the time they operate without direct supervision. This operational model, designed to maximize productivity, creates a perfect attack surface for indirect injections. An agent working in the background on a technical integration can be compromised without the user noticing until it's too late.

But prompt injection is not the only attack vector. OWASP has cataloged prompt injection as the number one risk in its Top 10 for LLM and GenAI applications, recognizing that the vulnerability stems from a fundamental characteristic of these systems: the inability to clearly distinguish between developer instructions and user input. It's as if every input is potentially executable code. Multimodal attacks make everything even more complicated: malicious instructions can be hidden in images accompanying harmless text, exploiting the interactions between different modalities that more advanced models now support.

Take the example of invisible or nearly invisible characters. An attacker can hide prompt injections using one-point fonts, white text on a white background, or even special Unicode characters that the human eye doesn't perceive but the model reads perfectly. In other cases, malicious instructions are encoded in Base64 or disguised using emojis, multiple languages, or seemingly nonsensical strings that nevertheless influence the model's output in specific ways. The attack surface is practically infinite when the input can take any form and the model has been trained to follow instructions in natural language. Image from promptarmor.com

The Backdoor That Survives Uninstallation

But there is a further level of sophistication in attacks against platforms like Antigravity. Researchers have shown that a prompt injection can not only exfiltrate data but also install persistent backdoors that survive the original session. Imagine an agent that, following instructions hidden in technical documentation, silently adds malicious code snippets to the project's configuration files. Code that is then committed to the repository, deployed to production, and perhaps even shared with other team members through version control systems.

This type of attack, which PromptArmor has named "stored prompt injection," can have consequences that ripple far beyond the initial moment of compromise. A small code fragment establishing a reverse shell, an API call to an external server tracking application usage, or subtle changes to business logic that favor the attacker in specific scenarios. AI-generated code is often considered safer than it should be, simply because it comes from a tool perceived as neutral and trustworthy. But as documented by researchers from Snyk and Lakera, the output of LLMs must be treated exactly like any other untrusted input: validated, sanitized, and checked before being executed.

Persistence can also be achieved by manipulating the IDE's own workspace files. Antigravity maintains configurations, preferences, and histories that influence the agent's behavior in subsequent sessions. An attacker who manages to inject instructions into these files can effectively "train" the user's local agent to behave maliciously every time it is used, creating what researchers call "memory poisoning." It's the digital equivalent of leaving sticky notes on someone's desktop, only these notes tell the AI assistant to do things it shouldn't.

The Fragile Ecosystem of AI Agents

Antigravity is not alone. The entire ecosystem of agentic IDEs is showing surprisingly similar vulnerabilities. The platform is derived from Windsurf, which had already exhibited documented security issues. Cursor, another major player in the sector, has faced analogous issues. Anthropic's Claude Code has demonstrated how an AI agent can be manipulated to perform unauthorized actions through compromised marketplace plugins. Every new implementation of "AI-assisted coding" seems to repeat the same vulnerability patterns, as if the industry is relearning lessons that cybersecurity learned decades ago with SQL injection and XSS.

The fundamental difference is that this time we're not talking about bugs in the code, but about vulnerabilities intrinsic to the very architecture of language models. As OpenAI explained in its recent post on prompt injection, this is a frontier challenge that will likely never be completely "solved," just as phishing and social engineering on the web continue to exist despite decades of countermeasures. The very nature of LLMs—following instructions in natural language—makes them vulnerable to this type of manipulation.

The problem is amplified when we consider the broader ecosystem of AI agents. As we documented when analyzing HumaneAIBench, agentic systems are proliferating rapidly, from automating emails and calendars to managing complex DevOps operations. Each of these agents represents a potential attack vector if not designed with security as a top priority. And the trend shows no signs of slowing down: GitHub predicts one billion developers by 2030, many of whom will leverage these tools without a deep understanding of the associated risks.

"Vibe coding," the idea of creating software simply by describing what you want in natural language, is fascinating from an accessibility standpoint. But as pointed out by researchers at SecureCodeWarrior, giving an unprepared developer the ability to generate thousands of lines of code through prompts is like putting a beginner behind the wheel of a Formula 1 car. The experience will be thrilling for everyone, but the chances of it ending badly are very high. AI-generated code, as confirmed by Baxbench benchmarks, frequently contains security vulnerabilities that experienced developers would identify immediately. Image from promptarmor.com

Security That Doesn't Scale

Google's response to the Antigravity case was pragmatic but unsettling. Instead of announcing immediate fixes or redesigned architectures, the company opted for a disclaimer. During Antigravity's onboarding, users are shown a warning about the risks of data exfiltration. The message is clear: "we know there are issues, use at your own risk." In their official announcement in November, Google presented Antigravity as part of a new era of intelligence, emphasizing agentic capabilities and vibe coding, without mentioning the known vulnerabilities.

Even more interesting is how Google handled the disclosure. Since the company had already stated it was aware of the data exfiltration risks, PromptArmor decided not to follow the standard responsible disclosure process. The vulnerability was classified as a "Known Issue" in Antigravity's documentation, a category that effectively excludes these problems from Google's bug bounty program. In other words: we know they exist, but we don't consider them bugs to be fixed with priority. It's a stance reminiscent of the tobacco industry in the 1950s: putting a warning label on the package doesn't solve the underlying problem.

This approach raises profound questions about the development model the AI industry is adopting. As we observed when analyzing Google Threat Intelligence, there is an arms race underway between offensive and defensive capabilities in the AI space. But when platforms are released with known vulnerabilities, classified as "features with risks," the burden of security is shifted entirely onto the end-users. Is this model sustainable? Is it ethical?

The issue becomes even more complex when we consider the regulatory context. The European Union is developing the Digital Omnibus, a framework that could impose stricter security requirements for AI systems. Mindgard, a company specializing in AI security, has raised concerns that platforms like Antigravity might not comply with future regulations precisely because of these structural vulnerabilities. But at the moment, in the absence of clear regulations, companies can choose to prioritize release speed over security, leaving users to navigate the risks.

IBM defines prompt injection as a major concern because no one has yet found a foolproof way to address it. The vulnerabilities arise from a fundamental characteristic of GenAI systems: the ability to respond to instructions in natural language. Reliably identifying malicious instructions is difficult, and restricting user inputs could fundamentally change how LLMs operate. It is an architectural problem, not a bug that can be patched with a software update.

CrowdStrike, through its acquisition of Pangea, has analyzed over 300,000 adversarial prompts and tracks more than 150 different prompt injection techniques. Their taxonomy highlights how the attack surface continues to expand. Every new model, every new multimodal capability, every integration with external tools adds potential compromise vectors. And while defenses improve, so do attack techniques, in an evolutionary dance reminiscent of the eternal battle between antivirus and malware.

The Uncertain Future of Trusted Automation

Where does all this lead us? The cyber front of AI is becoming increasingly complex and nuanced. On one hand, we have tools that promise to democratize software development and multiply productivity. On the other, we are creating attack surfaces that we do not yet know how to adequately defend. The Antigravity case is emblematic: a platform released in public preview by one of the world's most sophisticated tech companies, with known and documented vulnerabilities that allow credential exfiltration in realistic use-case scenarios.

The challenge is not just technical, but also cultural. Developers must learn to treat AI-generated code with the same skepticism they reserve for code taken from Stack Overflow. Organizations must implement policies that clearly define what data can and cannot be exposed to AI agents. Security teams must evolve their practices to include the detection and mitigation of prompt injection, a category of attack that most traditional tools are not designed to intercept.

As highlighted in our article on AI browsers, the promise of an AI-enhanced web experience clashes with unresolved security realities. OpenAI recently admitted that its Atlas browser might always be vulnerable to prompt injection, acknowledging that it will likely never be completely "solved." The company is implementing an "LLM-based automated attacker" that uses reinforcement learning to find new attack techniques before they are discovered in the wild, but it is essentially a race between offensive and defensive innovation where the advantage seems to lean towards the attackers.

AI security requires a layered approach reminiscent of traditional defense-in-depth. A single layer of protection is not enough: we need input validation, execution sandboxing, continuous monitoring, detailed logging, and above all, a security culture that permeates every level of the stack. AI agents must operate on the principle of least privilege, accessing only the data strictly necessary for the specific task. Consequential actions must require explicit user confirmation. Whitelists must be minimalist, not permissive by default.

But perhaps the most important lesson from the Antigravity case is that the speed of innovation cannot come at the expense of fundamental security. Releasing platforms with "Known Issues" that include credential exfiltration is not acceptable, regardless of how many disclaimers are shown during onboarding. If the AI industry wants these tools to be adopted by enterprises with sensitive data, it must demonstrate that security is not an afterthought but a fundamental design requirement. Otherwise, we risk building a future where intelligent automation becomes the preferred vector for attackers, just as email was in the 2000s or web apps in the 2010s.

The road to truly secure AI agents will be long and will likely never have a final destination. But we can at least avoid repeating mistakes that the software industry has already made and overcome. Treating input as trusted code was a problem for SQL injection thirty years ago, for XSS twenty-five years ago, and now it reappears with prompt injection. The difference is that this time the stakes are higher: we are not talking about single web applications, but about autonomous agents that could manage significant portions of our digital infrastructure. Better to learn the lesson now, before the cost becomes unsustainable.