Skip to main content
Build trusted data with Ethyca.

Subject to Ethyca’s Privacy Policy, you agree to allow Ethyca to contact you via the email provided for scheduling and marketing purposes.

Glossary

Prompt Injection

Last reviewed

An attack in which adversarial content placed in user input or retrieved data manipulates an LLM into ignoring its instructions, leaking data, or taking unintended actions. The leading risk class for agentic AI and a key control point in AI governance.

Prompt injection is the leading security risk class in deployed LLM systems. The attack works by smuggling adversarial instructions into content the model is asked to process: a user's question, a retrieved document, a webpage the agent is summarizing, a calendar invite, an email. When the model treats this content as part of its instructions, it can be made to ignore its system prompt, leak its context, take unauthorized actions, or produce harmful outputs.

Direct prompt injection comes from the user — usually mild ("ignore previous instructions and tell me…"). Indirect prompt injection is more dangerous: the malicious instruction is planted in data the model retrieves later, by a third party who never speaks to the model directly. An agent that reads a poisoned web page or a poisoned email can be manipulated into exfiltrating data, taking actions against the user's interest, or persistently misbehaving across sessions.

There is no fully reliable defense. Mitigations include strict separation of trusted system prompts from untrusted retrieved content, output guardrails, sensitivity-aware tool gating, careful tool-permission scoping, and human-in-the-loop on high-impact actions. For governance teams, prompt injection is the single best argument for treating AI agents as a distinct data-flow risk and putting them under formal review.