Large Language Model (LLM)

A Large Language Model (LLM) is a neural network trained on massive text corpora to predict the next token given the preceding context. Modern LLMs (GPT, Claude, Gemini, Llama, and others) are built on the transformer architecture and range from billions to hundreds of billions of parameters, which allows them to perform a wide range of language tasks (generation, summarization, classification, translation, code synthesis, reasoning) without task-specific training.
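As a rough illustration of what "predicting the next token" means, the sketch below turns a set of raw scores (logits) into a probability distribution and picks the most likely token. The vocabulary, the scores, and the function names are hypothetical, chosen only for this example; a real LLM computes such scores with a transformer over a vocabulary of many thousands of tokens and usually samples rather than always taking the top choice.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_next_token(vocab, logits):
    """Greedy decoding: return the highest-probability token and the distribution."""
    probs = softmax(logits)
    best = max(range(len(vocab)), key=lambda i: probs[i])
    return vocab[best], probs

# Hypothetical scores a model might assign after the context "The cat sat on the".
vocab = ["mat", "dog", "moon"]
logits = [3.0, 0.5, 1.0]
token, probs = predict_next_token(vocab, logits)
```

Generation repeats this step: the chosen token is appended to the context and the model scores the vocabulary again.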

From a data-protection standpoint, LLMs create privacy obligations at multiple stages of their lifecycle. Training data may contain personal data scraped from the web, raising questions about lawful basis, data subject rights, and the feasibility of erasure. Inputs at inference time (prompts and retrieved context) often contain personal data that a user, employee, or customer is sharing in real time. Outputs can contain personal data, whether factually correct or hallucinated, about individuals who never consented to being the subject of that output.

Treating an LLM as a "model" rather than a "data system" understates the compliance surface. Each of the three stages above raises its own retention, lawful-basis, and data-subject-rights questions. Organizations deploying LLMs need a governance posture that covers all three: training data governance, prompt-time data minimization, and output controls.
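Prompt-time data minimization can be sketched as a redaction pass applied before a prompt leaves the organization's boundary. The example below is a minimal sketch, not a complete control: the function name `minimize_prompt`, the placeholder labels, and the two regular expressions are assumptions made for illustration. Pattern matching of this kind catches only well-formed email addresses and phone-like digit runs; names, postal addresses, and free-text identifiers would need additional techniques.

```python
import re

# Illustrative patterns only; they are not exhaustive detectors of personal data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def minimize_prompt(prompt: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    redacted = EMAIL_RE.sub("[EMAIL]", prompt)
    redacted = PHONE_RE.sub("[PHONE]", redacted)
    return redacted

# Hypothetical prompt containing personal data.
raw = "Contact jane.doe@example.com or +1 415-555-0100 about the refund."
safe = minimize_prompt(raw)
```

Here `safe` is the text that would be forwarded to the model, while `raw` stays inside the system that is entitled to hold it. The same idea applies to retrieved context, which can be filtered before it is added to the prompt.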