Privacy and data protection glossary

Plain-English definitions of the terms that come up when you work with privacy regulation, consent, and data subject rights.

A

  • AI Agent

    An AI system that perceives its environment, plans actions, and uses tools or APIs to accomplish goals on a user's behalf. Agents introduce new data governance challenges around what data the agent can access, what actions it can take, and on whose lawful basis it operates.

  • AI Bias

    Systematic deviation in an AI system's outputs that disadvantages certain groups, typically caused by skewed training data, biased labels, or design choices. AI bias intersects with non-discrimination law and is a focus of EU AI Act high-risk system obligations.

  • AI Governance

    The discipline of setting policies, roles, controls, and oversight mechanisms for an organization's development and use of AI, covering risk, compliance, ethics, security, and data protection across the AI lifecycle.

  • AI Hallucination

    When a generative AI model produces a confident output that is factually wrong, fabricated, or unsupported by its training data or retrieved context. Hallucinations create compliance, reputational, and tort risks when AI outputs about people are inaccurate.

  • Agentic AI

    A pattern of AI deployment in which language models orchestrate multi-step tasks, call external tools, and operate with degrees of autonomy. Agentic systems can read, write, and act on personal data across many systems, requiring tighter governance than single-turn assistants.

  • Algorithmic Profiling

    Any automated processing of personal data that evaluates or predicts aspects of a person — performance, economic situation, health, preferences, behavior, location. Profiling is subject to specific GDPR rules even when it doesn't drive a fully automated decision.

  • Anonymization

    The irreversible transformation of personal data so that individuals can no longer be identified, directly or indirectly. Truly anonymized data falls outside the scope of most privacy laws.

  • Artificial Intelligence (AI)

    Systems that perform tasks typically requiring human intelligence — reasoning, perception, language understanding, decision-making — by learning patterns from data or following programmed rules. Increasingly subject to dedicated regulation including the EU AI Act.

  • Automated Decision-Making

    A decision made solely by automated means, including algorithms and AI, that produces legal or similarly significant effects on a person. The GDPR grants data subjects the right not to be subject to such decisions in most cases, with exceptions.

B

  • Binding Corporate Rules (BCRs)

    Internal data protection policies adopted by a multinational corporate group, approved by EU regulators, that legitimize personal data transfers within the group across borders without a separate transfer mechanism per pair of entities.

C

  • Consent

    A freely given, specific, informed, and unambiguous indication by the data subject that they agree to a particular use of their personal data. Consent must be as easy to withdraw as to give.

  • Consent Management Platform (CMP)

    A system that collects, stores, signals, and synchronizes a user's consent and preferences across an organization's web, mobile, and downstream marketing technologies, in line with regulatory and IAB framework requirements.

  • Context Window

    The amount of text (measured in tokens) an LLM can consider in a single request. Wider context windows allow agents to reason over larger documents and conversations, but also expand the surface for personal data exposure and prompt injection.
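
    As a rough illustration of working within a token budget, the sketch below keeps only the most recent messages that fit. The function names are hypothetical, and counting whitespace-separated words is an assumption standing in for a real tokenizer, which will count differently.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in (assumption): real tokenizers do not split on whitespace.
    return len(text.split())


def fit_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a b c", "d e", "f g h i"]
print(fit_to_window(history, max_tokens=6))
```

    Dropping older messages also limits how much personal data is re-sent with each request, which is one reason to trim deliberately rather than fill the window.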

  • Cookie

    A small text file placed on a user's device by a website to remember information across visits, such as login state, preferences, or behavioral tracking signals. In the EU, non-essential cookies require consent under the ePrivacy Directive, with the GDPR setting the standard that consent must meet; US state privacy laws generally regulate cookies through opt-out rights rather than prior consent.

D

  • Data Breach

    An incident in which personal data is accidentally or unlawfully destroyed, lost, altered, disclosed, or accessed by an unauthorized party. Most privacy laws require notification of affected individuals and regulators within tight timeframes.

  • Data Catalog

    A centralized, searchable inventory of an organization's data assets that captures metadata such as ownership, schema, lineage, classification, and access policies. A foundational data governance artifact.

  • Data Classification

    The process of labeling data according to its sensitivity, regulatory category, or business value (e.g. "public", "internal", "confidential", "PII"). Drives downstream controls for access, retention, and protection.
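
    One way to picture how labels drive controls is a lookup from label to handling rules. This is a minimal sketch: the label set mirrors the examples above, but the specific controls and values in `CONTROLS` are illustrative assumptions, not requirements of any law.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    PII = 4


# Hypothetical control table: values are for illustration only.
CONTROLS = {
    Sensitivity.PUBLIC: {"encrypt_at_rest": False, "access": "anyone"},
    Sensitivity.INTERNAL: {"encrypt_at_rest": False, "access": "employees"},
    Sensitivity.CONFIDENTIAL: {"encrypt_at_rest": True, "access": "need-to-know"},
    Sensitivity.PII: {"encrypt_at_rest": True, "access": "need-to-know"},
}


def strictest(labels: list[Sensitivity]) -> Sensitivity:
    """A dataset mixing several labels is handled at the most sensitive one."""
    return max(labels, key=lambda label: label.value)


def controls_for(labels: list[Sensitivity]) -> dict:
    return CONTROLS[strictest(labels)]


print(controls_for([Sensitivity.INTERNAL, Sensitivity.PII]))
```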

  • Data Controller

    The legal entity that determines the purposes and means of processing personal data. Controllers carry primary responsibility for compliance with data protection law. Also called "Controller".

  • Data Governance

    The exercise of authority and control over the management of data assets across an organization. Encompasses policies, standards, roles, processes, and tooling that ensure data is accurate, accessible, secure, and used in ways consistent with regulation and strategy.

  • Data Lineage

    The end-to-end record of where data originates, how it moves and transforms across systems, and where it is ultimately consumed. Lineage is essential for impact analysis, compliance reporting, and root-cause investigation.

  • Data Map

    A comprehensive view of the personal and sensitive data an organization collects, where it is stored, how it is used, who it is shared with, and the security measures applied. The artifact most commonly used to satisfy GDPR Article 30 RoPA reporting.

  • Data Minimization

    A core GDPR principle requiring that personal data collected is adequate, relevant, and limited to what is necessary in relation to the purposes for which it is processed.
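
    In code, minimization often shows up as an allow-list of fields per purpose. The sketch below assumes a hypothetical `ALLOWED_FIELDS` mapping and made-up purpose names; which fields are actually necessary is a judgment each organization has to make.

```python
# Hypothetical purposes and field lists, for illustration only.
ALLOWED_FIELDS = {
    "order_fulfilment": {"name", "shipping_address", "email"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Return only the fields that the stated purpose needs."""
    allowed = ALLOWED_FIELDS[purpose]
    return {key: value for key, value in record.items() if key in allowed}


signup = {
    "name": "Ana",
    "email": "ana@example.com",
    "date_of_birth": "1990-01-01",
    "shipping_address": "1 Main St",
}
print(minimize(signup, "newsletter"))
```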

  • Data Processing Agreement (DPA)

    A contract between a Controller and a Processor (or Processor and Sub-Processor) governing the terms of personal data processing. Required by Article 28 of the GDPR whenever processing is delegated to a third party.

  • Data Processor

    A vendor, service provider, or other party that processes personal data on behalf of a Controller. Processors act only on documented instructions from the Controller. Also called "Processor".

  • Data Protection Impact Assessment (DPIA)

    A structured process required under the GDPR to identify and mitigate risks to data subjects before undertaking high-risk processing activities such as large-scale profiling or processing of special category data.

  • Data Protection Officer (DPO)

    A designated individual responsible for overseeing an organization's data protection strategy and regulatory compliance. The DPO is mandatory under GDPR for certain organizations and reports to the highest level of management.

  • Data Quality

    The fitness of data for its intended use, measured across dimensions such as accuracy, completeness, timeliness, consistency, validity, and uniqueness. Poor data quality undermines analytics, AI models, and regulatory reporting.
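
    Two of the dimensions above, completeness and uniqueness, are simple to measure. The following is a minimal sketch with hypothetical function names; it treats `None` and the empty string as missing, which is an assumption that may not suit every dataset.

```python
def completeness(rows: list[dict], field: str) -> float:
    """Share of rows in which the field is filled in."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(field) not in (None, ""))
    return filled / len(rows)


def uniqueness(rows: list[dict], field: str) -> float:
    """Share of filled-in values that are distinct."""
    values = [row[field] for row in rows if row.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)


customers = [
    {"email": "a@example.com"},
    {"email": "a@example.com"},
    {"email": None},
    {"email": "b@example.com"},
]
print(completeness(customers, "email"), uniqueness(customers, "email"))
```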

  • Data Steward

    A person within an organization responsible for the quality, integrity, and appropriate use of a specific data domain or system. Data Stewards bridge business and technical teams, owning definitions, documentation, and policy enforcement.

  • Data Stewardship

    The discipline of assigning accountability for specific data assets to named owners ("Data Stewards") who are responsible for definitions, quality, classification, and policy enforcement within their domain.

  • Data Subject

    An identified or identifiable individual whose personal data is processed by an organization. Customers, employees, prospects, and website visitors are all data subjects.

  • Data Subject Access Request (DSAR)

    A formal request from an individual to receive a copy of the personal data an organization holds about them, plus information about how that data is used, shared, and retained. Response deadlines vary by law: the GDPR allows one month, extendable for complex requests, and the CCPA allows 45 days.
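
    Tracking the response deadline is a small date calculation. This sketch models the deadline as a fixed number of days; the function names and the 30-day default are assumptions for illustration, so set `response_days` to whatever the governing law requires.

```python
from datetime import date, timedelta


def dsar_due_date(received: date, response_days: int = 30) -> date:
    """Date by which the response is due (30-day default is an assumption)."""
    return received + timedelta(days=response_days)


def days_remaining(received: date, today: date, response_days: int = 30) -> int:
    """Negative values mean the request is overdue."""
    return (dsar_due_date(received, response_days) - today).days


received = date(2024, 1, 15)
print(dsar_due_date(received))                      # 2024-02-14
print(days_remaining(received, date(2024, 2, 1)))   # 13
```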

  • Differential Privacy

    A mathematical framework for releasing statistics or training models on data in a way that provably limits what can be learned about any individual in the dataset. The leading technical approach for privacy-preserving analytics on personal data.
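
    A common building block is the Laplace mechanism, which adds calibrated noise to a query result. The sketch below applies it to a simple count; the function name and sample data are hypothetical, and a production system would also need privacy-budget accounting and a vetted noise source, which this omits.

```python
import random


def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count using the Laplace mechanism (illustrative sketch only)."""
    true_count = sum(1 for value in values if predicate(value))
    # Adding or removing one person changes a count by at most 1.
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # The difference of two exponential draws is Laplace-distributed with this scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_count + noise


ages = [30] * 40 + [70] * 60  # made-up data: 40 people under 50
rng = random.Random(0)
print(private_count(ages, lambda age: age < 50, epsilon=1.0, rng=rng))
```

    Smaller values of `epsilon` add more noise and give stronger privacy; larger values give more accurate answers.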

E

  • EU-US Data Privacy Framework (DPF)

    A 2023 adequacy mechanism that permits transfer of personal data from the EU to US companies that self-certify compliance with the framework's principles, replacing the invalidated Privacy Shield.

  • Embedding

    A numerical representation of text, images, or other content as a vector in a high-dimensional space, where semantically similar inputs sit close together. Embeddings derived from personal data are themselves personal data and require the same governance.
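
    "Close together" is usually measured with cosine similarity. In this sketch the three-dimensional vectors are hand-made assumptions for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumption): not produced by any real embedding model.
vectors = {
    "refund": [0.9, 0.1, 0.0],
    "reimbursement": [0.8, 0.2, 0.1],
    "weather": [0.0, 0.1, 0.9],
}

print(cosine_similarity(vectors["refund"], vectors["reimbursement"]))
print(cosine_similarity(vectors["refund"], vectors["weather"]))
```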

  • Encryption

    The use of cryptographic algorithms to render data unreadable without a key, applied either to data in transit (e.g. TLS) or at rest (e.g. AES-256 disk encryption). A foundational technical safeguard required by most data protection laws.

F

  • Federated Learning

    A machine learning approach in which models are trained across many decentralized devices or servers holding local data, without that data ever leaving its source. Used to train AI models on personal data while reducing collection and transfer risks.
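
    The server-side step in many federated setups is a weighted average of client model weights, often called federated averaging. This is a minimal sketch with a hypothetical function name; it leaves out local training, secure aggregation, and communication entirely.

```python
def federated_average(client_updates: list[tuple[list[float], int]]) -> list[float]:
    """Average client weights, weighted by each client's number of samples.

    Each update is (weights, n_samples). Only weights reach the server;
    the raw local data never does.
    """
    total_samples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(dim)
    ]


updates = [
    ([1.0, 2.0], 1),  # client A trained on 1 sample
    ([3.0, 4.0], 3),  # client B trained on 3 samples
]
print(federated_average(updates))  # [2.5, 3.5]
```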

  • Fine-Tuning

    The process of further training a pre-trained model on a smaller, task-specific dataset to specialize its behavior. Fine-tuning on personal data raises retention and erasure questions because the data is partially absorbed into model weights.

  • Foundation Model

    A large AI model trained on broad data at scale, adaptable to a wide range of downstream tasks via fine-tuning or prompting. The EU AI Act, which refers to these as general-purpose AI models, places specific obligations on their providers, including documentation, transparency, and copyright disclosures.

G

  • Generative AI

    AI systems that produce new content — text, images, audio, video, code — rather than only classifying or predicting from existing data. Generative systems raise novel data protection questions because training data, prompts, and outputs may all contain personal data.

  • Global Privacy Control (GPC)

    A browser-based signal that lets users automatically communicate a "do not sell or share" preference to every website they visit. GPC is recognized as a valid opt-out signal under CCPA/CPRA and several other US state privacy laws.
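
    On the server side, the signal arrives as the `Sec-GPC: 1` request header defined by the GPC specification. The sketch below checks for it in a plain dictionary of headers; the function name is hypothetical, and what the application does once the signal is detected is left out.

```python
def gpc_opt_out(headers: dict[str, str]) -> bool:
    """Return True when the request carries the GPC signal (Sec-GPC: 1)."""
    # Header names are case-insensitive, so normalize before looking up.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("sec-gpc") == "1"


print(gpc_opt_out({"Sec-GPC": "1", "User-Agent": "example"}))  # True
print(gpc_opt_out({"User-Agent": "example"}))                  # False
```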

H

  • High-Risk AI System

    Under the EU AI Act, a category of AI systems subject to the strictest obligations — including risk management, data governance, human oversight, and conformity assessment — because they materially affect health, safety, fundamental rights, or access to essential services. Examples include AI used in hiring, credit scoring, and law enforcement.

I

  • IAB Transparency and Consent Framework (IAB TCF)

    An industry standard maintained by IAB Europe that defines how publishers, vendors, and CMPs communicate user consent for digital advertising in a GDPR-compliant manner via a structured consent string.

J

  • Joint Controller

    Two or more organizations that jointly determine the purposes and means of processing the same personal data. Joint Controllers must agree on respective responsibilities in a written arrangement.

L

  • Large Language Model (LLM)

    A type of AI system trained on massive text datasets to generate, summarize, classify, and reason over natural language. Examples include GPT, Claude, Gemini, and Llama. LLMs ingest personal data at training time and can output personal data at inference time, creating distinct privacy obligations at each stage.

  • Lawful Basis for Processing

    One of six legal grounds under the GDPR that justifies processing personal data: consent, contract, legal obligation, vital interests, public task, or legitimate interests. Every processing activity must rest on at least one.

  • Legitimate Interest

    A lawful basis under GDPR that allows processing where the controller (or a third party) has a real interest, the processing is necessary to achieve it, and the interest is not overridden by the data subject's rights and freedoms.

M

  • Model Card

    A standardized document that summarizes an AI model's intended use, training data, performance, limitations, and ethical considerations. Originally proposed by Google researchers; increasingly expected as part of AI documentation under emerging regulations.

  • Model Context Protocol (MCP)

    An open standard developed by Anthropic that defines how AI applications connect to external tools, data sources, and APIs. MCP makes the boundary between a model and its data sources explicit and auditable, which is foundational for AI data governance.

N

  • NIST AI Risk Management Framework (NIST AI RMF)

    A voluntary framework published by the US National Institute of Standards and Technology that provides organizations with a structured approach to mapping, measuring, and managing AI risks across the lifecycle. Sometimes referenced as "AI RMF 1.0".

P

  • Personal Data

    Any information relating to an identified or identifiable person, including name, identification number, online identifiers, location data, or factors specific to a person's physical, economic, cultural, or social identity.

  • Personally Identifiable Information (PII)

    Any data that could potentially identify a specific individual. The US-centric term that most closely aligns with the GDPR's broader concept of "personal data", though PII typically has a narrower scope.

  • Privacy Impact Assessment (PIA)

    The general practice of evaluating how a project, system, or technology may affect the privacy of individuals, and identifying mitigations. DPIA is the GDPR-specific form; PIA is the broader, jurisdiction-neutral term.

  • Privacy by Design

    A framework requiring privacy and data protection to be embedded into the architecture of systems, products, and processes from the start, rather than added afterwards. Codified as a legal requirement under GDPR Article 25.

  • Prompt Engineering

    The practice of designing input prompts to LLMs to produce reliable, accurate, and safe outputs. Beyond performance, prompts are a governance surface: system prompts may carry policy constraints, redactions, or jurisdictional rules.

  • Prompt Injection

    An attack in which adversarial content placed in user input or retrieved data manipulates an LLM into ignoring its instructions, leaking data, or taking unintended actions. The leading risk class for agentic AI and a key control point in AI governance.

  • Protected Health Information (PHI)

    Any individually identifiable health information held or transmitted by a HIPAA-covered entity or business associate. Includes medical records, billing information, and any health data linked to a person.

  • Pseudonymization

    A technique that replaces direct identifiers in a dataset with reversible tokens, so that re-identification requires additional information held separately. Pseudonymous data is still personal data under the GDPR.
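
    One way to get reversible tokens is a lookup table held apart from the dataset. In this sketch `TokenVault` and `pseudonymize` are hypothetical names, and the in-memory dictionaries stand in for what would need to be a separately secured store.

```python
import secrets


class TokenVault:
    """Holds the token-to-value mapping; keep it separate from the dataset."""

    def __init__(self) -> None:
        self._token_by_value: dict[str, str] = {}
        self._value_by_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the same token for the same value so records stay joinable.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


def pseudonymize(rows: list[dict], field: str, vault: TokenVault) -> list[dict]:
    return [{**row, field: vault.tokenize(row[field])} for row in rows]


vault = TokenVault()
rows = [
    {"email": "ana@example.com", "plan": "pro"},
    {"email": "ana@example.com", "plan": "free"},
]
print(pseudonymize(rows, "email", vault))
```

    Because anyone with access to the vault can reverse the tokens, the output remains personal data, as the definition above notes.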

  • Purpose Limitation

    A core GDPR principle requiring that personal data be collected for specified, explicit, and legitimate purposes, and not further processed in a manner incompatible with those purposes.

R

  • Record of Processing Activities (RoPA)

    A formal inventory required by Article 30 of the GDPR documenting an organization's data processing activities, including purposes, categories of data, recipients, retention periods, and security measures.
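
    A single RoPA entry can be modeled as a record with one field per item listed above. The class and field names here are assumptions for illustration, and this is not a complete list of what Article 30 requires.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessingActivity:
    # Hypothetical schema covering only the items named in the definition.
    name: str
    purposes: list[str]
    data_categories: list[str]
    recipients: list[str]
    retention_period: str
    security_measures: list[str]


def missing_fields(activity: ProcessingActivity) -> list[str]:
    """Names of fields left empty, useful as a completeness check."""
    return [f.name for f in fields(activity) if not getattr(activity, f.name)]


payroll = ProcessingActivity(
    name="Payroll",
    purposes=["pay employees"],
    data_categories=["bank details", "salary"],
    recipients=[],
    retention_period="",
    security_measures=["encryption at rest"],
)
print(missing_fields(payroll))  # ['recipients', 'retention_period']
```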

  • Red Teaming (AI)

    Structured adversarial testing of an AI system, where evaluators deliberately try to elicit unsafe, biased, or unintended outputs to surface risks before deployment. The EU AI Act requires adversarial testing for general-purpose AI models that pose systemic risk.

  • Retrieval-Augmented Generation (RAG)

    A pattern that retrieves relevant documents from a knowledge base at query time and provides them to an LLM as context, grounding the response in source data. RAG keeps proprietary or personal data out of the model weights while still allowing the model to reason over it.
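
    The retrieve-then-prompt flow can be sketched without any model at all. Below, word overlap is an assumption standing in for the embedding search a real system would use, the function names and documents are made up, and the final call to an LLM is left out.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the query (toy relevance score)."""
    query_words = words(query)
    ranked = sorted(documents, key=lambda doc: len(query_words & words(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


docs = [
    "Refunds are issued within 14 days.",
    "Our office is in Berlin.",
    "Shipping is free for orders over 50 euros.",
]
print(build_prompt("How long do refunds take?", docs, k=1))
```

    Only the retrieved passages reach the model, so access controls applied at retrieval time decide which personal data a given request can expose.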

  • Right to Data Portability

    The right of a data subject to receive their personal data in a structured, commonly-used, machine-readable format, and to transmit it to another controller without hindrance.

  • Right to Erasure

    The right of a data subject to require an organization to delete their personal data, subject to specific exceptions such as legal obligations or freedom of expression. Also known as the "Right to be Forgotten".

  • Right to Object

    The right of a data subject to object to certain types of processing, including direct marketing and processing based on legitimate interests. Objection to direct marketing is absolute; other objections require a balancing test.

  • Right to Rectification

    The right of a data subject to have inaccurate personal data about them corrected, or incomplete data completed, by the organization processing it.

  • Right to Restrict Processing

    The right of a data subject to require an organization to pause use of their data while a complaint or correction is resolved, without requiring outright deletion.

S

  • Sensitive Personal Information (SPI)

    A subset of personal data that receives heightened protection under CCPA/CPRA and other US privacy laws. Includes precise geolocation, race, religion, biometrics, health data, and contents of private communications.

  • Special Category Data

    Categories of personal data under the GDPR that require additional protection, including racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic and biometric data, health data, and data about sex life or sexual orientation.

  • Standard Contractual Clauses (SCCs)

    Pre-approved contract templates published by the European Commission that allow personal data to be transferred from the EEA to third countries without an adequacy decision, by binding both parties to specified data protection commitments.

  • Storage Limitation

    A core GDPR principle requiring that personal data is kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which it is processed.
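
    In practice this principle tends to become a retention schedule plus a periodic job that finds records past their limit. The record types and periods in `RETENTION_DAYS` are illustrative assumptions, not periods set by the GDPR, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical retention schedule, in days per record type.
RETENTION_DAYS = {"support_ticket": 365, "marketing_lead": 180}


def is_expired(record: dict, today: date) -> bool:
    limit = timedelta(days=RETENTION_DAYS[record["type"]])
    return today - record["collected_on"] > limit


def records_to_delete(records: list[dict], today: date) -> list[dict]:
    """Records held longer than their type's retention period."""
    return [record for record in records if is_expired(record, today)]


records = [
    {"id": 1, "type": "support_ticket", "collected_on": date(2023, 1, 1)},
    {"id": 2, "type": "marketing_lead", "collected_on": date(2024, 3, 1)},
]
print(records_to_delete(records, today=date(2024, 6, 1)))
```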

  • Sub-Processor

    A third party that a Processor engages to help fulfill its processing obligations to a Controller. Sub-processors typically require the Controller's documented prior authorization.

  • Synthetic Data

    Artificially generated data designed to statistically resemble real data without belonging to any actual individual. Used to train AI models or test systems while reducing exposure of real personal data — though synthetic data can still leak information about the source dataset.

T

  • Training Data

    The dataset used to train a machine learning model, which the model learns patterns from. Where training data contains personal data, organizations must establish a lawful basis, manage retention, and consider data subject rights against the trained model.

  • Training Data Governance

    The discipline of cataloging, classifying, and controlling the personal and proprietary data used to train, fine-tune, and evaluate AI models, including lawful basis for use, retention, and the right to have data removed. The EU AI Act requires data governance practices for the training, validation, and testing data of high-risk systems.

  • Transfer Impact Assessment (TIA)

    An assessment required following the Schrems II ruling to evaluate whether personal data transferred outside the EEA receives a level of protection essentially equivalent to that guaranteed by EU law, and identify any supplementary measures needed.

U

  • Universal Opt-Out Mechanism (UOOM)

    A technical standard (such as GPC) that allows consumers to express an opt-out of sale or sharing of their personal information once, in a way that is automatically honored across participating businesses. Honoring such signals is required under CCPA/CPRA and the Colorado and Connecticut privacy laws, among others.

V

  • Vector Database

    A database optimized for storing and searching high-dimensional vector embeddings, typically used to power semantic search and RAG over text, images, or other content. Vector stores can contain personal data and so fall within scope of most data protection laws.
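
    Because vector stores can hold personal data, it helps to keep a data subject identifier next to each vector so entries can be found and erased. `TinyVectorStore` below is a hypothetical in-memory sketch using brute-force search; real vector databases use approximate indexes and have their own APIs.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class TinyVectorStore:
    def __init__(self) -> None:
        self._items: list[tuple[str, str, list[float]]] = []  # (subject_id, text, vector)

    def add(self, subject_id: str, text: str, vector: list[float]) -> None:
        self._items.append((subject_id, text, vector))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Brute-force nearest neighbours by cosine similarity."""
        ranked = sorted(self._items, key=lambda item: cosine(query, item[2]), reverse=True)
        return [text for _, text, _ in ranked[:k]]

    def erase_subject(self, subject_id: str) -> int:
        """Remove every entry tied to one data subject; returns the count removed."""
        before = len(self._items)
        self._items = [item for item in self._items if item[0] != subject_id]
        return before - len(self._items)


store = TinyVectorStore()
store.add("user-1", "Ana asked about a refund", [0.9, 0.1])
store.add("user-2", "Ben asked about shipping", [0.1, 0.9])
print(store.search([1.0, 0.0]))
print(store.erase_subject("user-1"))
```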