
Authors
Ethyca Team
Topic
Regulatory
Published
Apr 22, 2026
Tracking Every Data Subject Across Systems for DSAR Readiness

In 2025, the Identity Theft Resource Center reported that 3,332 U.S. data compromises affected 278.8 million individuals. Each of those individuals holds a legal right to ask any organization that processed their personal data exactly what was collected, where it went, and why. That right manifests as a Data Subject Access Request. For most organizations, fulfilling even one of those requests means manually searching across dozens of disconnected systems, reconciling conflicting identifiers, and hoping nothing was missed.

SaaS companies now manage personal data across a sprawling landscape of systems: CRMs, analytics platforms, payment processors, marketing automation tools, customer support databases, data warehouses, and third-party integrations. Yet most cannot reliably say where a single user's information lives across that landscape. This is not a workflow gap. It is an infrastructure gap, and until organizations treat it as one, DSAR fulfillment will remain slow, expensive, and unreliable.

The Actual Scale of the DSAR Tracking Challenge

Consider what happens when a data subject submits an access request to a mid-market SaaS company. The privacy team receives the request. They open a ticket. Then they begin the manual work of contacting system owners across engineering, marketing, sales, support, and analytics to locate every record associated with that individual.

That individual might appear as an email address in Salesforce, a hashed user ID in Snowflake, a cookie identifier in Google Analytics, a phone number in Twilio, and a first-name-plus-last-name string in Zendesk. Each system stores a fragment. None of them share a common key. The privacy team must manually correlate these fragments into a single, coherent response.

Organizations handling DSARs manually have reported spending over two weeks per request, at a cost EY estimates at $1,400 each. When requests arrive in volume, that overhead compounds as data fragmentation grows across SaaS, cloud, and vendor systems. The figure also assumes every request is fulfilled correctly the first time, which is rarely the case.

IBM reported that the global average cost of a data breach reached $4.88 million in 2024. A significant portion of that cost traces back to the inability to locate and account for personal data quickly. When organizations cannot map where personal data lives, they cannot respond to incidents, fulfill subject rights, or demonstrate accountability to regulators. That cost compounds with every system added and every month of growth.

Why "What Is Personal Data?" Is an Infrastructure Question

What Constitutes Personal Data Under GDPR?

The GDPR definition of personal data under Article 4(1) is deliberately broad. It defines personal data as "any information relating to an identified or identifiable natural person." That includes names, email addresses, and phone numbers. It also includes IP addresses, device fingerprints, location data, behavioral patterns, and any combination of attributes that could identify someone.

This breadth is the point. Regulators designed the definition to capture the full scope of how organizations actually use data to identify and track individuals. But that same breadth creates a technical challenge that most organizations have not solved.

Knowing the legal definition of personal data is straightforward. Knowing where every instance of personal data actually lives across a production environment is a different undertaking entirely. A legal team can tell you that an IP address qualifies as personal data. Only infrastructure can tell you that IP addresses appear in multiple systems, stored in different formats, linked to various identifier schemes, and retained under different policies.

This is where the concept of personal data intelligence becomes concrete. Personal data intelligence is not a dashboard or a report. It is the technical capability to continuously discover, classify, and map personal data across every system in an organization's stack. That requires automated scanning that can detect personal data in structured databases, semi-structured logs, unstructured documents, and third-party SaaS platforms.
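To make the discovery-and-classification step concrete, here is a deliberately minimal sketch of pattern-based scanning over sampled records. The pattern names and regexes are illustrative assumptions, not Ethyca's implementation; production classifiers combine regexes with checksums, column-name heuristics, and ML models.

```python
import re

# Illustrative patterns only -- a real classifier is far more sophisticated.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+\d[\d\s-]{7,}\d"),  # international format with leading +
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def classify_value(value: str) -> list[str]:
    """Return the personal-data categories detected in one field value."""
    return [label for label, rx in PATTERNS.items() if rx.search(value)]

def scan_records(records: list[dict]) -> dict[str, list[str]]:
    """Build a field-level map of where personal data appears in sampled records."""
    findings: dict[str, set[str]] = {}
    for record in records:
        for field, value in record.items():
            for label in classify_value(str(value)):
                findings.setdefault(field, set()).add(label)
    return {field: sorted(labels) for field, labels in findings.items()}
```

Run continuously against every connected system, output like this becomes a live, queryable inventory rather than a point-in-time audit.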

Helios provides exactly this capability. Helios performs automated discovery and classification of personal data across all connected systems, building a living inventory that reflects the actual state of data across an organization's infrastructure. It identifies sensitive personal data, maps it to data subjects, and maintains that map as systems change. Without this layer, organizations are working from static spreadsheets that go stale the moment they are created.

How to Secure Personal Data Online at the Organizational Level

The question of how to secure personal data online is often framed as a consumer concern. Use strong passwords. Enable two-factor authentication. Be careful what you share. These are valid individual practices, though most individuals have limited visibility into where their data exists.

For organizations, securing personal data online is a fundamentally different exercise. It requires knowing what personal data you hold, where it is stored, how it flows between systems, who has access to it, and what policies govern its retention and deletion. Security without inventory is guesswork, because you cannot protect what you cannot find.

This is why personal data protection at the organizational level starts with discovery and classification, not with encryption or access controls. Those controls matter, but they are only effective when applied to a complete and accurate map of where personal data actually resides.

How Current DSAR Fulfillment Approaches Break Under Real-World Complexity

Most organizations approach DSAR fulfillment through one of three methods. Each breaks at scale for specific, predictable reasons.

Manual Coordination

This is the most common approach. A privacy team receives a request, opens a ticket, and emails system owners across the organization asking them to search their systems for records matching the data subject. Each system owner runs a query, exports results in whatever format their system supports, and sends it back. The privacy team then reconciles, deduplicates, reviews for exemptions, and packages the response.

This process is linear and serial. It depends on the availability and responsiveness of every system owner involved. It assumes that each system owner knows the full schema of their system and can reliably identify all personal data within it. In practice, system owners often miss fields, overlook derived data, or fail to check secondary storage locations like backups and logs.

As the number of systems grows, this process becomes slower and more error-prone, and its results grow less consistent from one request to the next.

Static Data Inventories

Some organizations invest in building a data inventory: a spreadsheet or database that catalogs which systems hold which categories of personal data. This is a step forward from pure manual coordination, but static inventories degrade immediately.

Engineering teams add new systems, modify schemas, integrate new third-party services, and migrate data between platforms. A data inventory that was accurate in January is materially incomplete by March.

Static inventories also lack the resolution needed for DSAR fulfillment. Knowing that "Salesforce contains contact information" is not the same as knowing that a specific data subject's email address appears in multiple Salesforce objects, custom fields, and automated workflow logs. DSAR fulfillment requires record-level precision, not category-level awareness.

Workflow Automation Without Data Intelligence

A third approach layers automation on top of the manual process. Ticketing systems route requests to the right teams. Workflow engines track progress and deadlines. Dashboards show completion rates.

These tools optimize the coordination layer, making the manual process faster and more visible. But they do not solve the underlying infrastructure gap. If the organization cannot programmatically locate every record belonging to a data subject across every system, no amount of workflow automation will produce a complete and accurate response.

Workflow automation without data intelligence is like optimizing the mail delivery route without knowing the addresses. The packages move faster, but they still arrive at the wrong destinations.

Building Infrastructure for End-to-End Personal Data Intelligence

The alternative is to treat DSAR readiness as an infrastructure capability, not a process. This means building three technical layers that work together: automated discovery and classification, cross-system subject mapping, and programmatic fulfillment.

Automated Discovery and Classification

The first layer continuously scans every connected system to identify and classify personal data. This includes structured databases, cloud storage, SaaS platforms, data warehouses, and messaging systems. Classification must distinguish between standard personal data and sensitive personal data, such as health information, biometric data, racial or ethnic origin, and financial records. Different categories of personal data carry different regulatory obligations, and the infrastructure must reflect that distinction.

Helios performs this function across an organization's entire data ecosystem. It connects to systems via APIs and database connectors, scans schemas and data samples, applies classification models, and builds a continuously updated inventory of personal data. This inventory is not a static document but a live, queryable representation of where personal data exists right now.

Cross-System Subject Mapping

Discovery and classification tell you what personal data exists and where. Subject mapping tells you which data belongs to which individual. This is the hardest technical layer to build, because the same person appears differently across systems.

A single data subject might be represented as user_id: 48291 in a product database, jane.doe@email.com in a CRM, customer_token: abc123 in a payment processor, and cookie_id: xyz789 in an analytics platform. Subject mapping requires resolving these identifiers into a unified subject graph that connects all records belonging to the same individual.
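The resolution step described above can be sketched with a union-find structure, where each `link` records evidence (for example, a CRM row storing both an email and a user ID) that two identifiers belong to the same person. This is a simplified illustration under assumed identifier names, not the actual resolution mechanism in Ethyca's products.

```python
class SubjectGraph:
    """Toy identity-resolution graph built on union-find."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers refer to the same individual."""
        self.parent[self._find(a)] = self._find(b)

    def subject_of(self, identifier: str) -> str:
        """Return the canonical subject for any known identifier."""
        return self._find(identifier)
```

Once every known join key has been linked, a lookup by any one identifier (a cookie ID, say) resolves to the same subject as a lookup by user ID, which is exactly what a complete DSAR response requires.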

Fides, Ethyca's open-source privacy engineering framework, provides the policy enforcement and data mapping layer that makes this resolution possible. Fides defines how personal data should be handled across systems, encodes privacy policies as executable code, and maps data flows between systems. It serves as the connective tissue between discovery and fulfillment.

Programmatic Fulfillment

Once personal data is discovered, classified, and mapped to subjects, fulfillment becomes a programmatic operation rather than a manual one. When a DSAR arrives, the system can immediately identify every record belonging to that subject across every connected system, package the data according to the request type, apply any applicable exemptions, and deliver the response.
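In pseudocode terms, fulfillment at this point reduces to a fan-out over connectors. The sketch below is a hedged illustration of that shape, not Lethe's API: `connectors` maps a system name to a lookup function, standing in for the API and database connectors a real deployment would use, and the subject's resolved identifiers come from the mapping layer.

```python
from typing import Callable

def fulfill_access_request(
    subject_identifiers: set[str],
    connectors: dict[str, Callable[[set[str]], list[dict]]],
) -> dict[str, list[dict]]:
    """Collect every record matching the subject across all connected systems."""
    package: dict[str, list[dict]] = {}
    for system, lookup in connectors.items():
        records = lookup(subject_identifiers)
        if records:  # only include systems that actually hold the subject's data
            package[system] = records
    return package
```

The essential property is that completeness no longer depends on any human remembering a system: every connector in the registry is queried on every request.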

Lethe, Ethyca's automated DSR fulfillment engine, executes this final layer. Lethe processes access requests, deletion requests, and de-identification requests across systems. It does not require manual intervention from system owners. It operates on the data map built by Helios and the policy framework defined in Fides, executing fulfillment actions directly against connected systems.

This three-layer architecture transforms DSAR fulfillment from a multi-week manual project into an automated, repeatable operation. The infrastructure does the work that previously required dozens of people and weeks of coordination.

What Is Personal Data Intelligence in Practice?

Personal data intelligence is the operational capability to answer three questions at any moment: What personal data do we hold? Where does it live? Who does it belong to?

These questions sound basic, but answering them at scale across a modern data architecture is anything but. Personal data intelligence requires continuous discovery, automated classification, real-time subject resolution, and policy-aware governance. It is not a one-time audit but a persistent infrastructure capability.

Organizations that build personal data intelligence gain more than DSAR readiness. They gain the ability to respond to regulatory inquiries with precision. They can scope the impact of a personal data breach in hours rather than months.

According to Varonis, the mean time to identify a breach globally is 194 days. Organizations with real-time data intelligence can compress that timeline dramatically, because they already know where personal data lives and who it belongs to.

Personal data intelligence also enables proactive governance. Instead of reacting to requests and incidents, organizations can continuously monitor their data landscape for policy violations, unauthorized data flows, and retention policy drift. This shifts the privacy function from reactive to structural.

What Becomes Possible When DSAR Infrastructure Is Right

When the infrastructure works, DSARs become routine. Not because they are trivial, but because the technical foundation handles the complexity that previously required manual effort. Privacy teams stop spending their time chasing data across systems and start spending it on strategic decisions about data governance, consent architecture, and privacy engineering.

Ethyca's infrastructure powers this shift for organizations today. The platform has processed over 4 million access requests and manages more than 744 million consent and preference records across 200+ brands. These numbers reflect what happens when personal data intelligence is built into the infrastructure layer rather than bolted on as a process.

Engineering teams move faster because privacy requirements are encoded in the infrastructure, not in policy documents that require manual interpretation. Privacy teams operate with confidence because they can verify, at any moment, that every system is accounted for and every subject's data is mapped. Legal teams respond to regulators with precision because the data map is always current.

The organizations that build this infrastructure do not just fulfill DSARs more efficiently. They build a foundation for every privacy operation that depends on knowing where personal data lives: consent management, data minimization, retention enforcement, breach response, and cross-border transfer governance. Each of these capabilities depends on the same underlying data intelligence layer.

DSAR readiness is not the end goal. It is the first proof point that the infrastructure is working. When an organization can reliably locate, process, and package every record belonging to any data subject across its entire system landscape, it has built something far more valuable than a compliance workflow. It has built the infrastructure for trustworthy data governance at scale.

If your organization is ready to build that foundation, speak with us.
