Preparing Your Data Stack for the Coming Wave of AI Regulation

The regulatory volume around AI is accelerating — 59 US federal rules in 2024, 131 state laws, EU AI Act obligations taking full effect in 2026. The question isn't whether AI regulation is coming. It's whether your data stack can absorb it. Most can't, because the infrastructure underneath AI systems was never designed to answer what regulators now ask: what data trained this model, which users consented, and can you prove it. That's an infrastructure problem, not a legal one.

Authors

Ethyca Team

Topic

AI & Policy

Published

May 28, 2026


Key takeaways

According to Stanford HAI's 2025 AI Index Report, US federal agencies issued 59 new AI-related regulations in 2024, more than double the year prior. US states passed 131 AI laws in the same period, up from 49 in 2023. Across 75 countries, legislative mentions of AI grew 21.3% year over year.
The question is no longer whether AI regulation is coming, but whether your data stack can absorb it without a manual re-architecture each time a new law takes effect.
Most stacks cannot, because the infrastructure underneath AI systems was never designed to answer the questions regulators now ask: what data trained this model, which users consented to that use, and can you prove it.
The regulatory challenge is one of data infrastructure: the ability to classify AI systems, map their training data, enforce purpose-specific consent, and produce auditable records at the pipeline level.
Organizations that treat AI regulation as an infrastructure design problem will absorb new requirements as configuration changes. Those that treat it as a legal exercise will re-architect their systems with every new law.

In 2024 alone, US federal agencies issued 59 new AI-related regulations and US states enacted 131 AI laws, in each case more than double the previous year's total, according to Stanford HAI's 2025 AI Index Report. Across 75 countries, legislative mentions of AI rose 21.3% year over year. The EU AI Act's obligations for high-risk AI systems begin to apply in 2026. In the United States, no federal AI regulation bill has unified the patchwork, so states are filling the vacuum on their own.

For engineering and privacy leaders at SaaS companies, the question is no longer whether AI regulation is coming. It is whether their data stack can absorb it.

Most cannot. Not because teams lack awareness of the regulatory landscape, but because the infrastructure underneath their AI systems was never designed to answer the questions regulators are now asking: What data trained this model? Which users consented to that use? Can you prove it?

Why current approaches fail

Consider what happens when a mid-stage SaaS company determines that its AI-powered recommendation engine falls under the EU AI Act's high-risk classification. The compliance team needs to produce documentation covering the model's training data sources, data quality measures, human oversight mechanisms, and transparency disclosures. They also need to demonstrate that users in covered jurisdictions provided appropriate consent for their data to be used in model training.

In most organizations, this information lives in at least four different systems. Training data lineage sits in an ML platform. Consent records live in a consent management tool. Data inventories, if they exist at all, are maintained in a separate spreadsheet or wiki. And the model's deployment configuration is managed by the engineering team in yet another system.

No single system of record connects these layers. The result is a manual, cross-functional scavenger hunt that takes weeks and produces documentation that is already stale by the time it is assembled.

This fragmentation compounds when multiple regulations interact. The EU AI Act requires transparency about AI-generated content. Switzerland's revised Federal Act on Data Protection (FADP) imposes criminal penalties of up to CHF 250,000 on individuals for violations, even though Switzerland has not yet adopted its own AI-specific act. US state laws require bias audits and automated decision-making disclosures. Each regulation asks a slightly different question about the same underlying data, and each question that cannot be answered automatically must be answered manually.

Requirements of AI regulators

Understanding the EU AI Act's structure is the starting point for any infrastructure response. The Act establishes a tiered, risk-based classification system. Unacceptable-risk practices, such as social scoring, are banned.

High-risk systems, including those used in employment, credit scoring, law enforcement, and critical infrastructure, face the most extensive obligations: conformity assessments, technical documentation, data governance requirements, human oversight mandates, and ongoing monitoring. Limited-risk systems must meet transparency obligations, particularly around AI-generated content. Minimal-risk systems face no specific requirements beyond voluntary codes of conduct.
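The tier structure above maps naturally onto a lookup table. The sketch below is a deliberately simplified illustration, assuming hypothetical use-case labels such as `employment_screening`; real classification follows Article 5, Article 6, and Annex III of the Act and needs legal review. The one design choice worth keeping is that an unlisted use case is routed to manual review instead of being silently treated as minimal risk.

```python
from enum import Enum
from typing import Optional


class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"


# Hypothetical use-case labels mapped to tiers: a simplification of
# Article 5, Article 6 and Annex III, for illustration only.
USE_CASE_TIERS = {
    "social_scoring": RiskTier.UNACCEPTABLE,
    "employment_screening": RiskTier.HIGH,
    "credit_scoring": RiskTier.HIGH,
    "critical_infrastructure": RiskTier.HIGH,
    "customer_chatbot": RiskTier.LIMITED,
    "spam_filter": RiskTier.MINIMAL,
}

# Obligations per tier, abbreviated from the Act's high-level structure.
TIER_OBLIGATIONS = {
    RiskTier.UNACCEPTABLE: ["prohibited: do not deploy"],
    RiskTier.HIGH: [
        "conformity assessment",
        "technical documentation",
        "data governance",
        "human oversight",
        "ongoing monitoring",
    ],
    RiskTier.LIMITED: ["transparency disclosure"],
    RiskTier.MINIMAL: [],  # voluntary codes of conduct only
}


def classify(use_case: str) -> Optional[RiskTier]:
    """Return the tier for a known use case, or None if it is unclassified."""
    return USE_CASE_TIERS.get(use_case)


def obligations_for(use_case: str) -> list[str]:
    """Return a copy of the obligations that apply to a use case."""
    tier = classify(use_case)
    if tier is None:
        # Never default an unknown system to minimal risk.
        return ["manual classification review"]
    return list(TIER_OBLIGATIONS[tier])
```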

For organizations operating across both the EU and Switzerland, the regulatory picture is layered. The EU AI Act and GDPR operate as complementary but distinct frameworks on the EU side. Switzerland's FADP shares structural similarities with GDPR but diverges on enforcement: it imposes criminal liability on individuals rather than organizational fines, and its automated decision-making provisions under Article 21 function as disclosure obligations rather than prohibitions.

Organizations serving both markets need infrastructure capable of mapping the same data processing activities to different regulatory taxonomies simultaneously.

In the United States, the absence of a unified federal AI law has produced a state-level patchwork that is expanding with each legislative session. Colorado's AI Act imposes requirements on developers and deployers of high-risk AI systems. New York City's Local Law 144 requires bias audits for automated employment decision tools. Illinois, Texas, and Washington have biometric privacy laws that intersect directly with AI training data practices. A recommendation engine used in hiring may require a bias audit in New York City, a transparency disclosure in Colorado, and different obligations still in other states, while remaining subject to general privacy laws everywhere.

The practical effect is that organizations must first classify every AI system they operate, then apply the correct set of obligations to each, then demonstrate compliance with documentation that is specific, current, and auditable. Without automated infrastructure to do this, even the classification step becomes a bottleneck.
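The classify-then-apply sequence can be sketched as a rule table evaluated against a description of each AI system. Everything here is hypothetical: the `AISystem` fields, the jurisdiction codes such as `US-NYC`, and the obligation strings, which compress each law into a single line for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AISystem:
    name: str
    use_cases: frozenset[str]
    data_categories: frozenset[str]
    jurisdictions: frozenset[str]


@dataclass(frozen=True)
class Rule:
    jurisdiction: str
    applies: Callable[[AISystem], bool]
    obligation: str


# Hypothetical, heavily simplified rule table for illustration only.
RULES = [
    Rule("EU", lambda s: "employment" in s.use_cases,
         "EU AI Act: high-risk obligations (Annex III, employment)"),
    Rule("US-NYC", lambda s: "employment" in s.use_cases,
         "NYC Local Law 144: independent bias audit"),
    Rule("US-CO", lambda s: "employment" in s.use_cases,
         "Colorado AI Act: deployer impact assessment and consumer notice"),
    Rule("US-IL", lambda s: "biometric" in s.data_categories,
         "Illinois BIPA: written consent before collecting biometric identifiers"),
]


def obligations(system: AISystem) -> dict[str, list[str]]:
    """Group the obligations that apply to one AI system by jurisdiction."""
    result: dict[str, list[str]] = {}
    for rule in RULES:
        if rule.jurisdiction in system.jurisdictions and rule.applies(system):
            result.setdefault(rule.jurisdiction, []).append(rule.obligation)
    return result
```

Adding a jurisdiction means appending a `Rule`, not rewriting the classifier, which is the property the rest of this article argues for.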

Four infrastructure requirements for an AI regulation-ready data stack

The infrastructure required to meet AI regulation at scale has four layers, each addressing a specific gap that manual processes cannot close.

1. Automated data inventory and classification

You cannot govern what you cannot see. The first requirement is a continuously updated inventory of every AI system, its training data sources, its inputs and outputs, and the data categories involved. This inventory must be automated because AI systems change frequently: new training data is ingested, models are retrained, and deployment configurations shift. A data map that was accurate at the last annual review may be materially incomplete by the time an auditor asks for it.

The inventory must be granular enough to answer the specific questions each regulation asks. The EU AI Act's data governance requirements under Article 10 cover data collection processes, data preparation operations, and the examination of potential biases. NYC Local Law 144 requires an independent bias audit that reports selection rates and impact ratios calculated from historical or test data, so the organization must know exactly which data the audit was run against. Neither can be satisfied with a static diagram of the data stack.
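One way to keep an inventory honest is to compare it continuously against what the pipelines actually read. The sketch below assumes a hypothetical scanner that reports, for each source currently feeding a system, the data categories found in it; `InventoryEntry`, `detect_drift`, and `is_current` are illustrative names, not part of any product.

```python
from dataclasses import dataclass


@dataclass
class InventoryEntry:
    system: str
    training_sources: set[str]
    data_categories: set[str]


def detect_drift(entry: InventoryEntry,
                 observed_sources: dict[str, set[str]]) -> dict[str, set[str]]:
    """Compare the recorded inventory with what the pipelines actually read.

    observed_sources maps each source currently feeding the system to the
    data categories found in it (from a hypothetical scanner).
    """
    observed = set(observed_sources)
    observed_categories = set().union(*observed_sources.values())
    return {
        "unrecorded_sources": observed - entry.training_sources,
        "stale_sources": entry.training_sources - observed,
        "unrecorded_categories": observed_categories - entry.data_categories,
    }


def is_current(drift: dict[str, set[str]]) -> bool:
    """The inventory is current only when every drift set is empty."""
    return not any(drift.values())
```

Run on every retraining or pipeline change, a non-empty drift result is the signal that the data map needs updating before an auditor finds the gap.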

2. Machine-readable policy enforcement

Identifying data is necessary but not sufficient. The second layer translates regulatory obligations into enforceable, machine-readable policies. This is where most organizations stall. They have legal interpretations of what the EU AI Act or a US state law requires, but no mechanism to encode those interpretations into technical controls that execute automatically.

The core architectural principle is that regulatory obligations should function as configuration, not custom code. A policy specifying that biometric data may not be used for model training without explicit consent in a covered jurisdiction should be an enforceable rule evaluated against every relevant data flow, not a paragraph in a compliance manual that developers may or may not consult. When the regulation changes, the policy should update in one place and propagate across the stack.
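As a minimal sketch, the biometric-consent rule described above can live in a declarative file that code only evaluates. The JSON schema, field names, and jurisdiction codes are assumptions made for illustration, not the policy format of Fides or any other product.

```python
import json
from dataclasses import dataclass

# Hypothetical policy file contents; the schema is illustrative only.
POLICY_JSON = """
[
  {
    "id": "biometric-training-consent",
    "data_category": "biometric",
    "purpose": "model_training",
    "jurisdictions": ["EU", "US-IL"],
    "requires_consent": "explicit"
  }
]
"""


@dataclass(frozen=True)
class DataFlow:
    data_category: str
    purpose: str
    jurisdiction: str
    consent: str  # "none", "implied" or "explicit"


def load_policies(text: str) -> list[dict]:
    return json.loads(text)


def violations(flow: DataFlow, policies: list[dict]) -> list[str]:
    """Return the ids of every policy this data flow violates."""
    broken = []
    for p in policies:
        in_scope = (
            flow.data_category == p["data_category"]
            and flow.purpose == p["purpose"]
            and flow.jurisdiction in p["jurisdictions"]
        )
        if in_scope and flow.consent != p["requires_consent"]:
            broken.append(p["id"])
    return broken
```

When the underlying law changes, only `POLICY_JSON` changes; every pipeline that calls `violations()` picks up the new rule without a code change.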

3. Granular consent orchestration

AI regulation in both the EU and the US increasingly requires granular consent for specific data uses. The EU AI Act's transparency requirements interact with GDPR's consent framework. Several US state laws require opt-out mechanisms for automated decision-making. Switzerland's FADP adds its own consent requirements.

A single user's data might be used for personalization, model training, and automated decision-making, each of which may require separate consent under different jurisdictions. Managing this at the individual user level, across millions of users and multiple jurisdictions, requires consent orchestration infrastructure that operates at the data processing level.

When a user in Germany opts out of model training but a user in Texas has no such right under current law, the system must apply the correct consent state to each data flow automatically.
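The Germany and Texas example can be expressed as a per-jurisdiction, per-purpose consent model applied to each user's recorded choices. The sketch assumes hypothetical model labels (`opt_in`, `opt_out`, `none`) and hypothetical table entries; which model actually applies to a given purpose is a legal determination that the table only records.

```python
from typing import Optional

# Hypothetical consent models per (jurisdiction, purpose); illustrative only.
CONSENT_MODEL = {
    ("DE", "model_training"): "opt_out",
    ("DE", "personalization"): "opt_in",
    ("US-TX", "model_training"): "none",
    ("US-TX", "personalization"): "none",
}


def may_process(jurisdiction: str, purpose: str, choice: Optional[bool]) -> bool:
    """Decide whether one user's data may be used for one purpose.

    choice is the user's recorded decision for this purpose: True (granted),
    False (refused or opted out) or None (no recorded answer).
    Unlisted combinations default to opt-in, the most restrictive model.
    """
    model = CONSENT_MODEL.get((jurisdiction, purpose), "opt_in")
    if model == "opt_in":
        return choice is True
    if model == "opt_out":
        return choice is not False
    return True  # "none": no consent requirement modeled for this pair


def training_cohort(users: list[dict]) -> list[str]:
    """Return the ids of users whose data may enter a model-training run."""
    return [
        u["id"]
        for u in users
        if may_process(u["jurisdiction"], "model_training",
                       u["choices"].get("model_training"))
    ]
```

Because the decision is made per purpose, the same German user can be excluded from training while still receiving personalization they opted into.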

4. Real-time AI policy enforcement

The final layer is enforcement. Policies that exist only in documentation are not operational. The question regulators will ask is whether a non-compliant AI workflow can be detected and blocked before it executes, not whether the organization's policy documents describe the correct behavior.

This means embedding policy checks into data pipelines as automated gates, not as manual review steps. If a model training pipeline attempts to ingest data from a restricted category, the system intercepts the operation. If a deployment configuration violates a transparency obligation, the system flags it before the model reaches production. The distinction between governance as documentation and governance as infrastructure is the difference between an audit artifact and a prevented violation.
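A pipeline gate can be as simple as a wrapper that inspects a training step's inputs before the step runs, plus a pre-deployment check on configuration. This is a sketch under stated assumptions: the `Dataset` type, the restricted-category set, and the `generates_content` and `ai_disclosure` deployment flags are all hypothetical.

```python
import functools
from dataclasses import dataclass


class PolicyViolation(Exception):
    """Raised when a pipeline step would break a governance policy."""


@dataclass(frozen=True)
class Dataset:
    name: str
    categories: frozenset[str]


# Hypothetical categories barred from training; illustrative only.
RESTRICTED_FOR_TRAINING = frozenset({"biometric", "health"})


def training_gate(func):
    """Block the wrapped training step if any input dataset is restricted."""
    @functools.wraps(func)
    def wrapper(datasets, *args, **kwargs):
        for ds in datasets:
            blocked = ds.categories & RESTRICTED_FOR_TRAINING
            if blocked:
                raise PolicyViolation(
                    f"{ds.name} contains restricted categories: {sorted(blocked)}"
                )
        return func(datasets, *args, **kwargs)
    return wrapper


@training_gate
def train_model(datasets):
    # Placeholder for the real training job.
    return f"trained on {len(datasets)} dataset(s)"


def check_deployment(config: dict) -> list[str]:
    """Pre-deploy check: flag generated content shipped without an AI disclosure."""
    problems = []
    if config.get("generates_content") and not config.get("ai_disclosure"):
        problems.append("missing AI-generated content disclosure")
    return problems
```

Because the gate raises before `train_model` executes, a violating run never starts; the exception, rather than a later review, is the enforcement point.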

The regulatory calendar is not slowing down

The legislative trajectory in every major jurisdiction points toward more regulation, not less. The EU AI Act's high-risk system obligations take full effect in 2026. Multiple US states have AI regulation bills moving through their legislatures with 2026 effective dates. The OECD, UN, EU, and African Union have all issued AI governance frameworks in the past two years. Regulatory sandboxes for AI are operational across the EU, Utah, Singapore, and the UAE, signaling that enforcement frameworks are maturing alongside the laws.

Organizations that wait for regulatory certainty before investing in infrastructure will find themselves in the same position as those that waited for GDPR enforcement before building privacy programs. The cost of retrofitting is consistently higher than the cost of building correctly from the start. Every AI system integrated without data lineage tracking, every model trained without purpose-specific consent records, and every deployment lacking an auditable governance trail adds to a compliance debt that compounds with each new regulation.

How Ethyca addresses AI policy enforcement

When AI governance is built into the data stack rather than bolted on as a compliance layer, the operational dynamics change fundamentally.

New regulations become configuration changes. When a US state passes a new AI regulation bill, the response is to update a policy definition in Fides and propagate it across the stack. When the EU AI Act's full obligations take effect, the documentation and transparency records they require are already being generated automatically from the data inventory that Helios maintains.

User rights requests scale without scaling headcount. Lethe provides automated DSR fulfillment and data de-identification that extends to AI training data. When a user exercises their right to deletion under GDPR or a US state law, the system can trace that user's data through model training pipelines and execute the appropriate action, whether that is deletion, de-identification, or exclusion from future training runs.
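Tracing a deletion request through training pipelines is, at its core, a walk over a lineage graph. The sketch below is not Lethe's API; the node names, the prefix convention, and the action chosen for each node type are assumptions made to show the traversal.

```python
from collections import deque

# Hypothetical lineage graph: each node lists the nodes derived from it.
LINEAGE = {
    "crm.users": ["features.user_profile"],
    "events.clicks": ["features.user_profile", "features.sessions"],
    "features.user_profile": ["model.recommender_v3"],
    "features.sessions": ["model.churn_v1"],
    "model.recommender_v3": [],
    "model.churn_v1": [],
}


def downstream(sources: list[str]) -> set[str]:
    """Breadth-first walk collecting every node derived from the sources."""
    seen: set[str] = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in LINEAGE.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def deletion_plan(user_sources: list[str]) -> dict[str, str]:
    """Map each node touched by one user's data to a hypothetical action."""
    plan = {}
    for node in sorted(downstream(user_sources)):
        if node.startswith("model."):
            # Trained weights are not edited in place here; the user is
            # excluded from the next training run instead.
            plan[node] = "exclude_from_future_training"
        elif node.startswith("features."):
            plan[node] = "deidentify"
        else:
            plan[node] = "delete"
    return plan
```

Models map to exclusion from future training rather than deletion because removing one person's influence from already-trained weights is generally not a simple edit; whether a retrain is required is a policy decision outside this sketch.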

Teams can move quickly because they are operating within clearly defined boundaries that are technically enforced, not just documented in policy manuals. This kind of infrastructure enables AI innovation rather than constraining it.

Ethyca's infrastructure operates at this scale across more than 200 brands, handling consent orchestration, access request fulfillment, and policy enforcement as unified, automated workflows. When governance is treated as infrastructure, it scales with the organization rather than against it.

The organizations that will navigate the coming wave of AI regulation most effectively are not the ones with the largest legal teams or the most detailed compliance checklists. They are the ones whose data stacks were built to answer the questions regulators ask, automatically, accurately, and at the speed the business requires. Speak with us to see how Ethyca works.

Frequently asked questions

What is AI regulation and why does it matter for data teams?

AI regulation governs how AI systems are built, trained, deployed, and monitored. It matters for data teams because most obligations, documenting training data provenance, enforcing purpose-specific consent, and demonstrating human oversight, are infrastructure requirements. Legal teams interpret the statute. Data and engineering teams build the systems that satisfy it.

Which countries have AI regulation in 2025?

Most major jurisdictions. The EU AI Act applies to any organization deploying AI that affects EU residents. The US has no federal law but 131 state-level AI laws passed in 2024. Switzerland's FADP covers AI data processing directly. Across 75 countries, legislative mentions of AI grew 21.3% year over year.

What does the EU AI Act require from organizations?

Obligations depend on risk classification. High-risk systems used in employment, credit scoring, law enforcement, and critical infrastructure require conformity assessments, technical documentation, data governance records, and human oversight. Most high-risk obligations, including those for these use cases, apply from August 2, 2026; obligations for high-risk AI embedded in products covered by existing EU product-safety legislation follow in August 2027. Organizations using personal data for AI must satisfy both the AI Act and GDPR simultaneously.

How does AI regulation affect model training data?

Directly. Regulations require organizations to document where training data came from, what consent authorized its use, and whether that authorization covered the specific AI purpose. A user who consented to behavioral analytics did not necessarily consent to model training. The EU AI Act's Article 10, GDPR's purpose limitation principle, and several US state laws all attach obligations specifically to training data.

How should engineering teams prepare for AI regulation?

Build infrastructure that treats regulatory obligations as configuration, not custom code: automated data inventories, machine-readable policy enforcement at the pipeline level, and audit trails documenting every data input to every model decision. Teams that build this once absorb new regulations by updating policy definitions. Teams that respond manually rebuild the same capabilities with every new law.
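One common way to make such a trail auditable is a hash chain, in which each entry commits to the hash of the entry before it, so any later edit to a recorded input or decision is detectable. The sketch uses only Python's standard `hashlib` and `json` modules; the record fields are hypothetical. Note that a hash chain detects tampering but does not by itself prevent it.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, record: dict) -> str:
    """Hash the previous entry's hash together with a canonical record."""
    payload = json.dumps({"prev": prev, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _digest(prev, record)
        self.entries.append({"prev": prev, "record": record, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(prev, entry["record"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Each training run or model decision would append one record (for example, dataset versions, the consent snapshot, and the policy version in force), and an auditor can re-run `verify()` independently.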
