
Data Mapping: Monitoring Data Sources

This tutorial requires Fides Cloud or Fides Enterprise. For more information, talk to our solutions team.

Introduction

Fides Detection & Discovery (D&D) tracks updates to your organization’s data architecture. When a data source changes, Fides reports the added, removed, or changed fields and suggests data categories for those fields.

This tutorial walks through the entire data pipeline, starting with connecting a data store, moving through promoting and tagging fields, and ending with the resulting dataset.

Glossary of Terms

Detection & Discovery (D&D) introduces a few new concepts that are important to understand before you begin:

  • Integrations host the connection details to your data store and the configuration for data classification parameters.
  • Monitors determine the scope and schedule of detection and discovery tasks.
  • Scans are the tasks that Fides executes to detect new or updated data.
  • Staged resources are the database assets that a Scan detects before they are promoted to a Dataset.
  • Monitored resources are the staged resources that have been selected for promotion to a Dataset and will be classified by the Fides classifier.
  • The Fides classifier is a collection of models that assigns data categories to fields.
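To make the relationships among these terms concrete, here is a minimal Python sketch. The class and attribute names (`Integration`, `Monitor`, `Scan`, `StagedResource`, `promoted`) are hypothetical and do not reflect the Fides API or data model; they only illustrate how an integration owns monitors, a monitor runs Scans, and a Scan produces staged resources that may later be promoted.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the D&D glossary terms.
# These classes are illustrative only; they are not the Fides API.


@dataclass
class StagedResource:
    path: str               # e.g. "warehouse.public.users.email"
    promoted: bool = False  # True once selected to become a monitored resource


@dataclass
class Scan:
    # Assets this Scan detected, staged before any promotion decision.
    staged: list[StagedResource] = field(default_factory=list)


@dataclass
class Monitor:
    scope: list[str]  # resources the monitor watches
    schedule: str     # how often Scans run, e.g. "daily"
    scans: list[Scan] = field(default_factory=list)


@dataclass
class Integration:
    connection: dict  # connection details for the data store
    monitors: list[Monitor] = field(default_factory=list)


# Example: one integration with one monitor whose first Scan staged a column.
integration = Integration(connection={"host": "db.example.com"})
monitor = Monitor(scope=["warehouse.public"], schedule="daily")
integration.monitors.append(monitor)
monitor.scans.append(Scan(staged=[StagedResource("warehouse.public.users.email")]))
monitor.scans[0].staged[0].promoted = True  # the resource is now monitored
```

Promotion is reduced to a boolean flag here; in the tutorial, it corresponds to choosing Monitor for a staged asset.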

Connecting to a Data Store

Navigate to the Integrations tab. Click Add Integration, select the integration type, and provide connection details.

Figure: Set up an integration

For some data stores, Fides requires a database administrator to set up a Fides service account within the data store with the necessary permissions. This allows Fides to run SELECT, UPDATE, or DELETE queries as required by monitors and privacy requests. For instructions on configuring the service account, click the Details button next to the integration during setup.

Figure: Integration details

After providing Fides the connection details, click the Test connection button to ensure the integration is configured correctly. Once the connection succeeds, click Add Monitor within the integration. The monitor controls when Scans run. Each Scan that Fides executes checks the monitored resources for new child assets.

Figure: Add a monitor

Data Detection

After a Scan completes, staged resources appear in the Data detection tab under Detection & Discovery. On this page, decide which assets should be monitored for user data: click Monitor for assets to monitor, or Ignore for assets to skip.

This decision can always be updated within the Monitored and Unmonitored tabs by selecting the available action (Ignore or Monitor).
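The Monitor/Ignore decision behaves like a reversible per-asset state. The Python sketch below models that idea; the `Status` values and the `set_decision` and `assets_in` helpers are hypothetical illustrations, not part of Fides, and assume each asset has already been staged by a Scan.

```python
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # staged by a Scan, awaiting a decision
    MONITORED = "monitored"  # listed under the Monitored tab
    IGNORED = "ignored"      # listed under the Unmonitored tab


def set_decision(statuses: dict[str, Status], asset: str, monitor: bool) -> None:
    """Record a Monitor (True) or Ignore (False) action; a later call overrides an earlier one."""
    if asset not in statuses:
        raise KeyError(f"{asset} has not been staged by a Scan")
    statuses[asset] = Status.MONITORED if monitor else Status.IGNORED


def assets_in(statuses: dict[str, Status], status: Status) -> list[str]:
    """Return the assets currently in the given state, sorted by name."""
    return sorted(asset for asset, s in statuses.items() if s is status)


statuses = {"db.orders": Status.PENDING, "db.logs": Status.PENDING}
set_decision(statuses, "db.orders", monitor=True)
set_decision(statuses, "db.logs", monitor=False)
set_decision(statuses, "db.logs", monitor=True)  # decisions can be changed later
```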

Data Schema Updates

Monitors track changes in data schemas. Additions, such as a new column, are marked with a green upward arrow. Deletions, such as a dropped column, are marked with a red downward arrow. Other changes are marked with a blue dot.
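Conceptually, detecting these changes amounts to diffing two snapshots of a schema. The following Python sketch assumes a schema is a mapping of column name to column type and treats a type change as one example of an "other change"; how Fides actually represents schemas and which modifications it reports are not specified here. The names `diff_schema`, `describe`, and `MARKERS` are hypothetical.

```python
# Hypothetical sketch: a schema snapshot maps column name -> column type.
MARKERS = {
    "added": "green upward arrow",
    "removed": "red downward arrow",
    "changed": "blue dot",
}


def diff_schema(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Return added, removed, and changed columns between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        # Assumption: a type change counts as an "other change".
        "changed": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


def describe(diff: dict[str, list[str]]) -> list[str]:
    """Label each column with the marker the UI uses for that kind of change."""
    return [f"{MARKERS[kind]}: {col}" for kind in ("added", "removed", "changed") for col in diff[kind]]


old = {"id": "int", "email": "text", "legacy_flag": "boolean"}
new = {"id": "bigint", "email": "text", "os_version": "text"}
for line in describe(diff_schema(old, new)):
    print(line)
```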

Figure: Detection states

Data Discovery

After you choose to Monitor a schema, its staged resources are promoted to the Data Discovery tab. Here, review the data category tags that the Fides classifier assigned to each promoted field and adjust them as needed. Click through the table rows to see how classifications are made during the automated discovery phase.

To update a data category tag, click the automatically assigned data category, then search for the correct one. After you update a data category, the UI sets the field’s Status to Reviewed.

When all the categories are reviewed and properly assigned, click Confirm all to commit the schema and classifications to a Fides dataset, which is viewable in the Manage datasets tab.
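The review flow can be sketched as follows, assuming a simplified field record. The `DiscoveredField` class, the initial `"Classified"` status label, and the `update_category`, `unreviewed`, and `confirm_all` helpers are hypothetical; the sketch only models the path described above, where changing a category marks the field as Reviewed, and returns a plain dictionary as a stand-in for the Fides dataset.

```python
from dataclasses import dataclass


@dataclass
class DiscoveredField:
    name: str
    data_category: str          # initially the classifier's suggestion
    status: str = "Classified"  # hypothetical label for "not yet reviewed"


def update_category(f: DiscoveredField, category: str) -> None:
    """Mirror the UI: changing a field's category marks it as Reviewed."""
    f.data_category = category
    f.status = "Reviewed"


def unreviewed(fields: list[DiscoveredField]) -> list[str]:
    """Names of fields that still need review before confirming."""
    return [f.name for f in fields if f.status != "Reviewed"]


def confirm_all(fields: list[DiscoveredField]) -> dict[str, str]:
    """Commit the current categories; a stand-in for creating a Fides dataset."""
    return {f.name: f.data_category for f in fields}


fields = [
    DiscoveredField("email", "user.contact.email"),
    DiscoveredField("os_ver", "system.operations"),
]
update_category(fields[1], "user.device")  # correct the classifier's suggestion
```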

Classification

From the data schema and table view, use the Reclassify button to re-run classification and update the data categories. Reclassifying is useful only in these cases:

  • before discovery results have been committed to a dataset
  • after updating a monitor configuration
💡

Monitors can share regex annotation configurations. When a field name matches a regex pattern, Fides applies the corresponding data category and skips machine learning classification for that field. This is especially useful when your assets follow similar naming conventions.

For example, providing the regex mapping:

.*os_version -> user.device

classifies all field labels that contain os_version as user.device.
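A minimal sketch of this regex-first behavior in Python follows. It assumes patterns are checked in order with an unanchored `re.search` (consistent with "contain" above) and that the first matching pattern wins; `classify_fields` and the `ml_classifier` callback are hypothetical stand-ins for the Fides classifier, not its actual interface.

```python
import re
from collections.abc import Callable


def classify_fields(
    field_names: list[str],
    regex_map: dict[str, str],
    ml_classifier: Callable[[str], str],
) -> dict[str, str]:
    """Apply regex annotations first; call the ML classifier only for unmatched fields."""
    compiled = [(re.compile(pattern), category) for pattern, category in regex_map.items()]
    result: dict[str, str] = {}
    for name in field_names:
        for pattern, category in compiled:
            # Assumption: unanchored search, so ".*os_version" matches any name containing os_version.
            if pattern.search(name):
                result[name] = category
                break
        else:
            # No regex matched, so fall back to the (hypothetical) ML classifier.
            result[name] = ml_classifier(name)
    return result


calls: list[str] = []


def fake_ml_classifier(name: str) -> str:
    """Stand-in for the Fides classifier that records which fields reach it."""
    calls.append(name)
    return "system.operations"


labels = classify_fields(
    ["ios_os_version", "os_version_major", "created_at"],
    {r".*os_version": "user.device"},
    fake_ml_classifier,
)
```

Recording the calls makes the skip visible: only the field with no regex match is sent to the classifier.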

Datasets

After you click Confirm all on the Data Discovery tab, a new Dataset is created and appears in the Manage datasets tab.

To read more about datasets, see Datasets.

Updating Annotations Directly on a Dataset

When you update a field’s annotation directly in a dataset, that annotation overrides any updates made in the monitor. In other words, changes you make directly on the dataset have the highest priority when assigning data category tags.
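This precedence rule can be expressed as a per-field merge in which direct dataset annotations win. In the hypothetical Python sketch below, `direct` holds categories set on the dataset and `from_monitor` holds categories coming from the monitor; both names and the dictionary representation are assumptions for illustration.

```python
def merge_annotations(direct: dict[str, str], from_monitor: dict[str, str]) -> dict[str, str]:
    """Resolve each field's data category; annotations set directly on the dataset win."""
    merged: dict[str, str] = {}
    for field_name in direct.keys() | from_monitor.keys():
        # Direct dataset edits take priority; otherwise use the monitor's category.
        merged[field_name] = direct[field_name] if field_name in direct else from_monitor[field_name]
    return merged


direct = {"email": "user.contact.email"}
from_monitor = {"email": "user.name", "os_version": "user.device"}
resolved = merge_annotations(direct, from_monitor)
```

In this sketch the merge is keyed per field, so a direct edit on one field does not block the monitor's categories for other fields.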