Classifier Parameters

Fides classification uses several named-entity recognition (NER) Natural Language Processing (NLP) tools built on Python libraries, including spaCy and Presidio. Enterprise customers interested in AI-driven classification should inquire about the Fides classifier extension that uses privateAI.

Classification is broken out into Content and Context analysis:

Context parameters control how Fides analyzes the metadata (i.e., the field or column name).
Content parameters adjust how Fides analyzes the underlying instance data (e.g., a 5-row sample).

Parameter Descriptions

The parameters below can be added to a discovery monitor by calling the following endpoint:

PUT /api/v1/plus/discovery-monitor

All of these parameters should be specified within the classify_params configuration object.

excluded_categories: a list of data category keys that should never be returned by the classifier.

["user.content.public", "user.device.cookie_id"]

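context_regex_pattern_mapping: a list of two-element lists, each pairing a regex pattern used to match columns with the Fideslang data category key to assign when the pattern matches. Because the request body is JSON, a literal backslash in a pattern must be written as \\ (so the regex .*User\.Name is sent as ".*User\\.Name").

[
    [".*dob", "user.demographic.date_of_birth"],
    [".*ssn", "user.government_id.national_identification_number"],
    [".*User\\.Name", "user.account.username"]
]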
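Escaping backslashes by hand is error-prone. If you assemble the mapping in code, a JSON serializer escapes them for you. The minimal sketch below uses Python's standard json module (Python is chosen here only for illustration; any JSON library behaves the same way) to show the round trip between the regex the classifier should see and the text that belongs in the request body.

```python
import json

# The regex exactly as the classifier should see it; the raw string keeps one backslash.
mapping = [[r".*User\.Name", "user.account.username"]]

# json.dumps escapes the backslash, producing the text the JSON request body must contain.
encoded = json.dumps(mapping)
print(encoded)  # [[".*User\\.Name", "user.account.username"]]

# Decoding restores the original single-backslash regex.
pattern = json.loads(encoded)[0][0]
print(pattern == r".*User\.Name")  # True
```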

Note: Regular expressions are evaluated against a fully qualified, dot-delimited field identifier, so the schema and table names are part of the string that each pattern is matched against.

Example: Within a database, the schema Public contains the table User, which has the column Name. During a Fides scan, registered regex patterns run against the identifier Public.User.Name. The pattern .*User\.Name from the example above therefore matches the column Name in the User table of the Public schema, and Fides assigns it the data category user.account.username.
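The note above can be illustrated with a small Python sketch. It assumes, hypothetically, that each pattern must match the entire dot-delimited identifier (Python's re.fullmatch) and that matching is case-sensitive; Fides's actual matching rules may differ. The helper name categories_for is illustrative only and is not part of the Fides API.

```python
import re

# Example mapping from this section (raw strings keep the regex backslash single).
CONTEXT_REGEX_PATTERN_MAPPING = [
    [r".*dob", "user.demographic.date_of_birth"],
    [r".*ssn", "user.government_id.national_identification_number"],
    [r".*User\.Name", "user.account.username"],
]


def categories_for(schema: str, table: str, column: str) -> list:
    """Hypothetical helper: return categories whose pattern matches schema.table.column.

    Assumes whole-string, case-sensitive matching (re.fullmatch).
    """
    identifier = f"{schema}.{table}.{column}"
    return [
        category
        for pattern, category in CONTEXT_REGEX_PATTERN_MAPPING
        if re.fullmatch(pattern, identifier)
    ]


print(categories_for("Public", "User", "Name"))     # ['user.account.username']
print(categories_for("Public", "Customer", "dob"))  # ['user.demographic.date_of_birth']
print(categories_for("Public", "User", "Email"))    # []

# A bare column name does not match, because the schema and table prefix are part of the string.
print(re.fullmatch(r"Name", "Public.User.Name"))    # None
```

Under these assumptions, the leading .* is what lets a pattern ignore the schema and table prefix, which is why every pattern in the example mapping starts with it.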

pii_threshold (default: 0.4): This field controls how aggressively Fides filters non-PII results out of the classifier's context classification. Lower values let more results through, which increases the chance of false positives.
context_weight (default: 0.55): Fides uses this value together with content_weight to decide which data category to apply to a given field. A larger value means Fides relies more heavily on the context (metadata) analysis.
content_weight (default: 0.45): Fides uses this value together with context_weight to decide which data category to apply to a given field. A larger value means Fides relies more heavily on the content (sample data) analysis.
content_classification_enabled (default: true): If set to false, the content classification phase will be skipped.
context_classification_enabled (default: true): If set to false, the context classification phase will be skipped.
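To put the parameters together, the Python sketch below assembles a classify_params object and builds, without sending, the PUT request to the discovery monitor endpoint using only the standard library. Several details are assumptions for illustration: that classify_params is a top-level key of the monitor body, that the API accepts a bearer token, the base URL, the monitor key, and the helper names with_classify_params and build_put_request. Check the Fides API reference for the exact monitor schema.

```python
import json
import urllib.request

# Defaults as documented in this section (for reference; this sketch does not send them).
DOCUMENTED_DEFAULTS = {
    "pii_threshold": 0.4,
    "context_weight": 0.55,
    "content_weight": 0.45,
    "content_classification_enabled": True,
    "context_classification_enabled": True,
}

# Every classify_params key described in this section; this sketch rejects anything else.
KNOWN_KEYS = set(DOCUMENTED_DEFAULTS) | {
    "excluded_categories",
    "context_regex_pattern_mapping",
}


def with_classify_params(monitor_config: dict, **overrides) -> dict:
    """Hypothetical helper: return a copy of monitor_config with overrides merged
    into classify_params. Assumes classify_params is a top-level key of the body."""
    unknown = set(overrides) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unknown classify_params keys: {sorted(unknown)}")
    merged = dict(monitor_config)
    merged["classify_params"] = {**monitor_config.get("classify_params", {}), **overrides}
    return merged


def build_put_request(base_url: str, token: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: build, but do not send, the PUT request.
    Assumes bearer-token authentication in the Authorization header."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/plus/discovery-monitor",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical existing monitor configuration; a real one has more fields.
monitor_config = {"key": "example_monitor"}

body = with_classify_params(
    monitor_config,
    excluded_categories=["user.content.public", "user.device.cookie_id"],
    context_weight=0.7,
    content_weight=0.3,
)
request = build_put_request("https://fides.example.com", "YOUR_TOKEN", body)
# urllib.request.urlopen(request) would send it; it is not called here.
print(request.get_method(), request.full_url)
```

The weights used here (0.7 and 0.3) are illustrative values that shift the balance toward context analysis; they are not recommendations.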

Shared Configurations

Fides 2.63.0 introduced the ability to provide and share regex classification parameters through the UI.

Under Settings, on the Integrations page, click "Shared configs". This opens a modal where you can create regex patterns to attach to a monitor configuration.
