Confidence Scores

When the Fides LLM classifier tags a field with a data category, it also returns a confidence rating: an integer from 1 to 5 reflecting how certain it is about the predicted classification. These ratings are grouped into confidence buckets (High, Medium, or Low, plus Manual for categories set by a user) that surface in the Action Center and Schema Explorer to help you prioritize your review.

How confidence ratings are assigned

The LLM evaluates each field using the full schema context — including table name, column names, and data types — and assigns a rating based on the following rubric:

| Rating | Meaning |
|--------|---------|
| 5 | Definitive classification. Only trivial uncertainty about the correct data category. |
| 4 | Small uncertainty. Most privacy experts would agree with the classification. |
| 3 | Multiple reasonable classifications apply; difficult to choose between them. |
| 2 | Significant uncertainty due to schema ambiguity or overlapping taxonomy categories. |
| 1 | Extremely speculative. |

Regex-based classifications always receive a rating of 5, as they are deterministic matches. When the LLM cannot confidently classify a field, it defaults to system.operations with no confidence rating, which places the field in the Low bucket.
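The two rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Fides implementation: the regex patterns, the `Classification` type, the `resolve` function, and the choice to check regexes before consulting the LLM result are all assumptions made for the example. An LLM result of `None` stands in for "the LLM could not confidently classify the field".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical regex rules; the patterns and their category mappings are
# illustrative and are not taken from Fides.
REGEX_RULES = [
    (re.compile(r"(^|_)e?mail($|_)", re.IGNORECASE), "user.contact.email"),
    (re.compile(r"(^|_)phone($|_)", re.IGNORECASE), "user.contact.phone_number"),
]

FALLBACK_CATEGORY = "system.operations"


@dataclass
class Classification:
    data_category: str
    rating: Optional[int]  # 1-5, or None when no rating was produced


def resolve(field_name: str, llm_guess: Optional[Classification]) -> Classification:
    """Combine a regex match with an LLM guess (assumed ordering: regex first)."""
    for pattern, category in REGEX_RULES:
        if pattern.search(field_name):
            # Deterministic match, so the rating is always 5.
            return Classification(category, 5)
    if llm_guess is None:
        # No confident LLM classification: fall back with no rating.
        return Classification(FALLBACK_CATEGORY, None)
    return llm_guess
```

In this sketch, `resolve("user_email", None)` yields a rating of 5 from the regex rule, while `resolve("blob", None)` yields `system.operations` with no rating, which the bucket rules then treat as Low.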

Confidence buckets

Ratings are grouped into four buckets for display and filtering:

| Bucket | Condition | Description |
|--------|-----------|-------------|
| High | Rating > 4 (that is, 5) | The classifier is highly certain. These fields are strong candidates for bulk approval. |
| Medium | Rating = 4 | Reasonable certainty, but worth a quick review before approving. |
| Low | Rating < 4 (or no rating) | The classifier is uncertain. These fields require careful manual review. |
| Manual | User-assigned | A data category was manually set by a user, regardless of the classifier's rating. |
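The bucket conditions can be expressed as a small function. This is a minimal sketch of the thresholds described above; the function name `confidence_bucket` and the `manually_set` flag are hypothetical and do not come from the Fides API.

```python
from typing import Optional


def confidence_bucket(rating: Optional[int], manually_set: bool = False) -> str:
    """Map a 1-5 confidence rating (or None) to a display bucket."""
    if manually_set:
        # A user-assigned category wins regardless of the classifier's rating.
        return "Manual"
    if rating is None:
        # Fields with no rating are placed in the Low bucket.
        return "Low"
    if not 1 <= rating <= 5:
        raise ValueError(f"rating must be between 1 and 5, got {rating}")
    if rating > 4:
        return "High"
    if rating == 4:
        return "Medium"
    return "Low"
```

For example, `confidence_bucket(5)` returns `"High"`, `confidence_bucket(None)` returns `"Low"`, and `confidence_bucket(2, manually_set=True)` returns `"Manual"`.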

What influences confidence

Several factors affect the rating the LLM assigns to a given field:

  • Schema context: The LLM receives the full table and column structure, not just the individual field name. This additional context improves accuracy and confidence on ambiguous fields — for example, a column named id in a table named payments is more clearly identifiable than one in a generic records table.

  • Taxonomy specificity: Fields that map unambiguously to a well-defined data category tend to receive higher ratings. Categories with overlapping definitions or broad descriptions produce more uncertainty.

  • Custom taxonomy instructions: Providing additional tagging guidance via classify_params — including per-category tagging_instructions, overrides, or removals — can sharpen what the LLM considers a strong match and improve confidence on domain-specific schemas.

Using confidence scores in your workflow

Confidence buckets are surfaced in two places in the Fides UI:

  • Action Center: Summary cards display the number of fields at each confidence level per monitor, giving you an at-a-glance view of where review effort is needed.
  • Schema Explorer: Filter fields by confidence bucket to focus on specific tiers — for example, filtering to High to bulk-approve reliable classifications, or Low to identify fields that need the most attention.
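To make the two views concrete, the sketch below reproduces them over in-memory data: a per-monitor count of fields in each bucket (similar to the Action Center summary cards) and a bucket filter (similar to the Schema Explorer). The record layout, monitor names, and function names are hypothetical; Fides exposes these views through its UI, and this example does not call any Fides API.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field records; the keys and values are illustrative only.
FIELDS = [
    {"monitor": "postgres_prod", "field": "users.email", "bucket": "High"},
    {"monitor": "postgres_prod", "field": "users.notes", "bucket": "Low"},
    {"monitor": "postgres_prod", "field": "orders.status", "bucket": "Medium"},
    {"monitor": "bigquery_analytics", "field": "events.payload", "bucket": "Low"},
    {"monitor": "bigquery_analytics", "field": "events.user_id", "bucket": "High"},
]


def summarize_by_monitor(fields: List[dict]) -> Dict[str, Counter]:
    """Count fields per confidence bucket for each monitor."""
    summary: Dict[str, Counter] = {}
    for record in fields:
        summary.setdefault(record["monitor"], Counter())[record["bucket"]] += 1
    return summary


def filter_by_bucket(fields: List[dict], bucket: str) -> List[str]:
    """Return the names of fields that fall in the given bucket."""
    return [record["field"] for record in fields if record["bucket"] == bucket]
```

Filtering to `"High"` gives the candidates for bulk approval, while filtering to `"Low"` gives the fields that need the closest review.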

For a step-by-step workflow using confidence scores to accelerate classification review, see the Helios tutorial.