Confidence Scores
When the Fides LLM classifier tags a field with a data category, it also returns a confidence rating — a 1–5 integer reflecting how certain it is about the predicted classification. These ratings are grouped into confidence buckets (High, Medium, Low, or Manual) that surface in the Action Center and Schema Explorer to help you prioritize your review.
How confidence ratings are assigned
The LLM evaluates each field using the full schema context — including table name, column names, and data types — and assigns a rating based on the following rubric:
| Rating | Meaning |
|---|---|
| 5 | Definitive classification. Only trivial uncertainty about the correct data category. |
| 4 | Small uncertainty. Most privacy experts would agree with the classification. |
| 3 | Multiple reasonable classifications apply; difficult to choose between them. |
| 2 | Significant uncertainty due to schema ambiguity or overlapping taxonomy categories. |
| 1 | Extremely speculative. |
Regex-based classifications always receive a rating of 5, as they are deterministic matches. When the LLM cannot confidently classify a field, it defaults to system.operations with no confidence rating, which places the field in the Low bucket.
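The rating rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Fides implementation. The `ClassificationResult` type, the `regex_result` and `llm_result` helpers, and the `"regex"`/`"llm"` source labels are all hypothetical. The sketch also assumes that "cannot confidently classify" shows up as the model returning no category. It encodes three rules from this page: ratings are integers from 1 to 5, regex matches always carry a rating of 5, and an unclassified LLM field falls back to `system.operations` with no rating.

```python
from dataclasses import dataclass
from typing import Optional

# Category assigned when the LLM cannot confidently classify a field.
FALLBACK_CATEGORY = "system.operations"


@dataclass
class ClassificationResult:
    # Hypothetical record shape for illustration; not a Fides API.
    field_name: str
    data_category: str
    source: str  # "regex" or "llm" (assumed labels)
    rating: Optional[int]  # integer 1-5, or None when no rating was assigned


def regex_result(field_name: str, data_category: str) -> ClassificationResult:
    # Regex matches are deterministic, so they always receive a rating of 5.
    return ClassificationResult(field_name, data_category, "regex", 5)


def llm_result(
    field_name: str, data_category: Optional[str], rating: Optional[int]
) -> ClassificationResult:
    # Assumption: the model signals "no confident classification" by
    # returning no data category at all.
    if data_category is None:
        # Fall back to system.operations with no rating (Low bucket).
        return ClassificationResult(field_name, FALLBACK_CATEGORY, "llm", None)
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")
    return ClassificationResult(field_name, data_category, "llm", rating)


# Example usage
print(regex_result("email", "user.contact.email").rating)  # 5
fallback = llm_result("notes", None, None)
print(fallback.data_category, fallback.rating)  # system.operations None
```

Keeping the rating as `Optional[int]` rather than substituting a placeholder number preserves the distinction the docs draw between "rated low" and "not rated at all".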
Confidence buckets
Ratings are grouped into four buckets for display and filtering:
| Bucket | Condition | Description |
|---|---|---|
| High | Rating > 4 | The classifier is highly certain. These fields are strong candidates for bulk approval. |
| Medium | Rating = 4 | Reasonable certainty, but worth a quick review before approving. |
| Low | Rating < 4 (or no rating) | The classifier is uncertain. These fields require careful manual review. |
| Manual | User-assigned | A data category was manually set by a user, regardless of the classifier's rating. |
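The bucket conditions can be expressed as a small function. This is an illustrative sketch; `confidence_bucket` and its `manually_assigned` flag are hypothetical names, not part of Fides. Because ratings are integers, the High condition (rating > 4) is met only by a rating of 5, and a missing rating is treated as Low.

```python
from typing import Optional


def confidence_bucket(rating: Optional[int], manually_assigned: bool = False) -> str:
    """Map a 1-5 confidence rating to its display bucket."""
    # A user-assigned category is Manual regardless of the classifier's rating.
    if manually_assigned:
        return "Manual"
    # No rating (e.g. the system.operations fallback) is treated as Low.
    if rating is None:
        return "Low"
    if rating > 4:  # with integer ratings, only 5 qualifies
        return "High"
    if rating == 4:
        return "Medium"
    return "Low"  # ratings 1-3


# Example usage
for r in (5, 4, 3, None):
    print(r, confidence_bucket(r))
# 5 High
# 4 Medium
# 3 Low
# None Low
print(confidence_bucket(2, manually_assigned=True))  # Manual
```

The Manual check runs first so that a user's decision always takes precedence over whatever rating the classifier produced.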
What influences confidence
Several factors affect the rating the LLM assigns to a given field:
- Schema context: The LLM receives the full table and column structure, not just the individual field name. This additional context improves accuracy and confidence on ambiguous fields. For example, a column named `id` in a table named `payments` is more clearly identifiable than one in a generic `records` table.
- Taxonomy specificity: Fields that map unambiguously to a well-defined data category tend to receive higher ratings. Categories with overlapping definitions or broad descriptions produce more uncertainty.
- Custom taxonomy instructions: Providing additional tagging guidance via `classify_params` (including per-category `tagging_instructions`, overrides, or removals) can sharpen what the LLM considers a strong match and improve confidence on domain-specific schemas.
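To illustrate why schema context helps, the sketch below renders a field alongside its table name and sibling columns. The `Column` type, the `describe_field` function, the text layout, and the example columns (`amount`, `card_last4`) are hypothetical. They show the kind of information the LLM receives, not the format Fides actually uses. Rendering the same `id` column in two tables makes the difference visible: next to payment columns it has an obvious interpretation, while in a bare `records` table the model has little to go on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    data_type: str


def describe_field(table: str, columns: List[Column], target: str) -> str:
    """Render a target column together with its table and sibling columns.

    Illustrative only: this layout is an assumption, not the format Fides
    actually uses when prompting the LLM.
    """
    if target not in {col.name for col in columns}:
        raise ValueError(f"{target!r} is not a column of {table!r}")
    lines = [f"Table: {table}"]
    for col in columns:
        marker = "  <- classify this field" if col.name == target else ""
        lines.append(f"  {col.name} ({col.data_type}){marker}")
    return "\n".join(lines)


# The same "id" column, with and without informative context.
print(describe_field(
    "payments",
    [Column("id", "integer"), Column("amount", "numeric"), Column("card_last4", "varchar")],
    "id",
))
print(describe_field("records", [Column("id", "integer")], "id"))
```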
Using confidence scores in your workflow
Confidence buckets are surfaced in two places in the Fides UI:
- Action Center: Summary cards display the number of fields at each confidence level per monitor, giving you an at-a-glance view of where review effort is needed.
- Schema Explorer: Filter fields by confidence bucket to focus on specific tiers — for example, filtering to High to bulk-approve reliable classifications, or Low to identify fields that need the most attention.
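The summary cards and bucket filters come down to grouping and filtering classified fields. The sketch below mimics both on in-memory data to make the workflow concrete. It does not call any Fides API: the `ClassifiedField` type, the helper functions, the monitor names, and the field paths are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ClassifiedField:
    # Hypothetical record shape; not a Fides API.
    monitor: str
    path: str
    bucket: str  # "High", "Medium", "Low", or "Manual"


def bucket_counts_by_monitor(fields: List[ClassifiedField]) -> Dict[str, Counter]:
    """Count fields per confidence bucket for each monitor (summary-card view)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for f in fields:
        counts[f.monitor][f.bucket] += 1
    return dict(counts)


def filter_by_bucket(fields: List[ClassifiedField], bucket: str) -> List[ClassifiedField]:
    """Keep only the fields in one bucket (Schema Explorer filter view)."""
    return [f for f in fields if f.bucket == bucket]


# Example usage with made-up monitors and fields.
fields = [
    ClassifiedField("postgres_prod", "payments.card_number", "High"),
    ClassifiedField("postgres_prod", "payments.id", "Medium"),
    ClassifiedField("postgres_prod", "records.notes", "Low"),
    ClassifiedField("snowflake_dw", "users.email", "High"),
]
counts = bucket_counts_by_monitor(fields)
print(counts["postgres_prod"]["Low"])  # 1
print([f.path for f in filter_by_bucket(fields, "High")])  # ['payments.card_number', 'users.email']
```

Using `Counter` means a bucket with no fields simply reports 0, which matches how an empty tier would read on a summary card.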
For a step-by-step workflow using confidence scores to accelerate classification review, see the Helios tutorial.