Value Normalization: How to Standardize Your Product Catalog Data
Value normalization is the enrichment discipline that nobody talks about, and that everyone is quietly suffering from. It is the process of taking a catalog where “Navy Blue,” “navy,” “dark navy,” “midnight blue,” “cobalt,” and “ink blue” are all stored as separate color values for what should be one canonical color, and resolving them to a consistent, machine-queryable standard. It sounds tedious. It is. It is also the foundational prerequisite for faceted filtering to work correctly, for AI agent attribute matching to be reliable, and for your catalog to behave as a coherent data asset rather than an accumulated set of inconsistent records.
Why Normalization Fails Without a System
Unnormalized catalogs are almost never the result of carelessness. They are the result of a perfectly natural process: product data comes from multiple suppliers with different naming conventions, multiple team members who make reasonable but inconsistent choices, and historical conventions that made sense at the time but accumulated into fragmentation over multiple seasons and sourcing cycles.
Supplier A ships “Waterproof Jacket” with color “midnight blue.” Supplier B ships the same jacket in the same color as “cobalt.” The buyer who added a similar product to your website last year used “Navy.” The merchandiser who added the latest range used “Dark Blue.” All four values appear in your color filter as separate options. A shopper filtering by “Navy” sees one product; a shopper filtering by “Midnight Blue” sees a different one. Neither sees all four jackets in what is effectively the same color. Filter performance fragments, and each new product added with a supplier-provided color value makes the fragmentation worse.
Normalization Is Not Just About Tidiness
The commercial case for normalization is not that your data looks cleaner. It is that unnormalized categorical values fragment your filter facets into dozens of low-inventory options that each receive a fraction of the filter traffic that a single canonical value would receive. A “Navy Blue” filter facet with 120 products is a high-traffic, high-conversion filter option. Fourteen separate “blue” variant values with an average of 9 products each produce fourteen low-traffic facets. The traffic is the same. The conversion and the shopper experience are materially worse.
Why fragmentation happens: suppliers, merchandisers, and historical conventions all create slightly different but “reasonable” value names.
What the shopper experiences: too many thin filter facets, each showing too little inventory to feel useful or trustworthy.
The 5-Step Normalization Process
Normalization workflow

1. Extract: pull every unique value for the attribute and sort by frequency.
2. Define canon: create the accepted value list shoppers and channels should actually use.
3. Map: build old value → canonical value mappings, with human review for ambiguous cases.
4. Apply: update the catalog and verify only canonical values remain.
5. Govern: prevent new fragmentation with controlled input and supplier onboarding rules.
Step 1: Extract all unique values per attribute
For each attribute you are normalizing, pull every unique value currently in your catalog database. Export to a spreadsheet. Sort by frequency, most common values first. The frequency distribution reveals the scope: if you have 47 unique color values and the top 10 account for 80% of products, you have a manageable normalization project. If you have 200 unique values with no dominant entries, you have a larger project.
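As a minimal sketch of this step, assuming the catalog has been exported to a CSV with a free-text color column (the file name and column name here are hypothetical), the frequency distribution is a few lines of pandas:

```python
import pandas as pd

# Hypothetical export: one row per product, free-text "color" column.
catalog = pd.read_csv("catalog_export.csv")

# Every unique value, most common first. The shape of this
# distribution tells you how big the normalization project is.
value_counts = catalog["color"].str.strip().value_counts()
print(value_counts)

# If the top 10 values cover ~80% of products, the project is manageable.
top10_share = value_counts.head(10).sum() / value_counts.sum()
print(f"Top 10 values cover {top10_share:.0%} of products")
```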
Step 2: Define your canonical value list
For each attribute, define the complete list of canonical values you will accept. This list should: (a) cover all legitimate product options in the attribute; (b) use the terminology your shoppers use, not supplier terminology; (c) be consistent with the canonical values expected by your channels (Google Shopping color taxonomy; Amazon browse node attribute values). The canonical list is your target state. Every attribute value in the catalog will map to one item on this list.
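One way to keep the canonical list enforceable rather than aspirational is to store it as data that downstream tooling can read. A sketch with illustrative values only; your list should come from shopper terminology and your channels’ taxonomies:

```python
# Canonical value lists per attribute. Values here are illustrative;
# derive yours from shopper language and channel requirements
# (e.g., the Google Shopping color taxonomy for Shopping feeds).
CANONICAL_VALUES = {
    "color": ["Navy Blue", "Sky Blue", "Royal Blue", "Black", "White"],
    "fit": ["Slim Fit", "Regular Fit", "Relaxed Fit", "Oversized"],
}
```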
Step 3: Build the normalization mapping
Create a mapping table: old value → canonical value. For the color attribute with 47 values, this means 47 rows, each mapping an old value to the canonical value it should become. This step requires human judgment. “midnight blue” → “Navy Blue” is a reasonable normalization, but “midnight” could plausibly be “Black” or “Navy Blue” depending on category context. Ambiguous cases should be reviewed against product images.
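In code, the mapping table is just a dictionary. A sketch using the color values from the examples above; the “cobalt” row is a judgment call that should be checked against product images, and genuinely ambiguous values are flagged rather than guessed:

```python
# Old value -> canonical value. Keys are lowercased so matching is
# case-insensitive when the mapping is applied.
COLOR_MAPPING = {
    "navy": "Navy Blue",
    "navy blue": "Navy Blue",
    "dark navy": "Navy Blue",
    "dark blue": "Navy Blue",
    "midnight blue": "Navy Blue",
    "ink blue": "Navy Blue",
    "cobalt": "Navy Blue",  # judgment call: verify against product images
}

# Values too ambiguous to map automatically; review against images.
NEEDS_REVIEW = {"midnight"}  # could be Black or Navy Blue by category
```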
Step 4: Apply the mapping to your catalog
Apply the normalization mapping to your product database. This can be done with a VLOOKUP/XLOOKUP in a spreadsheet for small catalogs, with a database UPDATE statement for larger catalogs, or with a normalization function in your PIM. After applying: pull unique values again and confirm that only canonical values remain in the attribute field.
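Continuing the pandas sketch from the extraction step (a database UPDATE follows the same logic), applying and verifying might look like this:

```python
# Normalize case/whitespace, then apply the mapping.
cleaned = catalog["color"].str.strip().str.lower()
normalized = cleaned.map(COLOR_MAPPING)

# Any value with no mapping row needs one before the update ships.
unmapped = sorted(cleaned[normalized.isna()].dropna().unique())
if unmapped:
    print("Values still needing a mapping row:", unmapped)

catalog["color"] = normalized.fillna(catalog["color"])

# Verification: re-pull unique values and confirm they are a subset
# of the canonical list.
leftovers = set(catalog["color"].dropna()) - set(CANONICAL_VALUES["color"])
print("Non-canonical values remaining:", leftovers or "none")
```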
Step 5: Implement governance to prevent re-fragmentation
Normalization without governance is a one-time fix that degrades over time. Prevention: (a) validate attribute values at the point of product creation and reject non-canonical values at data entry; (b) use dropdown/selection fields rather than free-text fields for categorical attributes where normalization standards apply; (c) define a supplier onboarding process that includes a data normalization step before supplier data is ingested into your catalog.
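As a minimal sketch of point (a), a validation hook like the hypothetical function below can sit in front of product creation or supplier ingestion so non-canonical values never reach the catalog:

```python
def validate_attribute(attribute: str, value: str) -> str:
    """Reject non-canonical values at data entry.

    Hypothetical hook: call from the product-creation form or the
    supplier ingestion pipeline before writing to the catalog.
    """
    allowed = CANONICAL_VALUES.get(attribute, [])
    if value not in allowed:
        raise ValueError(
            f"{value!r} is not a canonical {attribute} value; "
            f"expected one of: {', '.join(allowed)}"
        )
    return value

validate_attribute("fit", "Slim Fit")  # passes
validate_attribute("fit", "Skinny")    # raises ValueError
```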
The Attributes That Need Normalization Most Urgently
| Attribute | Normalization Complexity | Why It Matters Most | Canonical Standard to Use |
|---|---|---|---|
| Color | High; dozens of supplier naming conventions and multiple legitimate interpretations for some hues | Primary filter attribute in apparel, home, and many other categories; fragmentation directly reduces filter performance | Google Shopping color taxonomy for Shopping feeds; your own brand-consistent canonical list for website filters |
| Material | Medium; composition percentages add complexity (“100% cotton” vs “cotton” vs “pure cotton”) | AI agent and shopper filter queries increasingly specify material; agentic queries for sustainability require precise material data | Full composition string: percentage + material name (e.g., “58% Cotton, 42% Polyester”); avoid shorthand |
| Size | Medium; multiple size systems (UK/EU/US) and different conventions for clothing, shoes, and accessories | Incorrect size data is the leading cause of size-related returns; inconsistent size systems confuse international shoppers | Category-specific: clothing (UK numeric), shoes (UK half-size with EU equivalent), children’s (age range + cm measurement) |
| Fit / Cut | Low (but high impact); most brands use their own fit terminology | Fit is a primary filter attribute in apparel; brand-specific terms fragment the filter if not normalized | Standard terms: Slim Fit, Regular Fit, Relaxed Fit, Oversized; map brand-specific terminology to these |
| Certification | Low; certifications have official names | AI agents increasingly query for sustainability and safety certifications; non-standard certification names reduce matching probability | Use official certification names exactly (“OEKO-TEX Standard 100,” “Bluesign Approved,” “Fair Trade Certified”), never shorthand |
Urgency usually tracks shopper specificity
The more often shoppers filter, compare, or query on an attribute, the more expensive fragmentation becomes. Color, size, material, and fit are therefore rarely “nice to clean up later” fields. They are usually front-line revenue fields.
Normalization at Scale: When Manual Mapping Isn’t Enough
The normalization mapping approach described above works well for catalogs with a manageable number of unique values per attribute and a team with time to review each mapping. For catalogs with tens of thousands of SKUs from dozens of suppliers, manual normalization mapping becomes a project in itself, and one that needs to be re-run every time a new supplier’s data is ingested.
AI-assisted normalization addresses this scale challenge by learning your canonical value list and automatically mapping new attribute values to the closest canonical match, with confidence scores that route ambiguous cases to human review. This preserves the accuracy of human judgment for edge cases while automating the 80–90% of clear-cut normalization mappings that follow obvious patterns.
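How any given vendor implements this is not described here; as a rough illustration of the routing pattern only, the sketch below scores candidates with simple string similarity from Python’s standard library (a production system would use a learned model) and sends low-confidence matches to human review. The threshold is an assumption to tune against a reviewed sample:

```python
from difflib import SequenceMatcher

CONFIDENCE_THRESHOLD = 0.8  # assumption: tune against reviewed samples

def suggest_canonical(value: str, canonical_values: list[str]) -> tuple[str, float]:
    """Return the closest canonical value and a similarity score."""
    scored = [
        (canon, SequenceMatcher(None, value.lower(), canon.lower()).ratio())
        for canon in canonical_values
    ]
    return max(scored, key=lambda pair: pair[1])

for raw in ["navy blu", "midnight", "skye blue"]:
    match, confidence = suggest_canonical(raw, CANONICAL_VALUES["color"])
    route = "auto-apply" if confidence >= CONFIDENCE_THRESHOLD else "human review"
    print(f"{raw!r} -> {match!r} ({confidence:.2f}): {route}")
```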
| Unnormalized Catalog (Filter View) | Normalized Catalog (Filter View) |
|---|---|
| Color filter: Navy (12) · Dark Navy (8) · Midnight (6) · Cobalt Blue (4) · Ocean (3) · Ink Blue (3) · Navy Blue (11) · Dark Blue (9). 8 facets, avg 7 products each | Color filter: Navy Blue (56) · Sky Blue (23) · Royal Blue (18). 3 facets, meaningful inventory per option, higher filter CTR |
| Size filter: S (34) · Small (12) · SM (8) · Medium (45) · M (23) · Medium-Large (3) · Large (38) · L (19) · Lge (4). Fragmented and confusing | Size filter: XS (14) · S (54) · M (68) · L (57) · XL (34) · 2XL (18). Clean, intuitive, full-inventory facets |
Velou on Automated Normalization
Value normalization is a task that sits at the intersection of data management and domain knowledge: you need to know what “midnight” means in context, what size system a product category uses, and what the canonical certification name for “certified organic” is. Commerce-1 is trained on retail product data specifically, which means it understands these category conventions and can apply normalization mappings with the contextual accuracy that general text processing tools lack.
Normalize your catalog data automatically, at scale
Commerce-1 maps attribute values to canonical standards across your full catalog.
Request a demo
