Value Normalization: How to Standardize Your Product Catalog Data


Value normalization is the enrichment discipline that nobody talks about and that everyone quietly suffers from. It is the process of taking a catalog where “Navy Blue,” “navy,” “dark navy,” “midnight blue,” “cobalt,” and “ink blue” are all stored as separate color values for what should be one canonical color, and resolving them to a consistent, machine-queryable standard. It sounds tedious. It is. It is also the foundational prerequisite for faceted filtering to work correctly, for AI agent attribute matching to be reliable, and for your catalog to behave as a coherent data asset rather than an accumulated set of inconsistent records.

47
Average number of unique color attribute values in a mid-market apparel catalog that should have 12–15 canonical values.
3–5×
Increase in filter inclusivity rate after value normalization in a typical multi-supplier catalog.
0
Additional infrastructure required: normalization is a data quality process, not a technology purchase.

Why Normalization Fails Without a System

Unnormalized catalogs are almost never the result of carelessness. They are the result of a perfectly natural process: product data comes from multiple suppliers with different naming conventions, multiple team members who make reasonable but inconsistent choices, and historical conventions that made sense at the time but accumulated into fragmentation over multiple seasons and sourcing cycles.

Supplier A ships “Waterproof Jacket” with color “midnight blue.” Supplier B ships the same jacket in the same color as “cobalt.” Your website buyer who added a similar product last year used “Navy.” Your merchandiser who added the latest range used “Dark Blue.” All four values appear in your color filter as four separate options. A shopper filtering by “Navy” sees one product. A shopper filtering by “Midnight Blue” sees a different one. Neither sees all four jackets in what is effectively the same color. Filter performance is fragmented, and each new product added with a supplier-provided color value makes the fragmentation worse.

Normalization Is Not Just About Tidiness

The commercial case for normalization is not that your data looks cleaner. It is that unnormalized categorical values fragment your filter facets into dozens of low-inventory options that each receive a fraction of the filter traffic that a single canonical value would receive. A “Navy Blue” filter facet with 120 products is a high-traffic, high-conversion filter option. Fourteen separate “blue” variant values with an average of 9 products each produce fourteen low-traffic facets. The traffic is the same. The conversion and the shopper experience are materially worse.

Why fragmentation happens

Suppliers, merchandisers, and historical conventions all create slightly different but “reasonable” value names.

What the shopper experiences

Too many thin filter facets, each showing too little inventory to feel useful or trustworthy.

The 5-Step Normalization Process

Normalization workflow

01

Extract

Pull every unique value for the attribute and sort by frequency.

02

Define canon

Create the accepted value list shoppers and channels should actually use.

03

Map

Build old value → canonical value mappings with human review for ambiguous cases.

04

Apply

Update the catalog and verify only canonical values remain.

05

Govern

Prevent new fragmentation with controlled input and supplier onboarding rules.

01

Extract all unique values per attribute

For each attribute you are normalizing, pull every unique value currently in your catalog database. Export to a spreadsheet. Sort by frequency, most common values first. The frequency distribution reveals the scope: if you have 47 unique color values and the top 10 account for 80% of products, you have a manageable normalization project. If you have 200 unique values with no dominant entries, you have a larger project.
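As a sketch, the extraction step takes only a few lines of Python; the raw values below are illustrative stand-ins for a database or spreadsheet export of the attribute column.

```python
from collections import Counter

# Illustrative raw color values; in practice these would come from a
# database query or CSV export of the attribute column.
raw_colors = [
    "Navy", "navy", "midnight blue", "cobalt", "Navy",
    "Dark Blue", "navy", "Navy Blue", "midnight blue", "Navy",
]

# Count each unique value and list them most frequent first: this is
# the frequency-sorted view that reveals the scope of the project.
frequency = Counter(raw_colors)
for value, count in frequency.most_common():
    print(f"{value}: {count}")
```

The length of `frequency` is your unique-value count, and the shape of `most_common()` tells you whether a handful of mappings will cover most of the catalog.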

02

Define your canonical value list

For each attribute, define the complete list of canonical values you will accept. This list should: (a) cover all legitimate product options in the attribute; (b) use the terminology your shoppers use, not supplier terminology; (c) be consistent with the canonical values expected by your channels (Google Shopping color taxonomy; Amazon browse node attribute values). The canonical list is your target state. Every attribute value in the catalog will map to one item on this list.
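One simple way to hold the target state is a canonical list per attribute. The values below are hypothetical placeholders; real lists should use your shoppers' terminology and match your channels' taxonomies.

```python
# Hypothetical canonical value lists per attribute. Real lists should use
# shopper terminology and align with channel taxonomies (e.g. Google
# Shopping colors, Amazon browse node attribute values).
CANONICAL_VALUES = {
    "color": ["Navy Blue", "Sky Blue", "Royal Blue", "Black", "White"],
    "fit": ["Slim Fit", "Regular Fit", "Relaxed Fit", "Oversized"],
}

def is_canonical(attribute: str, value: str) -> bool:
    """True when a value is already on the attribute's canonical list."""
    return value in CANONICAL_VALUES.get(attribute, [])
```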

03

Build the normalization mapping

Create a mapping table: old value → canonical value. For the color attribute with 47 values, this means 47 rows, each mapping an old value to the canonical value it should become. This step requires human judgment. “midnight blue” → “Navy Blue” is a reasonable normalization, but “midnight” could plausibly be “Black” or “Navy Blue” depending on category context. Ambiguous cases should be reviewed against product images.
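The mapping table can be as simple as a dictionary with an explicit sentinel for ambiguous values that must be reviewed against product images. The mappings shown are illustrative, not authoritative.

```python
# Illustrative old value -> canonical value mapping. None is a sentinel
# meaning "ambiguous: route to human review against product images."
REVIEW = None

COLOR_MAPPING = {
    "navy": "Navy Blue",
    "dark navy": "Navy Blue",
    "midnight blue": "Navy Blue",
    "ink blue": "Navy Blue",
    "cobalt": "Royal Blue",
    "midnight": REVIEW,  # could be Black or Navy Blue, depending on category
}

def map_value(old_value: str):
    """Return the canonical value, None when human review is needed,
    or the original value when no mapping entry exists."""
    key = old_value.strip().lower()
    return COLOR_MAPPING.get(key, old_value)
```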

04

Apply the mapping to your catalog

Apply the normalization mapping to your product database. This can be done with a VLOOKUP/XLOOKUP in a spreadsheet for small catalogs, with a database UPDATE statement for larger catalogs, or with a normalization function in your PIM. After applying: pull unique values again and confirm that only canonical values remain in the attribute field.
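For a small catalog, the apply-and-verify step looks like the sketch below; the product records and field names are invented for illustration, and the same logic translates to an UPDATE joined against the mapping table in a database.

```python
# Illustrative mapping and canonical set for the apply step.
color_mapping = {"navy": "Navy Blue", "cobalt": "Royal Blue",
                 "midnight blue": "Navy Blue"}
canonical_colors = {"Navy Blue", "Royal Blue"}

# Invented product records standing in for the catalog database.
products = [
    {"sku": "JKT-001", "color": "navy"},
    {"sku": "JKT-002", "color": "cobalt"},
    {"sku": "JKT-003", "color": "Navy Blue"},  # already canonical
]

# Apply: replace each old value with its canonical target.
for product in products:
    product["color"] = color_mapping.get(product["color"], product["color"])

# Verify: re-extract unique values and confirm only canonical ones remain.
remaining = {p["color"] for p in products}
assert remaining <= canonical_colors, f"Non-canonical: {remaining - canonical_colors}"
```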

05

Implement governance to prevent re-fragmentation

Normalization without governance is a one-time fix that degrades over time. Prevention: (a) validate attribute values at the point of product creation, rejecting non-canonical values at data entry; (b) use dropdown or selection fields rather than free-text fields for categorical attributes where normalization standards apply; (c) define a supplier onboarding process that includes a data normalization step before supplier data is ingested into your catalog.
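Point (a) can be sketched as a validation gate at product creation; the canonical set and function name here are illustrative.

```python
# Illustrative canonical set; rejecting anything outside it at data entry
# is what keeps the catalog from re-fragmenting over time.
CANONICAL_COLORS = {"Navy Blue", "Sky Blue", "Royal Blue", "Black"}

def validate_color(value: str) -> str:
    """Accept only canonical colors at product creation; raise otherwise."""
    if value not in CANONICAL_COLORS:
        raise ValueError(
            f"{value!r} is not a canonical color. Choose one of: "
            + ", ".join(sorted(CANONICAL_COLORS))
        )
    return value
```

In practice the same check belongs in the PIM's field configuration, as a dropdown rather than free text, so invalid values cannot be typed at all.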

The Attributes That Need Normalization Most Urgently

Color
Normalization complexity: High. Dozens of supplier naming conventions; multiple legitimate interpretations for some hues.
Why it matters most: Primary filter attribute in apparel, home, and many other categories; fragmentation directly reduces filter performance.
Canonical standard to use: Google Shopping color taxonomy for Shopping feeds; your own brand-consistent canonical list for website filters.

Material
Normalization complexity: Medium. Composition percentages add complexity (“100% cotton” vs “cotton” vs “pure cotton”).
Why it matters most: AI agent and shopper filter queries increasingly specify material; agentic queries for sustainability require precise material data.
Canonical standard to use: Full composition string of percentage plus material name (e.g., “58% Cotton, 42% Polyester”); avoid shorthand.

Size
Normalization complexity: Medium. Multiple size systems (UK/EU/US); clothing, shoes, and accessories have different conventions.
Why it matters most: Wrong size data is the leading cause of “wrong size” returns; inconsistent size systems create filter confusion for international shoppers.
Canonical standard to use: Category-specific: clothing (UK numeric), shoes (UK half-size with EU equivalent), children’s (age range + cm measurement).

Fit / Cut
Normalization complexity: Low complexity, high impact. Most brands use their own fit terminology.
Why it matters most: Fit is a primary filter attribute in apparel; brand-specific terms fragment the filter if not normalized.
Canonical standard to use: Slim Fit, Regular Fit, Relaxed Fit, Oversized; map brand-specific terminology to these canonical terms.

Certification
Normalization complexity: Low. Certifications have official names.
Why it matters most: AI agents increasingly query for sustainability and safety certifications; non-standard certification names reduce matching probability.
Canonical standard to use: Official certification names exactly: “OEKO-TEX Standard 100,” “Bluesign Approved,” “Fair Trade Certified,” not shorthand.

Urgency usually tracks shopper specificity

The more often shoppers filter, compare, or query on an attribute, the more expensive fragmentation becomes. Color, size, material, and fit are therefore rarely “nice to clean up later” fields. They are usually front-line revenue fields.

Normalization at Scale: When Manual Mapping Isn’t Enough

The normalization mapping approach described above works well for catalogs with a manageable number of unique values per attribute and a team with time to review each mapping. For catalogs with tens of thousands of SKUs from dozens of suppliers, manual normalization mapping becomes a project in itself, and one that needs to be re-run every time a new supplier’s data is ingested.

AI-assisted normalization addresses this scale challenge by learning your canonical value list and automatically mapping new attribute values to the closest canonical match, with confidence scores that route ambiguous cases to human review. This preserves the accuracy of human judgment for edge cases while automating the 80–90% of clear-cut normalization mappings that follow obvious patterns.
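As a simplified stand-in for the AI-assisted step, string similarity against the canonical list with a confidence threshold shows the shape of the triage: auto-map high-confidence matches, route the rest to human review. The threshold and canonical values are illustrative, and real systems use far richer matching than character similarity.

```python
from difflib import SequenceMatcher

# Illustrative canonical list and confidence threshold.
CANONICAL = ["Navy Blue", "Sky Blue", "Royal Blue", "Black"]
AUTO_THRESHOLD = 0.8  # below this, route to human review

def suggest(value: str):
    """Return (best canonical match, confidence, needs_review)."""
    scored = [
        (SequenceMatcher(None, value.lower(), c.lower()).ratio(), c)
        for c in CANONICAL
    ]
    confidence, best = max(scored)
    return best, confidence, confidence < AUTO_THRESHOLD
```

An exact match like “navy blue” maps automatically with full confidence, while an ambiguous value like “midnight” scores below the threshold and is flagged for review.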

Unnormalized Catalog (Filter View)

Color filter: Navy (12) | Dark Navy (8) | Midnight (6) | Cobalt Blue (4) | Ocean (3) | Ink Blue (3) | Navy Blue (11) | Dark Blue (9). Eight facets, averaging 7 products each.
Size filter: S (34) | Small (12) | SM (8) | Medium (45) | M (23) | Medium-Large (3) | Large (38) | L (19) | Lge (4). Fragmented and confusing.

Normalized Catalog (Filter View)

Color filter: Navy Blue (56) | Sky Blue (23) | Royal Blue (18). Three facets with meaningful inventory per option and higher filter CTR.
Size filter: XS (14) | S (54) | M (68) | L (57) | XL (34) | 2XL (18). Clean, intuitive, full-inventory facets.
Use frequency-sorted exports to spot where a small number of mappings will solve most of the problem.
Route ambiguous terms like “midnight” or “natural” to human review rather than forcing automation blindly.
Push approved mappings back into supplier onboarding so the same fragmentation does not re-enter next season.
Re-run unique-value checks after every large ingestion cycle to catch drift before filters fragment again.

Velou on Automated Normalization

Value normalization is a task that sits at the intersection of data management and domain knowledge: you need to know what “midnight” means in context, what size system a product category uses, and what the canonical certification name for “certified organic” is. Commerce-1 is trained on retail product data specifically, which means it understands these category conventions and can apply normalization mappings with the contextual accuracy that general text processing tools lack.

Normalize your catalog data automatically, at scale

Commerce-1 maps attribute values to canonical standards across your full catalog.

Request a demo

See how AI-ready your catalog really is.