Structured Product Data for AI: The Technical Spec for the Agentic Era


Most discussions of product data and AI stay at the strategic level: “make your data more structured,” “add more attributes,” “be precise.” This article goes further. It explains the technical specifications that determine how AI systems read, parse, and evaluate product data, at the level of data types, schema fields, and format requirements. If you are responsible for catalog data quality and you want to understand precisely what “AI-ready” means in implementable terms, this is the article for you.

Why Data Structure Is the Foundational AI Requirement

AI systems that evaluate product data, whether shopping agents, recommendation engines, or search AI, operate on a fundamental distinction between two types of data: data they can query deterministically, and data they must interpret probabilistically.

Structured data (typed attribute fields with explicit values) can be queried deterministically. When an AI agent executes weight < 500, it gets a precise true/false result for any product with a numeric weight attribute. The query is reliable, scalable, and consistent across millions of products.

Unstructured data (free-form prose text) requires probabilistic interpretation. When “lightweight” appears in a description, an AI system must estimate what weight that probably implies, and that estimate carries uncertainty. Suppose the estimate is right 80% of the time. In a binary filter query, 80% accuracy means 20% of your products are either incorrectly included or incorrectly excluded. At scale, that uncertainty is commercially significant.

Structured data answers a query. Unstructured data requires interpretation. AI systems prefer certainty.
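A minimal sketch of that distinction, using hypothetical product records and a deliberately naive keyword heuristic as a stand-in for text interpretation. Real systems use far more capable models, but the uncertainty is of the same kind:

```python
# Hypothetical catalog records. Product C has no typed weight, only prose.
products = [
    {"id": "A", "weight_g": 490, "description": "Packable shell jacket."},
    {"id": "B", "weight_g": 720, "description": "Lightweight insulated jacket."},
    {"id": "C", "weight_g": None, "description": "Very lightweight wind shirt."},
]

def under_500g_typed(product):
    """Deterministic: True/False when a numeric weight exists, None when it is missing."""
    weight = product["weight_g"]
    return None if weight is None else weight < 500

def under_500g_guess(product):
    """Naive stand-in for interpretation: treats the word 'lightweight' as under 500 g."""
    return "lightweight" in product["description"].lower()

typed = {p["id"]: under_500g_typed(p) for p in products}
guessed = {p["id"]: under_500g_guess(p) for p in products}
```

Here the typed query is correct for A and B and explicitly reports that C cannot be evaluated, while the keyword guess excludes A (490 g, but never described as lightweight) and includes B (720 g, but described as lightweight).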

The Technical Specification: What AI-Ready Product Data Looks Like

1. Typed Attribute Fields

Every attribute that can appear in a purchase decision criterion should be stored as a typed field, not embedded in prose. Typed fields have defined data types that determine how they can be queried:

| Data Type | Examples | Why It Matters for AI | Common Failure |
| --- | --- | --- | --- |
| Boolean | waterproof: true, packable: true, vegan: false | Enables exact-match filtering: WHERE waterproof = TRUE. Cannot be approximated from text. | “Highly water resistant”: the agent cannot map this to true or false with certainty. |
| Numeric (integer/float) | weight: 490, price: 89.99, protein_g: 25, waterproof_rating: 20000 | Enables range filtering: WHERE weight < 500. Requires explicit unit declaration. | “Very lightweight” or “approx 500g”: not queryable as a numeric comparison. |
| Categorical (enum) | color: “Navy Blue”, fit: “Regular”, gender: “Unisex” | Enables exact-match and set-membership queries. Requires canonical value normalization. | “Ocean blue” or “dark navy”: non-canonical values fragment filter matching. |
| Array/List | compatible_with: [“iPhone 14”, “iPhone 15”, “iPhone 16”], use_cases: [“hiking”, “travel”, “commuting”] | Enables membership queries: WHERE “hiking” IN use_cases. Critical for compatibility and multi-use products. | “Compatible with most smartphones”: not queryable for specific device matching. |
| Text (structured) | certification: “EN 343:2019 Class 3”, standard: “ISO 811” | References to standards and certifications need the full designation to be matched against specification queries. | “Certified” or “meets industry standards”: no matchable specification reference. |
| DateTime / Duration | warranty_months: 24, dispatch_days: 1, delivery_days_max: 3 | Enables temporal filtering for delivery and warranty requirements. | “Fast dispatch” or “2-year warranty” in text: not queryable as a number. |
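As an illustration, the field types in the table can be expressed as a typed record. This is a sketch only: the Product class, the Fit enum, and the sample SKUs are hypothetical names, not part of any feed specification.

```python
from dataclasses import dataclass, field
from enum import Enum

class Fit(Enum):
    """Categorical attribute: only these canonical values are accepted."""
    SLIM = "Slim"
    REGULAR = "Regular"
    RELAXED = "Relaxed"

@dataclass
class Product:
    sku: str
    waterproof: bool                                     # Boolean
    weight_g: int                                        # Numeric, unit fixed by the field name
    fit: Fit                                             # Categorical (enum)
    use_cases: list[str] = field(default_factory=list)   # Array/List
    certification: str = ""                              # Structured text, full designation
    warranty_months: int = 0                             # Duration as a number

catalog = [
    Product("JKT-1", True, 490, Fit.REGULAR, ["hiking", "travel"], "EN 343:2019 Class 3", 24),
    Product("JKT-2", False, 310, Fit.SLIM, ["commuting"], "", 12),
    Product("JKT-3", True, 650, Fit.REGULAR, ["hiking"], "", 24),
]

# Boolean match, numeric range, list membership, and duration threshold in one query.
matches = [
    p.sku
    for p in catalog
    if p.waterproof
    and p.weight_g < 500
    and "hiking" in p.use_cases
    and p.warranty_months >= 24
]
```

Every condition in the final query is an exact comparison against a typed field, which is what makes the result reproducible across the whole catalog.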

2. Unit Declaration

Numeric attributes without explicit unit declarations create interpretation problems that AI systems resolve inconsistently. A weight value of “490” is ambiguous: is it 490 g, 490 kg, or 490 lbs? To a human, context usually makes the intended unit obvious. To an AI system executing a WHERE weight < 500 query, unit ambiguity produces inconsistent results: the system may assume grams for some products and kilograms for others, or may reject the comparison entirely.

The technical requirement: every numeric attribute should be stored as a value + unit pair, or in a field with an explicit unit declaration. The format depends on the data system:

  • In a structured database or PIM: separate fields for value (490) and unit (g), or a combined field with consistent formatting (“490g”).
  • In schema.org markup: use unitText or unitCode in QuantitativeValue nodes — e.g., {"@type": "QuantitativeValue", "value": 490, "unitCode": "GRM"}.
  • In Google product_details: include the unit in the attribute value string — “Waterproof Rating / 20,000mm HH” — so the unit is explicit in the structured pair.
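One way to enforce the value + unit requirement at ingestion is to reject any weight that arrives without a unit and convert the rest to a single base unit. The sketch below assumes a small conversion table (TO_GRAMS) and hypothetical helper names. The QuantitativeValue shape and the GRM unit code follow the schema.org example in the list above.

```python
import re

# Assumed conversion factors to grams; extend to match your catalog.
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "lb": 453.59237, "lbs": 453.59237, "oz": 28.349523125}

def parse_weight_to_grams(raw):
    """Parse strings such as '490g' or '0.49 kg'; raise if the unit is missing or unknown."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([a-zA-Z]+)\s*", raw)
    if not match:
        raise ValueError(f"no explicit unit in {raw!r}")
    value, unit = float(match.group(1)), match.group(2).lower()
    if unit not in TO_GRAMS:
        raise ValueError(f"unknown unit {unit!r} in {raw!r}")
    return value * TO_GRAMS[unit]

def to_quantitative_value(grams):
    """Build a schema.org QuantitativeValue node with an explicit unit code."""
    return {"@type": "QuantitativeValue", "value": grams, "unitCode": "GRM"}

node = to_quantitative_value(parse_weight_to_grams("490g"))
```

A bare "490" raises a ValueError instead of being silently assumed to be grams, so the ambiguity is caught before it reaches a feed or a page.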

3. Canonical Value Normalization

Categorical attributes must use canonical values, a standardized set of accepted terms, to enable reliable filtering and entity matching. The canonical standard varies by attribute type:

| Attribute Type | Canonical Standard | AI System That Uses It | Normalization Action Required |
| --- | --- | --- | --- |
| Product color | Google’s canonical color taxonomy for Shopping feeds, or your own defined canonical list | Google Shopping algorithm; Google AI Overviews color filtering | Map all variant color expressions (navy, midnight, cobalt, dark blue) to one canonical value (Navy Blue). |
| Product category | Google Product Taxonomy (numeric IDs); Amazon browse node IDs | Google Shopping Graph; Amazon A10 algorithm; AI Overview query matching | Map to the most specific applicable numeric taxonomy node; verify quarterly. |
| Gender | Standardized values: Men’s, Women’s, Unisex, Boys’, Girls’ (Google’s gender taxonomy) | Google Shopping gender filtering; AI query matching for gender-specific searches | Normalize all variant expressions (Male, Mens, M, For Men) to canonical. |
| Size | Category-specific size standards (UK/EU sizing systems, numeric for dimensions) | Filtered search; agent size-specific queries | Populate size as a typed field with the canonical size scale for the category; avoid brand-specific size terminology. |
| Material | Percentage composition: “100% Recycled Polyester” not “recycled material” | Sustainability-query matching; material-filtering agents | Full material composition by percentage; no generic material claims. |
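In practice, normalization is usually a lookup from known variants to one canonical value, with anything unrecognized routed to review rather than guessed. A minimal sketch, with hypothetical synonym lists built from the examples in the table:

```python
# Hypothetical variant-to-canonical maps; real lists come from the target taxonomy.
COLOR_SYNONYMS = {
    "navy": "Navy Blue",
    "navy blue": "Navy Blue",
    "midnight": "Navy Blue",
    "cobalt": "Navy Blue",
    "dark blue": "Navy Blue",
}
GENDER_SYNONYMS = {
    "male": "Men's",
    "mens": "Men's",
    "men's": "Men's",
    "m": "Men's",
    "for men": "Men's",
    "unisex": "Unisex",
}

def normalize(raw, synonyms):
    """Return the canonical value, or None so the item can be flagged for manual review."""
    return synonyms.get(raw.strip().lower())

canonical_color = normalize("Midnight", COLOR_SYNONYMS)
canonical_gender = normalize(" For Men ", GENDER_SYNONYMS)
unmapped = normalize("Ocean Blue", COLOR_SYNONYMS)
```

Returning None for an unmapped value is a design choice: it keeps non-canonical terms out of the catalog instead of letting them fragment filter matching.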

4. Schema.org Implementation: The Complete Technical Spec

Schema.org Product markup is the universal machine-readable layer for product data. AI systems, particularly browser-embedded agents and Google’s crawlers, read this markup independently of your product database and Merchant Center feed. Here is the complete technical implementation for agentic readiness:

| Schema Property | Technical Format | AI System Use | Required? |
| --- | --- | --- | --- |
| name | Plain text string matching H1 and feed title | Primary product identification for all crawling agents | Required |
| brand | {"@type": "Brand", "name": "[Brand Name]"} | Brand entity matching in Knowledge Graph; brand-query eligibility | Required |
| gtin13 or gtin8 | String of 13 or 8 digits: the exact GS1-registered GTIN | Entity matching trigger for Shopping Graph and Knowledge Graph | Required for branded products |
| offers | {"@type": "Offer", "price": "89.99", "priceCurrency": "GBP", "availability": "https://schema.org/InStock", "url": "[PDP URL]"} | Price and availability for agent purchase evaluation; must match feed exactly | Required |
| offers.shippingDetails | {"@type": "OfferShippingDetails", "shippingRate": {...}, "deliveryTime": {"@type": "ShippingDeliveryTime", "handlingTime": {...}, "transitTime": {"minValue": 1, "maxValue": 3, "unitCode": "DAY"}}} | Delivery time for agent shipping-speed evaluation | High priority |
| aggregateRating | {"@type": "AggregateRating", "ratingValue": "4.3", "reviewCount": "847"} | Review data for agent trust ranking; enables star ratings in search results | High priority |
| additionalProperty | {"@type": "PropertyValue", "name": "Waterproof Rating", "value": "20,000mm HH"} | Most powerful field for agentic discoverability: every attribute becomes machine-queryable | High priority for agentic readiness |
| hasMerchantReturnPolicy | {"@type": "MerchantReturnPolicy", "applicableCountry": "GB", "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow", "merchantReturnDays": 60} | Return policy for agent purchase risk evaluation | Medium priority |
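The required properties in the table can be generated from a product record rather than hand-written into a template. The sketch below assumes a hypothetical record layout (title, brand, gtin13, price, currency, stock, url) and covers only the required rows. The GTIN and URL are sample values.

```python
import json

def product_jsonld(record):
    """Build a schema.org Product node from a product record (field names are assumptions)."""
    in_stock = record["stock"] > 0
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record["title"],
        "brand": {"@type": "Brand", "name": record["brand"]},
        "gtin13": record["gtin13"],
        "offers": {
            "@type": "Offer",
            "price": f'{record["price"]:.2f}',
            "priceCurrency": record["currency"],
            "availability": "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock",
            "url": record["url"],
        },
    }

record = {
    "title": "Trail Shell Jacket",
    "brand": "Example Brand",
    "gtin13": "4006381333931",
    "price": 89.99,
    "currency": "GBP",
    "stock": 12,
    "url": "https://example.com/products/trail-shell-jacket",
}

# The serialized string is what would go inside the page's JSON-LD script element.
markup = json.dumps(product_jsonld(record), indent=2)
```

Because price and availability are read from the record at render time, the markup cannot drift from the stored values the way a static template can.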

5. The additionalProperty Field: The Most Powerful Schema Field for AI

additionalProperty deserves its own section because it is both the most powerful and the most underused schema field for agentic product discoverability. It accepts an array of PropertyValue objects, one per product attribute, turning every structured product attribute into a machine-readable, query-matchable declaration on your product page.

Full implementation example for a hiking jacket:

"additionalProperty": [
  {"@type": "PropertyValue", "name": "Weight", "value": "490g"},
  {"@type": "PropertyValue", "name": "Waterproof Rating", "value": "20,000mm HH"},
  {"@type": "PropertyValue", "name": "Packable", "value": true},
  {"@type": "PropertyValue", "name": "Material", "value": "100% Recycled Polyester"},
  {"@type": "PropertyValue", "name": "Sustainability", "value": "Bluesign Certified, OEKO-TEX Standard 100"},
  {"@type": "PropertyValue", "name": "Gender", "value": "Unisex"}
]

Each PropertyValue pair is independently readable by any AI system that crawls the page. An agent looking for packable products can read {"name": "Packable", "value": true} with certainty. It does not need to infer packability from description prose. This is the technical implementation of structured product data that agentic systems require.
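A short sketch of both sides of that exchange: generating the array from an attribute map, and reading one attribute back the way a crawling agent might. Both function names are hypothetical, and the lookup is an assumption about how a consumer could read the markup, not a description of any particular agent.

```python
def to_additional_property(attributes):
    """Turn a name -> value map into a list of schema.org PropertyValue nodes."""
    return [
        {"@type": "PropertyValue", "name": name, "value": value}
        for name, value in attributes.items()
    ]

def read_property(product_node, name):
    """Return the declared value for an attribute, or None if the page does not declare it."""
    for prop in product_node.get("additionalProperty", []):
        if prop.get("name") == name:
            return prop.get("value")
    return None

product_node = {
    "@type": "Product",
    "name": "Trail Shell Jacket",
    "additionalProperty": to_additional_property({
        "Weight": "490g",
        "Waterproof Rating": "20,000mm HH",
        "Material": "100% Recycled Polyester",
        "Gender": "Unisex",
    }),
}

rating = read_property(product_node, "Waterproof Rating")
missing = read_property(product_node, "Insulation")
```

An undeclared attribute comes back as None, which an agent can treat as "unknown" rather than falling back to guessing from the description.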

Real-Time Data Architecture for Agentic Accuracy

Structural completeness is necessary but not sufficient. AI agents making purchase decisions need accurate, current data, specifically price, availability, and any time-sensitive attributes. The technical requirements for agentic data freshness:

| Data Type | Freshness Requirement | Technical Implementation |
| --- | --- | --- |
| Price | Feed updated within 2 hours of any price change; schema.org price generated dynamically from live price field | Content API for real-time feed updates; server-side schema generation from database (not static template) |
| Availability | Feed and schema updated within 1 hour of stock depletion; preorder status reflected immediately | Inventory system directly triggers feed and schema update events; no batch delay for availability changes |
| Promotional pricing | sale_price and sale_price_effective_date populated in feed before promotion starts; removed immediately after | Promotion management system integrated with feed generation; time-based automation for sale_price management |
| Shipping time | shippingDetails schema reflects actual current dispatch capability, not aspirational lead times | Shipping time data sourced from live warehouse operations; dynamically updated when capacity or carrier changes |
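The price and availability windows can be monitored with a simple staleness check. The sketch below assumes you can obtain two timestamps per product, when the source value last changed and when the feed or schema was last published. The function and variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Windows from the freshness requirements: 2 hours for price, 1 hour for availability.
MAX_LAG = {"price": timedelta(hours=2), "availability": timedelta(hours=1)}

def is_stale(kind, changed_at, published_at, now):
    """True if a source change is still unpublished after its allowed window has passed."""
    if published_at is not None and published_at >= changed_at:
        return False  # the latest change has already been published
    return now - changed_at > MAX_LAG[kind]

now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_publish = now - timedelta(hours=5)

# Price changed 3 hours ago and has not been republished: outside the 2-hour window.
price_stale = is_stale("price", now - timedelta(hours=3), last_publish, now)
# Stock changed 30 minutes ago: unpublished, but still inside the 1-hour window.
stock_stale = is_stale("availability", now - timedelta(minutes=30), last_publish, now)
```

A check like this only detects lag. The event-driven updates described in the table are what prevent it.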

The technical standard has three layers:

  • Layer 1, typed attributes: correct data type, canonical value, and explicit units where relevant.
  • Layer 2, machine-readable markup: product_details in feeds and additionalProperty in schema make attributes queryable.
  • Layer 3, live sync accuracy: price, availability, and delivery signals must stay aligned everywhere.
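Alignment across surfaces can be audited by comparing the same fields as they appear in the schema markup, the feed, and the visible page. The sketch assumes each source has already been extracted into a flat dictionary with shared keys. The extraction step and the key names are assumptions.

```python
def find_mismatches(schema, feed, pdp, fields=("price", "availability", "gtin")):
    """Return the fields whose values differ between any of the three sources."""
    mismatches = {}
    for name in fields:
        values = {"schema": schema.get(name), "feed": feed.get(name), "pdp": pdp.get(name)}
        if len(set(values.values())) > 1:
            mismatches[name] = values
    return mismatches

schema = {"price": "89.99", "availability": "InStock", "gtin": "4006381333931"}
feed = {"price": "84.99", "availability": "InStock", "gtin": "4006381333931"}
pdp = {"price": "89.99", "availability": "InStock", "gtin": "4006381333931"}

report = find_mismatches(schema, feed, pdp)
```

In this sample the feed carries a different price from the page and the markup, so the report contains a single entry for price showing all three values.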

The AI-Ready Product Data Checklist — Technical Edition

All purchase-criteria attributes as typed fields — Boolean, numeric, categorical, or array fields, not embedded in description text.
All numeric attributes with explicit unit declarations — weight: “490g” not “490”; waterproof_rating: “20000mm HH” not “20000”.
Canonical value normalization complete — All color, material, fit, gender, and size values mapped to canonical terms; zero variant expressions per attribute.
product_details feed field populated — All purchase-criteria attributes declared as name/value pairs in the Google Shopping product_details field.
Schema.org Product markup complete — name, brand, gtin, offers (price + availability + shippingDetails), aggregateRating, all present and valid.
additionalProperty implemented — All purchase-criteria attributes declared as PropertyValue pairs in schema, one per attribute, with explicit name and value.
Schema dynamically generated — Schema markup generated server-side from live product database, not static template that can drift from live data.
Schema-feed-PDP consistency — Price, availability, and product identifiers identical across schema, Merchant Center feed, and visible PDP content.
Real-time price and availability sync — Feed and schema updated within 2 hours of price change; within 1 hour of availability change.
GTIN GS1-validated — All submitted GTINs pass GS1 check-digit validation; no self-assigned or malformed identifiers.
Taxonomy at leaf node — Google product category and Amazon browse node mapped to the most specific applicable child node.
Boolean attributes typed — packable, waterproof, vegan, organic, cruelty_free declared as true/false, not “yes/no” text strings.
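The GTIN item in the checklist can be partly automated. The sketch below implements the standard GS1 mod-10 check-digit calculation for GTIN-8, GTIN-12, GTIN-13, and GTIN-14. The function name is hypothetical. A passing check digit shows only that the number is well formed, not that it is registered with GS1.

```python
def gtin_check_digit_ok(gtin):
    """Validate the final check digit of a GTIN-8/12/13/14 given as a string of digits."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(ch) for ch in gtin]
    body, check = digits[:-1], digits[-1]
    # Starting from the digit next to the check digit, weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check

valid_13 = gtin_check_digit_ok("4006381333931")
valid_8 = gtin_check_digit_ok("73513537")
wrong_check = gtin_check_digit_ok("4006381333932")
wrong_length = gtin_check_digit_ok("490")
```

Running a check like this before feed submission catches malformed and mistyped identifiers early. Confirming that a GTIN is actually assigned to the product still requires a lookup against GS1 data.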

Typed data first

If a purchase criterion lives only in prose, the system has to guess instead of query.

Markup makes it portable

Feeds help platforms. Schema helps crawlers and browser agents read the same truth.

Freshness protects trust

Accurate, current price and stock signals are part of the technical standard now.

Velou’s Technical Implementation Approach

Commerce-1 generates AI-ready product data at every layer described in this article simultaneously: typed attribute fields from source data, unit-explicit numeric values, canonical value normalization, product_details pairs for Google feeds, and schema.org additionalProperty arrays for PDPs.

The output is not just richer product content. It is technically structured product data that meets the format requirements that AI systems use for deterministic evaluation. For retailers who want to implement the technical spec described here across a catalog of thousands of SKUs, Commerce-1 is the operational mechanism that makes it achievable.
