Structured Product Data for AI: The Technical Spec for the Agentic Era
Most discussions of product data and AI stay at the strategic level: “make your data more structured,” “add more attributes,” “be precise.” This article goes further. It explains the technical specifications that determine how AI systems read, parse, and evaluate product data, at the level of data types, schema fields, and format requirements. If you are responsible for catalog data quality and you want to understand precisely what “AI-ready” means in implementable terms, this is the article for you.
Why Data Structure Is the Foundational AI Requirement
AI systems that evaluate product data, whether shopping agents, recommendation engines, or search AI, operate on a fundamental distinction between two types of data: data they can query deterministically, and data they must interpret probabilistically.
Structured data (typed attribute fields with explicit values) can be queried deterministically. When an AI agent executes weight < 500, it gets a precise true/false result for any product with a numeric weight attribute. The query is reliable, scalable, and consistent across millions of products.
Unstructured data (free-form prose text) requires probabilistic interpretation. When “lightweight” appears in a description, an AI system must estimate what weight that probably implies, and that estimate carries uncertainty. Suppose the estimate is right 80% of the time. In a binary filter query, 80% accuracy means 20% of your products are either incorrectly included or incorrectly excluded. At scale, that uncertainty is commercially significant.
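The contrast can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular agent works: the Product class, the SKUs, and the keyword heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Product:
    sku: str
    description: str
    weight_g: Optional[float] = None  # typed attribute; None when weight exists only in prose


def under_weight(products, limit_g):
    """Deterministic filter: only products with a numeric weight can match."""
    return [p.sku for p in products if p.weight_g is not None and p.weight_g < limit_g]


def mentions_lightweight(products):
    """Prose heuristic: a keyword match that says nothing certain about grams."""
    return [p.sku for p in products if "lightweight" in p.description.lower()]


# Hypothetical catalog rows for illustration only.
catalog = [
    Product("JKT-001", "Packable shell jacket", weight_g=490),
    Product("JKT-002", "Lightweight insulated parka"),  # no typed weight
    Product("JKT-003", "Lightweight expedition jacket", weight_g=820),
]

print(under_weight(catalog, 500))     # ['JKT-001']
print(mentions_lightweight(catalog))  # ['JKT-002', 'JKT-003']
```

The typed query returns the one product that provably weighs under 500 g. The keyword match misses it and returns an 820 g jacket instead, which is the kind of inclusion and exclusion error described above.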
The Technical Specification: What AI-Ready Product Data Looks Like
1. Typed Attribute Fields
Every attribute that can appear in a purchase decision criterion should be stored as a typed field, not embedded in prose. Typed fields have defined data types that determine how they can be queried:
| Data Type | Examples | Why It Matters for AI | Common Failure |
|---|---|---|---|
| Boolean | waterproof: true, packable: true, vegan: false | Enables exact-match filtering: WHERE waterproof = TRUE. Cannot be approximated from text. | “Highly water resistant” — agent cannot map to boolean true/false with certainty. |
| Numeric (integer/float) | weight: 490, price: 89.99, protein_g: 25, waterproof_rating: 20000 | Enables range filtering: WHERE weight < 500. Requires explicit unit declaration. | “Very lightweight” or “approx 500g” — not queryable as a numeric comparison. |
| Categorical (enum) | color: “Navy Blue”, fit: “Regular”, gender: “Unisex” | Enables exact-match and set-membership queries. Requires canonical value normalization. | “Ocean blue” or “dark navy” — non-canonical values fragment filter matching. |
| Array/List | compatible_with: [“iPhone 14”, “iPhone 15”, “iPhone 16”], use_cases: [“hiking”, “travel”, “commuting”] | Enables membership queries: WHERE “hiking” IN use_cases. Critical for compatibility and multi-use products. | “Compatible with most smartphones” — not queryable for specific device matching. |
| Text (structured) | certification: “EN 343:2019 Class 3”, standard: “ISO 811” | References to standards and certifications need the full designation to be matched against specification queries. | “Certified” or “meets industry standards” — no matchable specification reference. |
| DateTime / Duration | warranty_months: 24, dispatch_days: 1, delivery_days_max: 3 | Enables temporal filtering for delivery and warranty requirements. | “Fast dispatch” or “2-year warranty” in text — not queryable as a number. |
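As a rough sketch of how the data types in the table might look in a single typed record, here is a Python dataclass. The class name, field names, and the Fit enum values are assumptions made for illustration; a real PIM or database schema would define its own.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Fit(Enum):
    """Categorical attribute: only canonical values are accepted."""
    SLIM = "Slim"
    REGULAR = "Regular"
    RELAXED = "Relaxed"


@dataclass
class JacketAttributes:
    waterproof: bool            # Boolean
    weight_g: int               # Numeric, unit fixed by the field name
    waterproof_rating_mm: int   # Numeric
    fit: Fit                    # Categorical (enum)
    use_cases: List[str] = field(default_factory=list)  # Array/List
    certification: str = ""     # Structured text, full designation
    warranty_months: int = 0    # Duration stored as a number


jacket = JacketAttributes(
    waterproof=True,
    weight_g=490,
    waterproof_rating_mm=20000,
    fit=Fit("Regular"),  # Fit("Comfy") would raise ValueError
    use_cases=["hiking", "travel", "commuting"],
    certification="EN 343:2019 Class 3",
    warranty_months=24,
)


def matches(item):
    """A shopper's criteria expressed as exact, typed comparisons."""
    return (
        item.waterproof
        and item.weight_g < 500
        and "hiking" in item.use_cases
        and item.warranty_months >= 24
    )


print(matches(jacket))  # True
```

Every condition in matches is a comparison against a typed field, so none of it depends on interpreting description text.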
2. Unit Declaration
Numeric attributes without explicit unit declarations create interpretation problems that AI systems resolve inconsistently. A weight value of “490” is ambiguous: is it 490 g, 490 kg, or 490 lb? Context usually makes the intended unit obvious to a human, but to an AI system executing a WHERE weight < 500 query, unit ambiguity produces inconsistent results. The system may assume grams for some products and kilograms for others, or may reject the comparison entirely.
The technical requirement: every numeric attribute should be stored as a value + unit pair, or in a field with an explicit unit declaration. The format depends on the data system:
- In a structured database or PIM: separate fields for value (490) and unit (g), or a combined field with consistent formatting (“490g”).
- In schema.org markup: use unitText or unitCode in QuantitativeValue nodes — e.g., {"@type": "QuantitativeValue", "value": 490, "unitCode": "GRM"}.
- In Google product_details: include the unit in the attribute value string — “Waterproof Rating / 20,000mm HH” — so the unit is explicit in the structured pair.
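The value + unit pairing can be sketched as follows. This is a minimal example under stated assumptions: the Quantity class and the conversion table are hypothetical, and the unit codes (GRM, KGM, LBR) are assumed to follow the same UN/CEFACT-style convention as the GRM code in the schema.org example above.

```python
from dataclasses import dataclass

# Conversion factors to grams, keyed by assumed UN/CEFACT-style unit codes.
TO_GRAMS = {"GRM": 1.0, "KGM": 1000.0, "LBR": 453.59237}


@dataclass(frozen=True)
class Quantity:
    value: float
    unit_code: str

    def in_grams(self):
        """Convert to grams, refusing to guess when the unit is missing or unknown."""
        try:
            return self.value * TO_GRAMS[self.unit_code]
        except KeyError:
            raise ValueError(f"Unknown or missing unit code: {self.unit_code!r}")


def lighter_than(quantity, limit_g):
    """Compare on a common unit so 'weight < 500' means the same thing for every product."""
    return quantity.in_grams() < limit_g


print(lighter_than(Quantity(490, "GRM"), 500))  # True
print(lighter_than(Quantity(490, "KGM"), 500))  # False
```

Raising an error on a missing unit is a deliberate choice in this sketch: a rejected record can be fixed at the source, while a silently assumed unit produces wrong filter results.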
3. Canonical Value Normalization
Categorical attributes must use canonical values, a standardized set of accepted terms, to enable reliable filtering and entity matching. The canonical standard varies by attribute type:
| Attribute Type | Canonical Standard | AI System That Uses It | Normalization Action Required |
|---|---|---|---|
| Product color | Google’s canonical color taxonomy for Shopping feeds, or your own defined canonical list | Google Shopping algorithm; Google AI Overviews color filtering | Map all variant color expressions (navy, midnight, cobalt, dark blue) to one canonical value (Navy Blue). |
| Product category | Google Product Taxonomy (numeric IDs); Amazon browse node IDs | Google Shopping Graph; Amazon A10 algorithm; AI Overview query matching | Map to the most specific applicable numeric taxonomy node; verify quarterly. |
| Gender | Standardized values: Men’s, Women’s, Unisex, Boys’, Girls’. In Google Merchant Center feeds, gender accepts male, female, and unisex, with age handled by the separate age_group attribute | Google Shopping gender filtering; AI query matching for gender-specific searches | Normalize all variant expressions (Male, Mens, M, For Men) to canonical. |
| Size | Category-specific size standards (UK/EU sizing systems, numeric for dimensions) | Filtered search; agent size-specific queries | Populate size as a typed field with the canonical size scale for the category; avoid brand-specific size terminology. |
| Material | Percentage composition: “100% Recycled Polyester” not “recycled material” | Sustainability-query matching; material-filtering agents | Full material composition by percentage; no generic material claims. |
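A normalization step of this kind might look like the sketch below. The synonym tables are illustrative assumptions built from the examples in the table; the real canonical list is whatever your target channel or internal standard defines.

```python
# Hypothetical synonym tables; keys are lowercased with whitespace collapsed.
COLOR_SYNONYMS = {
    "navy": "Navy Blue",
    "navy blue": "Navy Blue",
    "midnight": "Navy Blue",
    "cobalt": "Navy Blue",
    "dark blue": "Navy Blue",
}

GENDER_SYNONYMS = {
    "male": "Men's",
    "mens": "Men's",
    "men's": "Men's",
    "m": "Men's",
    "for men": "Men's",
}


def normalize(raw, synonyms):
    """Map a raw value to its canonical form, or None if it is not in the table."""
    key = " ".join(raw.strip().lower().split())
    return synonyms.get(key)


print(normalize("  Dark  Blue ", COLOR_SYNONYMS))  # Navy Blue
print(normalize("For Men", GENDER_SYNONYMS))       # Men's
print(normalize("Ocean blue", COLOR_SYNONYMS))     # None
```

Returning None for unmapped values, rather than passing the raw string through, lets non-canonical entries such as “Ocean blue” be queued for review instead of fragmenting filter matching.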
4. Schema.org Implementation: The Complete Technical Spec
Schema.org Product markup is the universal machine-readable layer for product data. AI systems, particularly browser-embedded agents and Google’s crawlers, read this markup independently of your product database and Merchant Center feed. Here is the complete technical implementation for agentic readiness:
| Schema Property | Technical Format | AI System Use | Required? |
|---|---|---|---|
| name | Plain text string matching H1 and feed title | Primary product identification for all crawling agents | Required |
| brand | {"@type": "Brand", "name": "[Brand Name]"} | Brand entity matching in Knowledge Graph; brand-query eligibility | Required |
| gtin13 or gtin8 | String of 13 or 8 digits — exact GS1-registered GTIN | Entity matching trigger for Shopping Graph and Knowledge Graph | Required for branded products |
| offers | {"@type": "Offer", "price": "89.99", "priceCurrency": "GBP", "availability": "https://schema.org/InStock", "url": "[PDP URL]"} | Price and availability for agent purchase evaluation; must match feed exactly | Required |
| offers.shippingDetails | {"@type": "OfferShippingDetails", "shippingRate": {...}, "deliveryTime": {"@type": "ShippingDeliveryTime", "handlingTime": {...}, "transitTime": {"minValue": 1, "maxValue": 3, "unitCode": "DAY"}}} | Delivery time for agent shipping-speed evaluation | High priority |
| aggregateRating | {"@type": "AggregateRating", "ratingValue": "4.3", "reviewCount": "847"} | Review data for agent trust ranking; enables star ratings in search results | High priority |
| additionalProperty | {"@type": "PropertyValue", "name": "Waterproof Rating", "value": "20,000mm HH"} | Most powerful field for agentic discoverability — every attribute becomes machine-queryable | High priority for agentic readiness |
| hasMerchantReturnPolicy | {"@type": "MerchantReturnPolicy", "applicableCountry": "GB", "returnPolicyCategory": "https://schema.org/MerchantReturnFiniteReturnWindow", "merchantReturnDays": 60} | Return policy for agent purchase risk evaluation | Medium priority |
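One way to keep the required properties in step with live catalog data is to generate the markup server-side rather than from a static template. The sketch below assumes a hypothetical build_product_jsonld helper; the product name, brand, URL, and all-zero GTIN are placeholders, not real identifiers.

```python
import json


def build_product_jsonld(name, brand, gtin13, price, currency, in_stock, url):
    """Assemble a schema.org Product dict from live catalog values."""
    if not (gtin13.isdigit() and len(gtin13) == 13):
        raise ValueError("gtin13 must be a string of exactly 13 digits")
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "brand": {"@type": "Brand", "name": brand},
        "gtin13": gtin13,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
            "url": url,
        },
    }


# Placeholder values for illustration; the GTIN below is not a registered one.
doc = build_product_jsonld(
    name="Example Hiking Jacket",
    brand="Example Brand",
    gtin13="0000000000000",
    price="89.99",
    currency="GBP",
    in_stock=True,
    url="https://example.com/products/hiking-jacket",
)

# The serialized string would go inside a <script type="application/ld+json"> tag.
print(json.dumps(doc, indent=2))
```

Because price and availability are passed in as arguments, the same function can be called with whatever the database currently holds, which keeps the markup aligned with the feed.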
5. The additionalProperty Field: The Most Powerful Schema Field for AI
additionalProperty deserves its own section because it is both the most powerful and the most underused schema field for agentic product discoverability. It accepts an array of PropertyValue objects, one per product attribute, turning every structured product attribute into a machine-readable, query-matchable declaration on your product page.
Full implementation example for a hiking jacket:
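A minimal sketch of such a block, generated in Python and serialized as JSON-LD. The attribute names and values are illustrative assumptions drawn from the examples used elsewhere in this article, and to_additional_property is a hypothetical helper.

```python
import json

# (name, value, unit_code) triples; unit_code is None for non-numeric attributes.
JACKET_ATTRIBUTES = [
    ("Waterproof Rating", "20,000mm HH", None),
    ("Packable", "Yes", None),
    ("Weight", 490, "GRM"),
    ("Certification", "EN 343:2019 Class 3", None),
    ("Warranty (months)", 24, None),
]


def to_additional_property(attributes):
    """Turn attribute triples into a list of schema.org PropertyValue objects."""
    properties = []
    for name, value, unit_code in attributes:
        prop = {"@type": "PropertyValue", "name": name, "value": value}
        if unit_code is not None:
            prop["unitCode"] = unit_code  # explicit unit for numeric values
        properties.append(prop)
    return properties


product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Hiking Jacket",
    "additionalProperty": to_additional_property(JACKET_ATTRIBUTES),
}

print(json.dumps(product, indent=2))
```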
Each PropertyValue pair is independently readable by any AI system that crawls the page. An agent looking for packable products can read {"name": "Packable", "value": "Yes"} with certainty. It does not need to infer packability from description prose. This is the technical implementation of structured product data that agentic systems require.
Real-Time Data Architecture for Agentic Accuracy
Structural completeness is necessary but not sufficient. AI agents making purchase decisions need accurate, current data, specifically price, availability, and any time-sensitive attributes. The technical requirements for agentic data freshness:
| Data Type | Freshness Requirement | Technical Implementation |
|---|---|---|
| Price | Feed updated within 2 hours of any price change; schema.org price generated dynamically from live price field | Content API for real-time feed updates; server-side schema generation from database (not static template) |
| Availability | Feed and schema updated within 1 hour of stock depletion; preorder status reflected immediately | Inventory system directly triggers feed and schema update events; no batch delay for availability changes |
| Promotional pricing | sale_price and sale_price_effective_date populated in feed before promotion starts; removed immediately after | Promotion management system integrated with feed generation; time-based automation for sale_price management |
| Shipping time | shippingDetails schema reflects actual current dispatch capability, not aspirational lead times | Shipping time data sourced from live warehouse operations; dynamically updated when capacity or carrier changes |
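The freshness windows for price and availability can be monitored with a simple comparison between the source of truth and each published channel. The sketch below is one possible shape for that check; the data layout, the stale_fields function, and the sample timestamps are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Allowed lag between a change at the source and its appearance in a channel.
MAX_AGE = {
    "price": timedelta(hours=2),
    "availability": timedelta(hours=1),
}


def stale_fields(source, channel, now):
    """Return fields where the channel still shows an old value past its allowed window.

    source maps field name -> (live value, time the value last changed);
    channel maps field name -> value currently published in the feed or schema.
    """
    problems = []
    for field_name, max_age in MAX_AGE.items():
        live_value, changed_at = source[field_name]
        if channel[field_name] != live_value and now - changed_at > max_age:
            problems.append(field_name)
    return problems


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
source = {
    "price": ("79.99", now - timedelta(hours=3)),                 # changed 3 hours ago
    "availability": ("OutOfStock", now - timedelta(minutes=30)),  # changed 30 minutes ago
}
feed = {"price": "89.99", "availability": "InStock"}

print(stale_fields(source, feed, now))  # ['price']
```

In this example the feed's availability is out of date but still inside its one-hour window, so only the price is flagged. Running the same check against both the feed and the rendered schema.org markup would surface cases where the two channels disagree with each other.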
The technical standard has three layers
- Typed attributes: correct data type, canonical value, and explicit units where relevant.
- Machine-readable markup: product_details in feeds and additionalProperty in schema make attributes queryable.
- Live sync accuracy: price, availability, and delivery signals must stay aligned everywhere.
The AI-Ready Product Data Checklist — Technical Edition
- Typed data first: if a purchase criterion lives only in prose, the system has to guess instead of query.
- Markup makes it portable: feeds help platforms, while schema helps crawlers and browser agents read the same truth.
- Freshness protects trust: accurate, current price and stock signals are now part of the technical standard.
Velou’s Technical Implementation Approach
Commerce-1 generates AI-ready product data at every layer described in this article simultaneously: typed attribute fields from source data, unit-explicit numeric values, canonical value normalization, product_details pairs for Google feeds, and schema.org additionalProperty arrays for PDPs.
The output is not just richer product content. It is technically structured product data that meets the format requirements that AI systems use for deterministic evaluation. For retailers who want to implement the technical spec described here across a catalog of thousands of SKUs, Commerce-1 is the operational mechanism that makes it achievable.
