The State of Product Data in Ecommerce: What the Numbers Actually Show in 2025


Every year, product data gets talked about more and improved less. The gap between what ecommerce teams know about data quality and what they actually do about it has become one of the defining operational dysfunctions of the industry. In 2025, with AI agents actively evaluating product catalogs, that gap has started to have consequences that are both measurable and irreversible. This report synthesizes findings from catalog analyses, channel performance data, and commercial outcome research to give you an accurate picture of where the industry actually stands, and what separates the retailers who are pulling ahead from those who are falling behind.

54%: Average attribute completeness rate across mid-market ecommerce catalogs, meaning nearly half of all products are missing at least one purchase-criteria attribute.

£1.4T: Estimated annual global revenue impact of poor product data on retailers and brands, including lost sales, excess returns, and wasted ad spend.

1 in 5: Products in a typical Google Shopping catalog are actively disapproved, generating zero ad impressions despite live budget allocation.

Finding 1: Attribute Completeness Has Not Improved in Three Years

The most sobering finding in any catalog analysis is this: average attribute completeness rates in mid-market ecommerce catalogs have remained stubbornly in the 50–60% range for at least three consecutive years, despite growing awareness of the issue, improving tooling, and increasing channel pressure.

Why has the number not moved? Not for lack of effort. Most ecommerce teams are doing some form of enrichment work. The problem is structural: enrichment projects close the gap, but the gap reopens. New suppliers add products with 40% attribute coverage. Channel requirements update and existing data no longer meets the new standard. Seasonal ranges launch under time pressure with the intention of completing the data later. The backlog compounds faster than any manual enrichment process can address it.

The teams whose attribute completeness rates have genuinely improved, and sustained that improvement, share one characteristic: they have built a continuous enrichment system, not an enrichment project cycle. The difference between those two approaches is the difference between a catalog that gets better over time and one that oscillates between recently enriched and degraded again.

The Completeness Plateau

A catalog that completes a full enrichment project and then returns to manual catalog management typically degrades back to its previous completeness rate within 12–18 months. The rate of new product additions, supplier data changes, and channel requirement updates exceeds the rate at which manual enrichment can maintain quality. The plateau is not a data quality limit. It is a throughput limit.
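Breaking through the plateau starts with measuring completeness the same way for every product, continuously, rather than once per project. A minimal sketch of the metric, assuming hypothetical per-category attribute lists and product field names (not a real schema):

```python
# Sketch: computing attribute completeness for a catalog. Category names,
# attribute lists, and product fields below are illustrative assumptions.

REQUIRED_ATTRIBUTES = {
    "footwear": ["brand", "size", "colour", "material", "heel_height"],
    "electronics": ["brand", "model", "gtin", "voltage", "warranty"],
}

def completeness(product: dict) -> float:
    """Share of required attributes present and non-empty for this product."""
    required = REQUIRED_ATTRIBUTES.get(product.get("category", ""), [])
    if not required:
        return 1.0
    filled = sum(1 for attr in required if product.get(attr))
    return filled / len(required)

def catalog_completeness(products: list[dict]) -> float:
    """Average completeness across the catalog."""
    if not products:
        return 0.0
    return sum(completeness(p) for p in products) / len(products)

catalog = [
    {"category": "footwear", "brand": "Acme", "size": "42", "colour": "black",
     "material": "", "heel_height": None},          # 3 of 5 filled -> 0.6
    {"category": "footwear", "brand": "Acme", "size": "41", "colour": "tan",
     "material": "leather", "heel_height": "3cm"},  # 5 of 5 -> 1.0
]
print(f"{catalog_completeness(catalog):.0%}")  # 80%
```

Run on every ingest rather than once per enrichment project, a number like this is what turns the plateau from an invisible drift into a tracked metric.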

Finding 2: The Gap Between Hero Products and the Long Tail Is Widening

In virtually every catalog we analyze, there is a stark quality divide: the top 5–10% of SKUs by revenue have rich, complete, channel-optimized data. The remaining 90–95% have what the supplier provided, lightly edited at launch and rarely touched again.

This divide is widening for a structural reason: AI enrichment tools have made it faster and cheaper to enrich hero products, which already had the most manual attention. The marginal cost of adding another enrichment pass to a product that is already 80% complete is lower than the marginal cost of enriching a product that is 30% complete from scratch. So enrichment investment flows disproportionately to the products that need it least.

The commercial irony is significant. The long tail, the 90–95% of products that receive minimal enrichment attention, is where the highest-converting organic traffic lives. Long-tail search queries are more specific (higher intent), less competitive (easier to rank for), and more precisely matched to purchase-ready shoppers. A product whose data includes the five specific attributes a shopper has searched for will convert at 8–15% from that query. The same product with sparse data is invisible to that query.

The most valuable organic traffic is flowing to the products with the worst data. That is the long tail paradox.
| Catalog Segment | Typical Attribute Completeness | Typical Enrichment Attention | Conversion Potential |
|---|---|---|---|
| Top 5% by revenue (hero products) | 85–95% | High: regular manual reviews, copy team attention, frequent updates | Good but not always optimal; over-invested relative to incremental gain potential |
| Mid-tier 15% by revenue | 65–80% | Moderate: occasional reviews, covered in category enrichment projects | Significant improvement available for a relatively small data investment; meaningful revenue impact |
| Long-tail 80% by catalog volume | 35–55% | Low: supplier data at launch, rarely updated | Highest opportunity per enrichment dollar; this is where organic traffic most undershoots potential |

Finding 3: Google Merchant Center Errors Are a Chronic Condition, Not an Event

Pull the Merchant Center diagnostics for any mid-market Google Shopping account and you will find a pattern that repeats with near-perfect consistency: 10–20% of the catalog is in some state of error, the error count increases slightly during promotional periods, and the count has been in this range for months or years without systematic resolution.

The chronic nature of these errors reveals a systemic problem. Merchant Center errors are treated as a reactive support task (someone fixes them when they are noticed) rather than as a proactive data quality metric with ownership, targets, and a weekly review cadence. The consequence: a persistent 10–20% of ad budget is allocated to products that serve zero impressions, and that allocation persists indefinitely because nobody owns the number.
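Giving the number an owner starts with turning diagnostics into a tracked metric. A sketch of that summarization step; the dict shape loosely mirrors the Content API for Shopping `productstatuses` resource, but the helper function and sample payload are our own illustration:

```python
from collections import Counter

# Sketch: summarising product statuses into a weekly-review metric.
# The sample payload below is invented, not real Merchant Center data.

def disapproval_summary(statuses: list[dict]) -> tuple[float, Counter]:
    """Return (share of products disapproved, issue counts by description)."""
    disapproved = 0
    issues = Counter()
    for status in statuses:
        item_issues = [i for i in status.get("itemLevelIssues", [])
                       if i.get("servability") == "disapproved"]
        if item_issues:
            disapproved += 1
            for issue in item_issues:
                issues[issue.get("description", "unknown")] += 1
    share = disapproved / len(statuses) if statuses else 0.0
    return share, issues

sample = [
    {"productId": "sku-1", "itemLevelIssues": [
        {"servability": "disapproved", "description": "Price mismatch"}]},
    {"productId": "sku-2", "itemLevelIssues": []},
    {"productId": "sku-3", "itemLevelIssues": [
        {"servability": "disapproved", "description": "Missing GTIN"}]},
    {"productId": "sku-4", "itemLevelIssues": [
        {"servability": "unaffected", "description": "Low image quality"}]},
]
share, issues = disapproval_summary(sample)
print(f"{share:.0%} disapproved")  # 50% disapproved
```

Fed from a scheduled diagnostics pull, the share and the issue breakdown are exactly the two numbers a named owner would review weekly.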

The most common error categories and their root causes in 2025:

| Error Category | Share of Total Errors | Root Cause | Resolution Approach |
|---|---|---|---|
| Price / availability mismatch | ~35% | Feed update lag behind website changes; static schema markup that does not reflect live prices; promotional prices not reflected in the feed in time | Real-time feed sync via the Content API; dynamic schema generation; pre-promotion feed update testing |
| Missing required attribute | ~25% | New channel requirements not reflected in existing catalog data; new products launched without completing required fields | Quarterly channel requirement audit; quality gate at product launch; AI enrichment for bulk attribute completion |
| Policy violation | ~18% | Prohibited terms in titles or descriptions (promotional language, superlatives, competitor references); image non-compliance | Title quality audit against policy rules; image compliance review; automated policy violation scanning in feed tools |
| Invalid GTIN / missing identifier | ~14% | GTINs never sourced; private-label products submitted without a GTIN and without an identifier_exists=false declaration | GTIN sourcing audit; GS1 registration for private-label products; identifier_exists=false for genuinely unidentified products |
| Landing page quality | ~8% | Slow page load; mobile usability issues; content mismatch between feed and page; discontinued products still in the feed | Technical SEO audit; mobile optimization; feed management to remove discontinued products promptly |
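Of these categories, invalid GTINs are the most mechanically preventable: the GS1 mod-10 check digit can be validated in the feed pipeline before submission. A minimal sketch (the function name is ours; the weighting algorithm is the standard GS1 one):

```python
def valid_gtin(gtin: str) -> bool:
    """Validate a GTIN-8/12/13/14 via the GS1 mod-10 check digit."""
    if not gtin.isdigit() or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(d) for d in gtin]
    # From the right, the digit next to the check digit is weighted 3,
    # then weights alternate 1, 3, 1, ...
    total = sum(d * (3 if i % 2 else 1)
                for i, d in enumerate(reversed(digits[:-1]), start=1))
    return (10 - total % 10) % 10 == digits[-1]

print(valid_gtin("4006381333931"))  # True  (valid EAN-13)
print(valid_gtin("4006381333932"))  # False (check digit off by one)
```

A check like this catches typos and fabricated identifiers at feed-build time, before Merchant Center ever sees them.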

Finding 4: Amazon Listing Quality Is Concentrated at the Extremes

When we look at the distribution of Amazon listing quality scores across typical mid-market seller catalogs, the pattern is bimodal: a cluster of well-optimized products at the 75–90 range, and a cluster of neglected products at the 50–65 range, with the suppression threshold sitting at approximately 60–65. The middle range (65–75) is relatively underpopulated.

What this tells us: sellers have figured out how to do Amazon optimization well for the products they prioritize. The problem is the products they do not prioritize, many of which are sitting just above or just below the suppression threshold, generating minimal organic traffic and dragging down the overall organic contribution of the catalog.

The most commercially important observation: products between 60 and 70 on the listing quality scale represent the highest-ROI enrichment target in a typical Amazon catalog. Moving a product from 62 to 72 is often a matter of completing 3–5 missing recommended attributes and rewriting one thin bullet. The effort is minimal; the impact, a shift from near-suppressed to above-average visibility, is substantial.

The 60–70 Zone: The Highest-ROI Amazon Enrichment Target

For most Amazon sellers, the highest-ROI enrichment investment is not further optimizing already-excellent listings. It is systematically moving the 60–70 zone products above 75, where they shift from below-average to above-average category ranking and begin generating meaningful organic traffic. A product at 72 outperforms a product at 65 by far more than a product at 92 outperforms one at 85.
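Operationally, that prioritization can be expressed as a simple queue. A minimal sketch; the field names, scores, and revenue figures are illustrative, not drawn from the research data:

```python
# Sketch: surfacing the 60-70 listing-quality zone as an enrichment queue,
# sorted so the highest-revenue near-suppressed products come first.

def enrichment_queue(products: list[dict],
                     low: int = 60, high: int = 70) -> list[dict]:
    """Products in the target quality band, highest revenue first."""
    zone = [p for p in products if low <= p["quality_score"] < high]
    return sorted(zone, key=lambda p: p["monthly_revenue"], reverse=True)

catalog = [
    {"sku": "A", "quality_score": 88, "monthly_revenue": 12000},
    {"sku": "B", "quality_score": 63, "monthly_revenue": 4200},
    {"sku": "C", "quality_score": 66, "monthly_revenue": 9100},
    {"sku": "D", "quality_score": 55, "monthly_revenue": 700},
]
print([p["sku"] for p in enrichment_queue(catalog)])  # ['C', 'B']
```

Note that the already-excellent product (A) and the deeply neglected one (D) are both excluded: the queue deliberately contains only the products where a small fix crosses the visibility threshold.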

Finding 5: Schema Markup Quality Has Not Kept Pace With AI Discovery Growth

The expansion of AI-powered product discovery surfaces (Google AI Overviews, Gemini Shopping, browser-embedded agents) has outpaced the improvement of schema markup quality on most ecommerce sites. In our analysis of DTC product pages across mid-market retailers, the typical schema implementation covers Product, Offer, and Price, but is missing aggregateRating (present on fewer than 45% of product pages), additionalProperty pairs (fewer than 15%), and shippingDetails (fewer than 25%).

The consequence: DTC product pages are systematically under-represented in AI discovery surfaces relative to their performance on traditional keyword search. A brand that ranks on page one for its core category keywords in traditional search may be nearly invisible in the AI Overviews and Gemini Shopping results that are increasingly commanding the top positions on the same results page.

| Schema Property | % of Mid-Market DTC Sites With Property Implemented | Commercial Consequence of Absence | Implementation Complexity |
|---|---|---|---|
| Product (basic) | ~85% | Without basic Product schema, no rich results eligibility at all | Low: single JSON-LD block required |
| Offer (price + availability) | ~78% | Without Offer, no price or availability in rich results; Merchant Center feed-crawl conflicts likely | Low: add to existing Product schema |
| aggregateRating | ~42% | No star ratings in organic search results; typically 15–30% lower CTR at the same ranking position | Low: dynamically generated from review data |
| shippingDetails | ~24% | No delivery time in agent evaluation; excluded from delivery-speed-specific queries | Medium: requires a structured delivery time data source |
| additionalProperty | ~14% | Products invisible to AI agent attribute queries on DTC pages; a critical gap for agentic commerce readiness | Medium: requires enriched attribute data to populate |
| hasMerchantReturnPolicy | ~18% | No return policy data for agent purchase-risk evaluation; lower confidence score in agent recommendations | Low: static policy data; set once and maintain |
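The most commonly missing properties are straightforward to emit once the underlying attribute data exists. A sketch of server-side JSON-LD generation; the schema.org types and property names are real, while the helper function and product data are illustrative:

```python
import json

# Sketch: generating Product JSON-LD including the frequently absent
# aggregateRating and additionalProperty blocks. The input dict shape
# is an assumption, not a real catalog schema.

def product_jsonld(p: dict) -> str:
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "offers": {
            "@type": "Offer",
            "price": p["price"],
            "priceCurrency": p["currency"],
            "availability": "https://schema.org/" + p["availability"],
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": p["rating"],
            "reviewCount": p["review_count"],
        },
        # One PropertyValue per enriched attribute: this is the block
        # that makes attributes queryable by AI agents on DTC pages.
        "additionalProperty": [
            {"@type": "PropertyValue", "name": k, "value": v}
            for k, v in p["attributes"].items()
        ],
    }
    return json.dumps(data, indent=2)

sample = {
    "name": "Trail Runner 3", "price": "89.00", "currency": "GBP",
    "availability": "InStock", "rating": "4.6", "review_count": 214,
    "attributes": {"drop": "6mm", "weight": "240g", "waterproof": "yes"},
}
print(product_jsonld(sample))
```

Because the block is generated from the same enriched attribute data used elsewhere, additionalProperty coverage tracks catalog completeness automatically rather than needing separate maintenance.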

Finding 6: The First-Mover Window for Agentic Commerce Is Open, and Closing

Agentic commerce content and agentic data readiness represent one of the last genuinely low-competition optimization opportunities in ecommerce. As of Q1 2025, searches for “agentic commerce” return virtually no competitive content, AI Overview product recommendations favor structured, attribute-complete listings, and the competitive bar for inclusion is still forming.

Within 18–24 months, this window will close. Every major ecommerce platform, every large retailer, and every ecommerce agency will have agentic commerce optimization as a standard service. The data infrastructure required for agentic commerce is the same infrastructure that improves performance on every existing channel today, making the investment dual-purpose and the cost of delay compounding.

The retailers building agentic data readiness in 2025 are not building for a future state. They are building infrastructure that improves current performance while simultaneously establishing a competitive position that will be difficult for later movers to close. That is the definition of high-leverage strategic investment.

The 2025 Product Data Imperative

The state of product data in ecommerce is simultaneously worse than it should be and more commercially actionable than it has ever been. Average completeness rates remain stuck in the 50–60% range, Merchant Center errors are chronic, and schema implementations lag AI discovery requirements by years. At the same time, AI enrichment platforms have removed the throughput constraint that made catalog-scale improvement economically unviable. The retailers who recognize this and act on it in 2025 will look back at this moment as the period when the catalog quality gap became a lasting competitive divide.

What the Best-Performing Catalogs Have in Common

In every channel and every category, the catalogs that consistently outperform on product data quality share four operational characteristics, none of which are technology purchases:

1. Named ownership for data quality metrics

Attribute completeness rate, feed approval rate, listing quality distribution: these are tracked metrics with a named owner. They are not the responsibility of whoever notices, but of a specific role or team with a target and a reporting cadence.

2. A continuous system, not a project cycle

The highest-performing catalogs have moved from periodic enrichment projects to continuous enrichment operations. New products are enriched before launch. Existing products are monitored for decay. Channel requirement changes trigger targeted re-enrichment. The system runs continuously rather than waiting for the annual data cleanup project.

3. Quality gates at every launch point

No product goes live on any channel without meeting minimum data quality standards for its category. The gate is enforced systematically, not manually reviewed for each product, and the standards are documented, channel-specific, and reviewed when channel requirements change.
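A gate of this kind can be as simple as a pre-launch check that blocks publication and reports why. A minimal sketch, with illustrative thresholds and field names:

```python
# Sketch: a launch quality gate. The threshold, required fields, and the
# idea of passing in a precomputed completeness score are all assumptions.

GATE = {
    "min_completeness": 0.8,
    "required_everywhere": ["title", "brand", "image_url"],
}

def passes_gate(product: dict, completeness: float) -> tuple[bool, list[str]]:
    """Return (ok, reasons) for a pre-launch check."""
    reasons = []
    missing = [f for f in GATE["required_everywhere"] if not product.get(f)]
    if missing:
        reasons.append(f"missing required fields: {missing}")
    if completeness < GATE["min_completeness"]:
        reasons.append(f"completeness {completeness:.0%} below "
                       f"{GATE['min_completeness']:.0%} gate")
    return (not reasons, reasons)

ok, reasons = passes_gate({"title": "Trail Runner 3", "brand": ""}, 0.55)
print(ok, reasons)
```

The returned reasons matter as much as the pass/fail flag: they are what routes a blocked product into the enrichment queue with a concrete fix list rather than a generic rejection.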

4. Enrichment prioritized by commercial impact

Enrichment work is sequenced by the revenue impact of the gap being closed, not by the ease of the fix or the preferences of whoever is doing it. The filter attribute with the highest usage rate is enriched before the attribute with low filter traffic. The Amazon product at 63/100 gets a priority fix before the product at 78/100 gets further polish.

In short:

- Metrics are owned, reviewed, and attached to targets.
- Enrichment happens continuously, not as a cleanup burst.
- Launch quality gates prevent low-quality data from entering the system.
- Work is prioritized by commercial impact instead of convenience.

Velou’s Research Methodology

The findings in this report are synthesized from catalog intelligence analyses conducted across Velou’s client base, anonymised channel performance data from Google Merchant Center and Amazon Seller Central integrations, and published industry research from GS1, eMarketer, and Baymard Institute. Where specific percentages are cited, they represent observed averages across a representative sample of mid-market ecommerce catalogs with annual online revenue between £5M and £500M. Individual catalog performance varies significantly, which is why the first step in any data quality improvement program is a catalog-specific audit rather than benchmark comparison.

Benchmark your catalog against the findings in this report

Commerce-1’s catalog intelligence function generates your catalog health score across all six dimensions covered here.
