Agentic commerce – where a software agent with a goal and a budget completes the transaction for a human who never loaded your page – is quietly dismantling the web stack, the data layer, attribution, KPIs, and traffic detection all at once. Across the last five Knowledge Distillation episodes, Sani Manić made the case for unglamorous fundamentals, Josh Silverbauer diagnosed what instant checkout does to measurement, Simo Ahava called time on tag-manager-as-duct-tape, Tim Wilson threw cold water on the urge to re-instrument, and John Lovett mapped agentic traffic detection and GEO.
Full series: www.ask-y.ai/knowledge-distillation-podcast
Is the web ready?
Sani (ep. 8), unsurprisingly, says no. Semantic HTML, accessibility, the unglamorous foundations – back on the critical path.
“Make boring sexy again.”
Plot twist: the GA4 cohort that survived the BigQuery migration is, accidentally, the best-prepared for what’s next.
How do you measure a purchase with no session?
Josh (ep. 9) on instant checkout – no pixel, no journey, just an order appearing in the back end. Agentic measurement is construction, not migration.
“We are now relying on machines to analyze machines.”
What about the data layer?
Simo (ep. 14) is done with tag-manager-as-duct-tape. Agents need structure by design; GTM won’t patch this for you.
“We can’t get by with that complacency anymore.”
A warning delivered in the same breath: the internal resource fight between agent-optimized and human-optimized UX is about to get ugly.
Should we even measure the same things?
Tim (ep. 10) with his signature response to everyone currently scoping new instrumentation projects:
“Every company out there has enough data.”
His practical suggestion before you re-instrument a thing: write an agent and see if it can buy on your own site. Hypothesis first.
Human or agent?
John (ep. 15) on the rebuild-for-bots question. Seer’s position is uncomplicated:
“Build for the humans – the robots will figure it out.”
Field note from his team: humans do clicky-clicky scrolly-scrolly; agents execute surgical strikes. Log files are back in rotation. And the 5,000-prompt Olympics GEO study yielded exactly one firm conclusion — anyone claiming to have GEO figured out is lying.
What this means for experimentation
There’s a specific aftertaste worth naming: agentic commerce is quietly repricing the experimentation discipline. The funnel we’ve spent a decade optimizing – hero image, value prop, urgency cue, social proof, CTA copy – is invisible to an agent that’s parsing our structured data and ignoring everything else. When the buyer is a model, conversion rate optimization becomes token-space optimization. That’s not a rebrand; it’s a different craft, with different primitives.
A few practical consequences, pulled across the five episodes:
Your test cells are about to get noisy.
If agentic traffic grows the way John’s early numbers suggest – single-digit percentages, but accelerating – your experimentation platform is unknowingly mixing humans and agents into the same variant cohort. Agents behave deterministically: same model, same page, same action, every time. That flattens effect sizes, suppresses variance, and makes your lifts look smaller than they are. Until your platform can segment agent traffic out, every result you ship is slightly diluted. Filtering by user-agent and behavioral fingerprint (the clicky-clicky scrolly-scrolly test) is probably the first thing to retrofit.
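A minimal sketch of that retrofit, combining a user-agent check with a behavioral fingerprint. The UA substrings and thresholds here are illustrative assumptions, not a vetted bot list — calibrate against your own traffic before trusting the split.

```python
from dataclasses import dataclass

@dataclass
class Session:
    user_agent: str
    page_views: int
    scroll_events: int
    seconds_to_purchase: float

# Hypothetical UA hints for known shopping/crawling agents.
AGENT_UA_HINTS = ("gptbot", "claudebot", "perplexitybot", "oai-searchbot")

def looks_like_agent(s: Session) -> bool:
    """Flag a session as probable agent traffic before variant assignment."""
    if any(hint in s.user_agent.lower() for hint in AGENT_UA_HINTS):
        return True
    # "Surgical strike" fingerprint: no browsing noise, straight to checkout.
    return s.page_views <= 2 and s.scroll_events == 0 and s.seconds_to_purchase < 30

# Clicky-clicky scrolly-scrolly vs. a surgical strike:
human = Session("Mozilla/5.0 (Macintosh)", page_views=14, scroll_events=92, seconds_to_purchase=610)
agent = Session("Mozilla/5.0 (compatible; GPTBot/1.1)", page_views=1, scroll_events=0, seconds_to_purchase=8)
print(looks_like_agent(human), looks_like_agent(agent))  # False True
```

Run the classifier at assignment time, not just in post-analysis, so agents never enter a variant cohort in the first place.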
Log files are the new session recordings.
Sani, Josh, and John arrived, independently, at the same practical advice: raw server logs are where the signal lives. Expect a new best-practice layer – something between a bot-filter and a segmentation tool – to settle between log files and GA4. A handful of vendors are already shipping early versions; most analytics stacks will grow one organically over the next eighteen months.
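Until a vendor layer settles in, a homegrown version can be a few lines over raw access logs. This sketch assumes the common Apache/Nginx "combined" log format; the agent UA list is again an illustrative assumption.

```python
import re
from collections import Counter

# Matches the Apache/Nginx "combined" access log format.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)
AGENT_HINTS = ("gptbot", "claudebot", "perplexitybot")

def segment(lines):
    """Bucket raw log hits into agent vs. human by user-agent string."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip malformed lines rather than guessing
        bucket = "agent" if any(h in m["ua"].lower() for h in AGENT_HINTS) else "human"
        counts[bucket] += 1
    return counts

sample = [
    '203.0.113.7 - - [01/Mar/2025:12:00:01 +0000] "GET /product/42 HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.1)"',
    '198.51.100.9 - - [01/Mar/2025:12:00:02 +0000] "GET /product/42 HTTP/1.1" 200 512 "https://example.com" "Mozilla/5.0 (Macintosh)"',
]
print(segment(sample))  # Counter({'agent': 1, 'human': 1})
```

UA matching alone is the weakest possible filter (many agents present browser-like UAs), which is exactly why the behavioral fingerprint belongs in the stack too.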
Testing moves upstream.
Tim’s write-an-agent audit isn’t a one-time thing – it’s a recurring test. Same structured data, different models. Does Claude find your SKU? Does Perplexity? Does ChatGPT’s shopping agent actually complete the transaction, or does it get stuck on a non-semantic form field you last touched in 2019? Testable today, one developer, an afternoon. Bake it into your release checklist the way you bake in Lighthouse scores.
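A cheap first step before a full write-an-agent audit: check whether the raw HTML, with no JavaScript execution, exposes a schema.org Product with a parseable price. This is a hypothetical pre-release check, not Tim's actual script; it also simplifies by assuming `offers` is a single object rather than a list.

```python
import json
import re

def audit_product_html(raw_html: str) -> list[str]:
    """Return a list of agent-readability problems found in raw (un-rendered) HTML."""
    problems = []
    blocks = re.findall(
        r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
        raw_html, re.DOTALL | re.IGNORECASE,
    )
    products = []
    for block in blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            problems.append("malformed JSON-LD block")
            continue
        items = data if isinstance(data, list) else [data]
        products += [i for i in items if i.get("@type") == "Product"]
    if not products:
        problems.append("no schema.org Product in raw HTML")
    elif not any(p.get("offers", {}).get("price") for p in products):
        problems.append("Product found, but no offers.price in raw HTML")
    return problems

good = '<script type="application/ld+json">{"@type": "Product", "offers": {"price": "29.99"}}</script>'
print(audit_product_html(good))                   # []
print(audit_product_html("<div id=app></div>"))   # price lives in a JS render
```

Wire it to a `curl` of each template's production URL in CI and fail the build on a non-empty list — the same way a Lighthouse budget fails it.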
The hypothesis backlog changes character.
Fewer “we believe a red CTA will lift click-through by 3%”; more “we believe our product pages are invisible to Claude’s shopping agent because we’ve buried the price in a JS render.” The unit of optimization keeps moving up the stack – from pixel, to component, to schema, to context window. The old skills still work; the canvas is just larger.
Ultimately, it’s the same game: form a hypothesis, test it, ship what works. The variables just got more interesting.