Operations

How Retail Brands Now Run on AI

The operating system behind the world's smartest consumer companies (Walmart, Amazon, Unilever, Zalando), and how one small Brazilian fashion label runs the same playbook with a team of almost no one.

A modern retail operations desk at dusk: product samples, a laptop showing a clean dashboard, fabric swatches, and a coffee. Calm, editorial, premium.
The retailers winning in 2026 did not hire their way through the work. They rebuilt the work.

In March 2025, Walmart disclosed a number that should have changed the conversation about AI in retail and mostly did not. The company said it had used generative AI to create or improve more than 850 million pieces of catalog data — product titles, descriptions, attributes, images — and that doing the same work with people would have required roughly one hundred times the headcount it actually used.

Read that again. Not ten percent more efficient. Not twice as fast. A hundred times. A task that was previously impossible at any realistic budget became a Tuesday.

That is the shape of what has happened across retail and consumer brands over the last eighteen months. The headline-grabbing stories are about robots and cashier-less stores. The real story is quieter and far more consequential: the daily operational work of running a consumer brand — writing the catalog, answering the customer, producing the campaign, forecasting the inventory, translating the store — is being rebuilt around AI, and the brands that rebuilt first are pulling away.

The short version: The world's largest retailers and a growing number of one-person brands now run the same five operational layers — catalog and content, discovery and search, marketing and creative, customer service, and personalization — on AI. The difference between them is no longer budget; it is operating discipline. This briefing walks through each layer with real, sourced examples from Walmart, Amazon, Unilever, Zalando, L'Oréal, Shopify, and others, then shows how a small Brazilian fashion label called EZILDINHA runs the entire playbook with a team of almost no one — and what a serious operator should actually do about it in the next ninety days.

This is not a piece about whether AI matters in retail. That question is settled. It is a piece about the operating model — what gets done by the machine, what stays with the human, and how the line between them is drawn by the brands getting it right.

How big this actually is

Before the examples, the scale — because the numbers reframe everything that follows.

McKinsey’s 2025 State of AI survey found that 78% of organizations now use AI in at least one business function, up from 72% the year before, with 71% regularly using generative AI specifically. In the same body of research, McKinsey estimates the annual economic potential of generative AI in retail and consumer packaged goods alone at between $400 billion and $660 billion — the largest of any sector pairing it modeled.

The consumer side has moved just as fast. Amazon’s AI shopping assistant, Rufus, was on pace for roughly $10 billion in incremental annualized sales as of the company’s third-quarter 2025 earnings, a figure the company revised toward $12 billion for the full year. Amazon also reported 250 million Rufus users, and said shoppers who used the assistant were 60% more likely to complete a purchase. On the merchant side, Shopify reported that weekly active shops using its Sidekick assistant grew 385% year over year, and that roughly 42% of its merchants now use AI features in their stores.

And the work itself is changing hands. In Shopify’s marketing research, 60% of marketers now use AI tools daily — up from 37% a year earlier — 89% already use generative AI for content, and 83% report a direct productivity increase. These are not pilots. This is how the work is done now.

Numbers at that scale can numb rather than inform. So set them aside and look at the operation the way an operator does — layer by layer, by the work that has to get done before the store opens.


How retail got here: three waves

It helps to see the present moment as the third of three waves, because the brands that are confused about AI usually have the waves muddled together.

The first wave was the recommendation era — roughly 2010 to 2020. This was machine learning working quietly in the background: Amazon’s “customers also bought,” Netflix’s suggestions, the personalization engines that have long been estimated to drive on the order of a third of Amazon’s revenue. It was powerful but invisible and narrow. It optimized within a fixed catalog and a fixed storefront; it did not produce anything.

The second wave was the chatbot era — the rule-based and early-NLP customer-service bots of roughly 2016 to 2022. This is the wave that gave AI a bad name in retail: the rigid phone tree, the bot that could not understand a return. It promised to cut service costs and mostly cut customer satisfaction. Many operators who say “we tried AI and it didn’t work” are remembering this wave.

The third wave — the one we are in — is generative and increasingly agentic. It does not just rank or route; it produces (catalog, copy, creative, code) and increasingly acts (drafts the campaign, audits the flows, completes the task). This is the wave that made Walmart’s 850 million data points and Unilever’s 400-assets-per-product possible, and it is qualitatively different from the first two because it touches every operational layer at once, not a single function.

The confusion to avoid: judging the third wave by the second. The customer-service AI of 2026 is not the chatbot of 2018, and the operator who dismisses it on the strength of an old bad experience is making a category error. The right mental model is not “a better bot.” It is “a capable, fast, tireless drafter that needs supervision” — a junior team you brief and review, working across the whole operation.

The five layers where AI now lives in a retail operation

Every consumer brand, from a global conglomerate to a single founder on Shopify, runs the same five operational layers. What follows is each layer, what AI is actually doing inside it at real companies, and the judgment line that separates the brands doing it well from the ones generating expensive noise.

Layer 1 — Catalog & product content: the work that was never humanly possible

The catalog is the least glamorous and most quietly decisive part of a retail operation. Every product needs a title, a description, attributes, size and fit data, search keywords, and increasingly a localized version for every market. At any real assortment size, this is a wall of work that no team can fully scale — which is why most catalogs are, honestly, half-finished.

This is the layer where AI’s impact is least ambiguous. Walmart’s 850-million-data-points figure is the clearest example in retail of AI doing work that simply could not have been done with people — the company was explicit that the alternative was not a bigger team but no catalog enrichment at all. The same announcement noted fashion production timelines cut by up to eighteen weeks and a separate gen-AI assistant built for Walmart’s own merchants to speed sourcing and product-page work.

The pattern repeats at every size below Walmart. The fashion retailer Stitch Fix described, in its own engineering blog, an “expert-in-the-loop” system that could generate 10,000 product descriptions in about thirty minutes, each reviewed by a human in under a minute, and reported that in blind evaluation the AI-generated descriptions scored higher than human-written ones. Zalando, the European fashion platform, said AI-generated content scaled from essentially zero to roughly 70% of its editorial imagery in a single year, cutting campaign creation from six-to-eight weeks to three-to-four days and campaign costs by a reported 90% or so.

And at the small end, the same capability now ships free inside the tools a one-person brand already uses. Shopify Magic generates product descriptions, email and blog copy, and edits images directly in the admin; Shopify estimates its merchants using these features save on the order of 346 hours and roughly $8,700 a year. A French startup called Reversia, built natively on Shopify and running on Claude, translates an entire store — product copy, collections, metafields, SEO title tags, meta descriptions, hreflang and canonical tags — at 99% accuracy validated by native-speaker translators, turning a two-to-three-week localization project into minutes at roughly 70–80% lower cost than per-word translation.

The judgment line. Catalog work is the safest place to hand the machine real volume, because the failure mode is visible and cheap to correct — a wrong attribute, an awkward sentence. The brands doing this well still keep a human review gate, but they make it fast and structural: spot-check a sample, enforce a brand-voice rule set, and let the machine carry the volume. The mistake is treating catalog automation as “set and forget.” The discipline is treating it as “draft at scale, review by exception.”
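
The “draft at scale, review by exception” gate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any retailer’s actual system: the banned phrases, the word limit, the 5% sample rate, and the names Draft, rule_violations, and triage are all hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical brand-voice rules; a real rule set would come from the brand team.
BANNED_PHRASES = ("game-changer", "must-have", "best ever")
MAX_WORDS = 120


@dataclass
class Draft:
    sku: str
    text: str


def rule_violations(draft: Draft) -> list[str]:
    """Return the brand-voice rules this draft breaks (empty list if none)."""
    problems = []
    lowered = draft.text.lower()
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            problems.append(f"banned phrase: {phrase}")
    if len(draft.text.split()) > MAX_WORDS:
        problems.append("too long")
    if not draft.text.strip():
        problems.append("empty")
    return problems


def triage(drafts, sample_rate=0.05, seed=0):
    """Split drafts into (needs_review, auto_publish).

    A draft goes to a human if it breaks a rule, or if it is picked
    by the random spot-check. Everything else is published as drafted.
    """
    rng = random.Random(seed)  # seeded so the spot-check is reproducible
    needs_review, auto_publish = [], []
    for draft in drafts:
        if rule_violations(draft) or rng.random() < sample_rate:
            needs_review.append(draft)
        else:
            auto_publish.append(draft)
    return needs_review, auto_publish
```

The design choice worth noticing is that the human sees two queues at most: the rule breakers and a small random sample. Raising sample_rate is how an operator tightens the gate while trust in the drafts is still being earned.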

Layer 2 — Discovery & the new shopping assistant: where the storefront is being rebuilt

For twenty years, retail discovery meant a search box and a recommendation carousel. In 2025 and 2026 it became a conversation. The shift is the single most expensive and most watched change in the industry, because it sits directly on top of revenue.

Amazon’s Rufus is the headline. A conversational assistant built on Amazon’s own large language model, it lets a shopper ask in plain language — “what do I need for a first camping trip with kids?” — and reshapes how products get surfaced. The reported numbers, again, are not marginal: on track for $10–12 billion in incremental annualized sales, 250 million users, monthly actives up 140% year over year, and Black Friday conversion roughly doubled among Rufus users.

Beneath Amazon, the same layer is being built by the platform millions of smaller retailers stand on. Shopify’s Sidekick — the merchant-facing assistant that answers “which three products had the highest margin last month?” in natural language and takes action in the store — runs on Claude (Sonnet 4.5). Shopify has been public that it upgraded its production model in under twenty-four hours and chose Claude for latency, resilience, and reasoning quality. In luxury, LVMH’s internal assistant “MaIA” now handles more than two million requests a month from 40,000 employees, from clienteling to semantic product discovery. In beauty retail, Sephora’s AI skin-diagnostic and assistant tools have been tied to a reported 35% higher conversion among users and a 25% reduction in skincare returns.

The judgment line. Discovery is where AI touches the customer directly, so the failure mode is no longer a typo — it is a wrong recommendation, a confident hallucination, a tone that does not sound like the brand. The brands winning here treat the assistant as a member of staff that must be trained and supervised, not a feature that ships once. They constrain it to real inventory and real policy, they monitor what it says, and they keep a human escalation path. The line is the same as in a good store: let the assistant greet, guide, and answer — but make sure it never invents an answer it does not have.

Layer 3 — Marketing & creative production: the assembly line goes synthetic

If the catalog is where AI proved it could do impossible volume, marketing is where it proved it could do impossible variety. The cost of producing a brand asset — a product photo, a localized ad, a fifteen-second video — has collapsed, and the largest consumer companies in the world have rebuilt their content supply chains around that collapse.

Unilever’s “Beauty AI Studio” is the clearest template. Across brands like Dove, TRESemmé, and Vaseline, Unilever reports producing roughly 400 assets per product where it used to produce about 20 per campaign, with production around 30% faster and both video-completion and click-through rates roughly doubling. Its Dove “Change the Compliment” work reached 700 million impressions with 94% positive sentiment in under thirty days — while the brand publicly pledged never to use AI to depict real women, a governance line that matters.

The pattern is industry-wide. Mondelez — Oreo, Cadbury, Milka — invested more than $40 million in an in-house generative tool built with Accenture, targeting a 30–50% reduction in marketing-content costs. Zalando cut content production from six-to-eight weeks to three-to-four days and reports campaign costs down roughly 90%. L’Oréal’s “CreAItech” lab takes campaign turnaround from weeks to hours and localizes creative across twenty EMEA markets. Even physical packaging has gone generative: Ferrero’s “Mio Nutella” campaign produced seven million uniquely AI-designed jar labels, and the run sold out.

None of this is free of risk, and the honest operator should say so plainly. Coca-Cola compressed a holiday campaign from roughly a year to a month using 70,000 AI-generated clips — and was widely criticized for an ad that looked, to many, like “digital slop,” with visible continuity errors. Mango and Levi’s drew “false advertising” and “artificial diversity” criticism for AI-generated models. Speed and cost are real. So is the brand-equity damage when the work is shipped without judgment.

The judgment line. Marketing is where the “draft at scale, review by exception” discipline matters most, because the output is the brand itself. The companies winning treat AI as the assembly line and the brand team as quality control with veto power — not the other way around. The mistake is letting volume become the goal. The discipline is keeping a human who can say “this is faster, cheaper, and not good enough to ship.”

Layer 4 — Customer service & post-purchase: the channel everyone wanted to automate

Customer service was the first place retailers tried to use AI and, for a decade, the place they did it worst — the rigid phone tree, the bot that could not understand a return. Large language models changed the economics, and the better operators have used them to raise the floor on service rather than just cut its cost.

The clearest example runs on Claude. Tidio, a customer-experience platform serving online retailers, built its “Lyro” AI agent on Claude and reports automating 71% of its own support, resolving more than two million conversations, with merchants able to handle up to 90% of inquiries automatically. One merchant, Belasante, generated more than $60,000 in additional revenue from a Claude-powered recommendation layer. Notably, Tidio did not cut its support workforce — it redeployed it. At enterprise scale, Best Buy uses a generative assistant for customer troubleshooting and agent coaching; Target rolled a “Store Companion” assistant to associates across roughly 2,000 stores.

The judgment line. Service is the layer where a confident wrong answer does the most relationship damage. The brands doing it well constrain the assistant to real policy and real order data, give it a clean handoff to a human the moment it is uncertain, and treat the savings as capacity to reinvest in the hard cases — not as headcount to eliminate. The line: automate the predictable, escalate the human, and never let the bot guess at a policy it does not actually know.
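
One way to picture “automate the predictable, escalate the human” is a guard that answers only from a known policy table and hands off otherwise. The sketch below is an assumption-laden illustration: the policy texts, the 0.8 threshold, and the idea that an upstream classifier supplies a topic and a confidence score are all hypothetical, and none of it describes how Tidio or any vendor actually works.

```python
from dataclasses import dataclass

# Hypothetical policy table; in practice this would be loaded from the
# store's real, current policy source rather than hard-coded.
POLICIES = {
    "returns": "Returns are accepted within 30 days with proof of purchase.",
    "shipping": "Standard shipping takes 3 to 7 business days.",
}

HANDOFF_MESSAGE = "Let me connect you with a member of our team."


@dataclass
class Reply:
    text: str
    escalated: bool


def answer(topic: str, confidence: float, threshold: float = 0.8) -> Reply:
    """Answer from the policy table, or escalate.

    `topic` and `confidence` are assumed to come from an upstream intent
    classifier. The assistant never composes a policy answer of its own:
    an unknown topic or a low-confidence match goes to a human.
    """
    policy = POLICIES.get(topic)
    if policy is None or confidence < threshold:
        return Reply(HANDOFF_MESSAGE, escalated=True)
    return Reply(policy, escalated=False)
```

Counting how often escalated is true gives the escalation rate for the channel, which is a useful signal of whether the policy table or the threshold needs work.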

Layer 5 — Personalization, fit & returns: the quiet margin layer

The last layer is the one customers never see and the CFO loves most: matching the right product to the right person, and stopping the wrong product from coming back. Amazon’s recommendation engine has long been estimated to drive on the order of a third of company revenue. Zalando’s size-and-fit tooling is reported to avoid size-related returns by more than 8% and lift items-added-to-bag by 13%. Sephora’s diagnostic tools have been tied to a 25% reduction in skincare returns. In a category where returns can erase the entire margin on a sale, that is not a marketing nicety — it is the difference between a profitable order and a loss.

The judgment line. Personalization is where AI is most trusted to act with the least human review — and that is precisely the risk. A recommendation system optimizing the wrong metric will cheerfully push the most-returned product or the lowest-margin bundle. The discipline is owning the objective: deciding what “good” means — margin, lifetime value, return rate — and auditing what the system actually optimizes against it.
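
“Owning the objective” becomes concrete when two rankings of the same products are compared: one by an engagement proxy, one by expected margin after returns. The sketch below is illustrative only. The formula, the flat return-handling cost, and the two sample products are assumptions made up for the example, not figures from any retailer named here.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    price: float
    margin_rate: float  # gross margin as a fraction of price
    return_rate: float  # fraction of orders that come back
    click_rate: float   # the engagement proxy a recommender might chase


def expected_margin(p: Product, return_cost: float = 8.0) -> float:
    """Expected margin per order: margin on kept orders minus handling cost of returns."""
    kept = 1.0 - p.return_rate
    return p.price * p.margin_rate * kept - return_cost * p.return_rate


def rank(products, key):
    """Return SKUs ordered from best to worst under the given objective."""
    return [p.sku for p in sorted(products, key=key, reverse=True)]


# Hypothetical products: A gets more clicks but is returned far more often.
catalog = [
    Product("A", price=100.0, margin_rate=0.5, return_rate=0.6, click_rate=0.09),
    Product("B", price=80.0, margin_rate=0.5, return_rate=0.1, click_rate=0.05),
]

by_clicks = rank(catalog, key=lambda p: p.click_rate)
by_margin = rank(catalog, key=expected_margin)
```

In this made-up catalog the two objectives disagree on which product leads, which is exactly the kind of gap an audit is meant to surface before the system is left to run unattended.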


The Claude layer: why serious operators are standardizing on one model

Step back from the five layers and a question emerges that every operator eventually faces: which AI? The honest answer is that the frontier models are all capable, and the right tool depends on the work. But a clear pattern has formed among operators who care about judgment, confidentiality, and a controllable, reviewable workflow — the same qualities that matter in any serious professional setting. They standardize on Claude, made by Anthropic, for the work where the output carries the brand or touches the customer.

The evidence is in who builds on it. Shopify runs Sidekick — the assistant inside millions of small retailers — on Claude. Tidio carries retail customer service on Claude. In May 2026, Klaviyo and Anthropic launched an expanded integration that lets marketers pull live campaign and customer data into Claude and ask, in plain language, to audit flows, build weekly reports, and draft re-engagement campaigns — the kind of lifecycle-marketing work that, in Klaviyo’s words, “used to take hours” now drafted in minutes. Canva’s AI design features are built on Claude. And Anthropic’s own marketing team reports the kind of numbers operators should benchmark against: case studies drafted in 30 minutes instead of 2.5 hours, responsive search-ad creation cut from 30 minutes to 30 seconds, and roughly 10x creative output — built by a small, non-technical team.

The reason is not brand loyalty. It is that the work of a consumer brand is, increasingly, the same work The Leverage Years teaches senior professionals to do well: brief the assistant precisely, keep confidential data out, draft at the assistant’s speed, and apply a human review gate before anything ships or sends. Claude was built for exactly that posture — a capable drafter that defers to your judgment — which is why it keeps turning up underneath the operations that take their standards seriously.

A small fashion ecommerce studio in Brazil: a rack of elegant linen dresses, a laptop showing a product page, a phone on a tripod, warm natural light.
A small brand running the same five layers as a global retailer — with a team you could count on one hand.

The case that should change how you think about size: EZILDINHA

Everything above is about companies with budgets most operators will never have. The more useful question is what happens when a small brand runs the same five layers. The answer is the most important strategic shift in retail, and it is best shown by a single example.

EZILDINHA is a Brazilian fashion label — elegant linen and silk dresses, kaftans, resort and occasion wear, sold direct to consumers online and through a small number of physical stores. By the standards of this briefing it is tiny. It has no media-buying department, no content studio, no localization vendor, no data-science team. And it runs all five operational layers on AI.

The catalog and content layer — product descriptions, collection copy, SEO titles and meta descriptions, structured data — is drafted with AI against the brand’s own voice rules and reviewed before publishing, exactly the “draft at scale, review by exception” discipline Walmart and Stitch Fix use, at a fraction of one person’s week. The marketing layer — the editorial blog, the campaign concepts, the email and social calendar — runs on the same engine. And the part most operators assume requires an agency, the paid-media and discovery layer, is where the leverage is most visible: EZILDINHA uses AI to research keywords, structure and optimize Google Ads campaigns, build and iterate Meta (Facebook and Instagram) ad creative and targeting, and tune its organic SEO and AI-search presence — the same functions that L’Oréal runs through Google’s AI and that thousands of brands run through Meta’s Advantage+, executed by a small team with an AI operating layer instead of a department.

This is not a one-off. It is the small-brand version of a documented pattern. Anthropic’s own customer story for ChatPlace — a platform that gives solo operators “an AI marketing team” on Claude — reports its creators saving 15 to 20 hours a week and seeing 15–40% revenue increases, with one operator describing it as “like I finally hired a team, except it’s just me and AI, and the AI already knows my business.” Reversia, also on Claude, lets a small Shopify brand translate its entire store into 110+ languages at 99% accuracy — turning international SEO, once a six-figure agency project, into a near-zero-marginal-cost capability. And the backdrop is a country-wide shift: Brazil’s small-business agency Sebrae found that 44% of Brazilian small businesses already use some form of AI, and 51% use generative-text tools — with OpenAI reporting Brazil among the top three countries in the world for ChatGPT usage.

The strategic point is blunt. The five layers that used to separate a global retailer from a kitchen-table brand — catalog, discovery, marketing, service, personalization — have become a software layer that any disciplined operator can run. The moat was never the work. It was the cost of the work. That cost just fell by an order of magnitude, and the brands that notice first will spend the next five years taking share from the ones that assumed their size protected them.

The discovery shift no retailer can ignore: AI search and zero-click

There is a sixth thing happening that sits underneath all five layers and changes the math on every one of them: the way customers find products is moving from search engines to AI answers, and the traffic economics are shifting with it. A retail operator who optimizes the five layers but misses this is rebuilding the engine while the road changes underneath the car.

The numbers are stark. Gartner has forecast that traditional search volume will drop 25% by 2026 as buyers move to AI chatbots and assistants, and that organic search traffic could fall 50% by 2028. Pew Research found that when Google shows an AI summary, the share of users who click any link falls to roughly 8%, down from 15% — and only about 1% click a link inside the summary itself. Similarweb data cited across the industry shows zero-click searches rising from 56% to 69% in a single year. The shopfront that depended on ranking in a list of blue links is being replaced by an answer that may never show a link at all.

The flip side is where the opportunity sits. Shopify reported that AI-referred traffic to its merchants grew roughly seven-fold and AI-attributed orders eleven-fold between early 2025 and early 2026, with ChatGPT accounting for the overwhelming majority of that LLM commerce traffic. Adobe measured retail traffic from generative-AI sources up more than 1,300% year over year over the holidays, roughly doubling every couple of months. And the traffic that does arrive from AI tends to be unusually well-qualified: Webflow, optimizing deliberately to be cited and to convert from AI discovery, reported that roughly 8–10% of its signups now come from LLM sources, growing about four-fold year over year, and that traffic from ChatGPT converts at around 24% — roughly six times its Google-search rate.

For a retail brand the implication is concrete: the catalog and content layer is no longer only about ranking in Google. It is about being the clean, structured, citable source an AI answer pulls from when a shopper asks “what’s a good linen dress for a summer wedding?” That is why the brands moving fastest are pairing classic SEO with what is now called generative engine optimization — structured data, clear answers, and content written to be quoted by a machine, not just skimmed by a person. It is the same discipline The Leverage Years teaches for professional content: write the thing that is precise enough to be cited.

This is also the door to agentic commerce — the near future in which an AI assistant does not just recommend a product but completes the purchase. Shopify and others are already building the rails for it. The brands with a clean, machine-readable catalog and a clear operating standard will be the ones those agents transact with. The brands with a half-finished catalog will be invisible to a buyer who never sees a search result again.
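
A “clean, machine-readable catalog” usually starts with structured data on the product page. As a minimal sketch, the function below builds a schema.org Product object in JSON-LD using Python’s standard json module. The property names follow the public schema.org vocabulary; the product, brand, price, and SKU are hypothetical placeholders, and which properties a given search engine or AI assistant actually reads is not something this example can promise.

```python
import json


def product_jsonld(name, description, sku, brand, price, currency, in_stock):
    """Build a schema.org Product description as a JSON-LD string."""
    availability = (
        "https://schema.org/InStock" if in_stock else "https://schema.org/OutOfStock"
    )
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "sku": sku,
        "brand": {"@type": "Brand", "name": brand},
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": availability,
        },
    }
    # ensure_ascii=False keeps accented characters readable in the output
    return json.dumps(data, ensure_ascii=False, indent=2)


# Hypothetical product used only to show the shape of the output.
snippet = product_jsonld(
    name="Linen midi dress",
    description="Breathable linen midi dress suited to a summer wedding.",
    sku="EX-LINEN-001",
    brand="Example Label",
    price=489,
    currency="BRL",
    in_stock=True,
)
```

The resulting string is typically embedded in the page inside a script tag of type application/ld+json, so the same facts a shopper reads are available to a crawler or an assistant in a form it does not have to guess at.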

Three operations, up close

The layers are easier to understand through whole operations. Three short portraits — a conglomerate, a fashion platform, and a beauty house — show how the pieces fit into a working company.

The conglomerate’s creative assembly line

Unilever offers the clearest picture of marketing rebuilt as a system rather than a series of campaigns. Its Beauty AI Studio operates like a factory floor: a product goes in, and roughly 400 on-brand assets come out — stills, video cuts, format variants for every channel — against the 20 or so a traditional campaign produced. The company reports production roughly 30% faster with video-completion and click-through rates roughly doubling, and governs the whole thing with a framework it calls “Brand DNAi” plus a public pledge about how it will and will not depict people. The lesson for a smaller operator is not the scale; it is the structure. Unilever did not buy a tool and hope. It built a defined process — inputs, brand rules, review, output — and the AI is simply the engine inside it. Mondelez ($40M+ invested, targeting 30–50% lower content costs) and L’Oréal (campaign turnaround from weeks to hours across 20 markets) run the same play.

The fashion platform’s content engine

Zalando shows the same logic applied to the catalog-and-creative seam. By generating editorial and product imagery with AI — reportedly around 70% of its editorial images, with trend-response visuals produced in under 24 hours — it compressed content production from six-to-eight weeks to three-to-four days and cut campaign costs by a reported ~90%. Crucially, Zalando did not stop at production: it tied the content engine to size-and-fit and personalization tooling that reduces returns and lifts add-to-bag. That is the pattern worth copying — AI on the production side and the margin side of the same operation, not one without the other.

The beauty house’s clienteling layer

LVMH’s internal assistant, MaIA, handles more than two million requests a month from 40,000 employees: summarizing a client’s prior interactions before a sales associate walks the floor, drafting a personalized follow-up, surfacing the right product through semantic search. In a category whose entire promise is the human touch, the AI sits one layer behind the human, making the human better prepared rather than replacing them. Sephora’s diagnostic tools (reportedly 35% higher conversion, 25% fewer skincare returns) and Estée Lauder’s work with Google Cloud (an AI reported as twice as accurate as human operators at classifying customer calls) tell the same story: the AI does the preparation and the pattern-matching; the human does the relationship. That division of labor, in which the machine prepares and the human decides, is the through-line of every operation getting this right.

The EZILDINHA engine, function by function

Return to the small brand, because it is the proof that none of this requires a conglomerate’s resources. Here is how the same operations run at EZILDINHA, function by function, with an AI operating layer in place of a department.

  • Product & catalog. New arrivals get AI-drafted descriptions, collection copy, and structured data against the brand’s voice rules, then a human review before publishing — the small-brand version of Walmart’s catalog enrichment and Stitch Fix’s expert-in-the-loop model.
  • Organic search & AI discovery. Keyword research, on-page optimization, internal linking, and content built to be cited by AI answers — the generative-engine-optimization discipline that decides whether the brand exists in a zero-click world.
  • Paid media. Google Ads campaigns researched, structured, and optimized with AI, and Meta (Facebook/Instagram) creative and targeting built and iterated the same way — the same functions L’Oréal runs through Google’s AI and thousands of brands run through Meta’s Advantage+, executed by a small team.
  • Email & lifecycle. Campaign and flow copy drafted against the brand voice, segmented and scheduled — the small-brand analog of the Klaviyo-and-Claude lifecycle workflow now used by brands like Glossier and Liquid Death.
  • Internationalization. The capability Reversia demonstrates — translating the full store for new markets at near-zero marginal cost — turns international SEO from a six-figure agency project into a setting.

The point is not that EZILDINHA is unique. It is that it is repeatable. Brazil’s Sebrae found 44% of the country’s small businesses already using some form of AI; Anthropic’s ChatPlace and Reversia customer stories show the same model working for solo operators worldwide. The brand that builds this operating layer deliberately, with judgment in the loop, runs like a company several times its size.

The metrics that actually matter

It is easy to drown in vendor statistics. For an operator, only a few numbers decide whether AI is creating value or just activity. Track these and ignore most of the rest:

  • Hours reclaimed per week, by function. The cleanest proxy for leverage. ChatPlace’s solo operators report 15–20; Anthropic’s own marketing teams report 10+ per person on specific workflows. If a layer is not returning hours, the workflow is wrong.
  • Output per person. Assets, listings, campaigns, recaps produced per head. This is where the order-of-magnitude shifts show up — Walmart’s 100x, Unilever’s 20x assets, Zalando’s weeks-to-days.
  • Conversion and return rate. The margin layer. Personalization and fit tooling should move these or it is theater.
  • Share of discovery from AI. The leading indicator of the next two years. If AI-referred traffic and orders are not growing, the brand is invisible in the channel that is taking over.
  • Error and escalation rate. The honest counterweight. If the review gate is catching too much, the brief or the boundary is wrong — fix it before scaling.

Notice what is not on the list: number of prompts written, tools subscribed to, demos watched. Those are the vanity metrics of AI adoption. The operators winning measure leverage, output, margin, discovery, and error — and let everything else go.
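
The five numbers above are simple ratios once the raw counts are collected. The sketch below shows one way to compute a weekly scorecard for a single function. Every field name is hypothetical, and where the counts come from (store analytics, a time log, the review queue) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class WeeklyCounts:
    """Raw weekly counts for one function; all field names are hypothetical."""
    hours_before: float        # hours the work took before the AI workflow
    hours_after: float         # hours it takes now, including review time
    assets_shipped: int        # listings, campaigns, recaps that actually shipped
    people: int
    sessions: int
    orders: int
    returns: int
    ai_referred_sessions: int
    drafts_reviewed: int
    drafts_rejected: int       # drafts the review gate sent back or escalated


def ratio(part: float, whole: float) -> float:
    """Divide safely: 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def scorecard(c: WeeklyCounts) -> dict[str, float]:
    """Compute the operator metrics: leverage, output, margin, discovery, error."""
    return {
        "hours_reclaimed": c.hours_before - c.hours_after,
        "output_per_person": ratio(c.assets_shipped, c.people),
        "conversion_rate": ratio(c.orders, c.sessions),
        "return_rate": ratio(c.returns, c.orders),
        "ai_discovery_share": ratio(c.ai_referred_sessions, c.sessions),
        "error_rate": ratio(c.drafts_rejected, c.drafts_reviewed),
    }
```

Counting review time inside hours_after is a deliberate choice in this sketch: a workflow that drafts quickly but takes hours to check has not reclaimed anything.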

The two failure modes — and the governance that prevents them

For all the upside, the brands that have stumbled did so in one of two predictable ways, and both are worth naming because they are entirely avoidable.

The first is shipping volume without judgment — the Coca-Cola problem. AI makes it trivial to produce a great deal of mediocre, off-brand, or factually wrong material very quickly. Google’s own Gemini Super Bowl ad cited a false statistic about cheese consumption and had to be quietly edited. The defense is not to slow down; it is to keep a human with veto power and a clear standard for what is good enough to carry the brand.

The second is letting the machine touch what it should not — customer data, confidential supplier terms, anything that would be a problem in the wrong hands. The mature operators have a written rule for this. Unilever governs its AI creative with a framework it calls “Brand DNAi” and a public pledge about depicting real people. The Leverage Years teaches the small-operator version of the same idea: a one-page “Never Upload List,” a sanitization rule set, and a senior review gate before anything leaves the desk. The principle scales from a global conglomerate to a single founder: decide in advance what the machine drafts, what stays with the human, and what never goes near the model at all.
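
A “Never Upload List” and a sanitization rule set can be enforced mechanically before any text reaches a model. The sketch below is a minimal illustration under assumptions: the blocked terms are invented, the two patterns cover only email addresses and numbers written in the Brazilian CPF format, and a real rule set would need far broader coverage plus the human review gate described above.

```python
import re

# Hypothetical "Never Upload List": text mentioning these is refused outright.
NEVER_UPLOAD = ("supplier contract", "wholesale price list")

# Simple patterns for two kinds of personal data. These are illustrative,
# not exhaustive: they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CPF = re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b")  # Brazilian ID written as 000.000.000-00


def sanitize(text: str) -> str:
    """Replace emails and CPF-formatted numbers with neutral placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CPF.sub("[ID]", text)
    return text


def prepare_for_model(text: str) -> str:
    """Refuse text that touches the never-upload list, otherwise sanitize it."""
    lowered = text.lower()
    blocked = [term for term in NEVER_UPLOAD if term in lowered]
    if blocked:
        raise ValueError(f"blocked by never-upload list: {', '.join(blocked)}")
    return sanitize(text)
```

Raising an error instead of silently stripping the blocked material is intentional here: the person preparing the brief should know that something on the list almost left the desk.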


The strategic stakes: why this is a five-year story, not a quarter

It is tempting to treat all of this as a productivity story — do the same work with fewer hours. That undersells it. The deeper shift is competitive, and it compounds.

For most of retail history, scale bought capability. A bigger brand could afford a content studio, a media-buying team, a localization vendor, a data-science group — and a smaller brand could not, so it competed on a narrower front. AI does not erase scale advantages, but it collapses the capability floor. The small brand can now run a credible version of every function the large brand runs. The large brand still has more reach, more data, and more budget — but it no longer has a monopoly on competence.

That changes who takes share. Over the next five years, the winners will not be sorted by size. They will be sorted by operating discipline — by which brands built a clear, written, repeatable way to let the machine carry volume while a human owns the standard, and which ones either ignored the shift or adopted it sloppily. McKinsey’s own data hints at the gap: in its 2025 survey, only a small fraction of organizations — roughly 6% — were capturing outsized value from AI, even as adoption became near-universal. Adoption is now table stakes. The value is going to the operators who are disciplined about it.

There is a second-order effect that matters even more for premium brands. As AI-generated content floods every channel, the scarce thing is no longer production — it is judgment, taste, and trust. The brands that win will use AI to remove the drudgery and then spend the reclaimed hours on the things a machine cannot do: the relationship, the point of view, the editorial standard, the decision about what is good enough to carry the name. That is the entire thesis of The Leverage Years, and it turns out to apply as cleanly to a fashion label as to a law firm. The tool removes the first-pass burden so judgment has more room to matter.

The objections operators raise — and the honest answers

Smart operators do not adopt this on faith. They raise objections, and the good ones deserve straight answers.

“Won’t AI content make my brand sound generic?” It will, if you let the machine set the voice. It will not, if you set the voice and the machine drafts to it. The brands that sound generic skipped the brand-voice rule set and shipped the default. The ones that sound like themselves wrote down how they sound, fed it to the model, and edited every output through a human who knows the difference. Generic is a governance failure, not a property of the tool.

“Isn’t this risky with customer data?” It is, if you have no rule for it. The mature operators — from Unilever’s Brand DNAi to a solo founder’s one-page Never Upload List — decide in advance what data never goes near a model, what gets masked, and what is safe. Anthropic’s own small-business research found data security was the single biggest hesitation owners cited; the answer is not avoidance, it is a written boundary and a tool posture that keeps you in control of what is shared.

“The numbers are all from vendors. How do I know they’re real?” A fair challenge, and the honest answer is that many of the figures in this briefing are vendor- or brand-reported and should be read as directional. But the most consequential ones rest on public company disclosures or independent research: Walmart’s catalog productivity, Amazon’s Rufus sales, the Pew and Gartner data on search behavior, McKinsey’s economic modeling. And the failures are independently reported too: Coca-Cola’s backlash, the Gemini ad error. The balanced read is that the capability is real and proven; the value is uneven and goes to disciplined operators; and any single vendor statistic should be treated as a claim to verify, not a fact to repeat.

“If everyone has this, where’s my advantage?” In the operating system, not the tool. Everyone can buy the same models. Almost no one builds a clean operating standard, writes it down, and runs it consistently across functions. That discipline — not access to AI — is the durable advantage, exactly as it has always been with any general-purpose technology.

Build, buy, or operate?

Operators reasonably ask whether they should build custom AI, buy a point solution, or simply operate the general tools well. For all but the largest brands, the answer is the third, and it is worth being clear about why.

Building custom AI — training models, engineering pipelines — is what Walmart, Amazon, and Unilever do because they have the scale to justify it. Buying point solutions — a dedicated ads tool, a dedicated translation app like Reversia — makes sense for a specific, high-volume function where the tool is genuinely better than general-purpose work. But the foundation, for almost every brand, is learning to operate a capable general model well across the whole business: brief it precisely, govern what it touches, draft at its speed, and review with judgment. That is the layer that ties the point solutions together and the layer most brands skip in their rush to buy something.

This is the gap The Leverage Years exists to close. The tools are commodity. The operating discipline — the rule sets, the review gates, the written workflows, the calm decision about what stays human — is the asset. A brand that has built it can adopt any new tool quickly, because it already knows how it wants the work done. A brand that has not will keep buying tools and wondering why the leverage never quite arrives.


What a serious operator should actually do in the next 90 days

Reading about Walmart’s 850 million data points is not a plan. Here is the sequence we’d give an operator who wants real leverage by the end of a quarter, without hiring anyone or betting the brand.

Weeks 1–2: Map the five layers and pick one

Write down, honestly, how each of the five layers runs today: who does the catalog, who owns discovery and search, who makes the campaign, who answers the customer, who owns personalization. Then pick the one layer where the work is most repetitive and the failure mode is cheapest. For most brands that is catalog and content. Do not try to transform five layers at once; transform one and learn the operating discipline.

Weeks 3–5: Build the operating standard before the volume

Before generating anything at scale, write three things: a brand-voice rule set (how the brand sounds, what it never says), a Never Upload List (what data never goes near the model), and a review gate (the short checklist a human runs before anything publishes or sends). This is the difference between leverage and liability. It takes a few hours and it is the part most brands skip.
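A Never Upload List can start as a written page and later become a small automated check that runs before any text is pasted into a model. The following is a minimal sketch under stated assumptions: the email pattern and the blocked phrases are hypothetical examples, and a real list would be written by the brand itself and would still need a human check.

```python
import re
from typing import List

# Hypothetical Never Upload List: patterns for data that must not reach a model.
NEVER_UPLOAD_PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}

# Hypothetical blocked phrases, e.g. confidential supplier terms.
BLOCKED_PHRASES = ["supplier price list", "wholesale cost"]


def never_upload_violations(text: str) -> List[str]:
    """Return the reasons this text should not be sent to a model (empty if none)."""
    reasons = [
        label
        for label, pattern in NEVER_UPLOAD_PATTERNS.items()
        if pattern.search(text)
    ]
    lowered = text.lower()
    reasons += [
        f"blocked phrase: {phrase}"
        for phrase in BLOCKED_PHRASES
        if phrase in lowered
    ]
    return reasons
```

An empty result means the text passed this particular check, not that it is safe in every respect; the review gate remains the final word.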

Weeks 6–9: Run the layer at scale, review by exception

Now use the layer in production. Draft the catalog, the campaign, the recaps at the machine’s speed; review a sample and the exceptions, not every line. Track one number — hours reclaimed, or output produced, or return rate — so you know whether it is working. The goal of the first layer is not perfection; it is a repeatable operating loop you trust.
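"Review a sample and the exceptions" can be made concrete as a queue-building rule: every flagged draft goes to the reviewer, plus a random slice of the rest. The sketch below assumes each draft is a dictionary with a hypothetical `flagged` field and uses an illustrative 10% sample rate; neither comes from the brands discussed here.

```python
import random
from typing import Dict, List, Optional


def review_queue(
    drafts: List[Dict],
    sample_rate: float = 0.1,
    seed: Optional[int] = None,
) -> List[Dict]:
    """Return every flagged draft plus a random sample of the unflagged ones."""
    exceptions = [d for d in drafts if d.get("flagged")]
    routine = [d for d in drafts if not d.get("flagged")]
    if not routine:
        return exceptions
    # Always sample at least one routine draft, never more than exist.
    sample_size = min(len(routine), max(1, round(len(routine) * sample_rate)))
    rng = random.Random(seed)
    return exceptions + rng.sample(routine, sample_size)
```

If the sampled drafts keep failing review, that is a signal to raise the sample rate or tighten the brief before adding volume.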

Weeks 10–13: Add the second layer and write it down

With one layer running cleanly, add the next — usually marketing or customer service. Critically, document the workflow as an operating manual a new hire could run: the prompts, the rules, the review gate. A brand that has written its AI operating system down has built an asset. A brand that keeps it in one founder’s head has built a dependency.

That is the whole method, and it is deliberately unglamorous. The brands winning in retail did not find a magic prompt. They built an operating system — a clear, written, repeatable way to let the machine carry the volume while a human owns the standard — and then they ran it, layer by layer, until it became how the company works.

The human layer that does not disappear

For all the talk of automation, the most striking thing about the brands getting this right is how central the human remains. Tidio redeployed its support team rather than cutting it. LVMH put MaIA behind the sales associate, not in front of the customer. Unilever pledged that AI would not depict real women. Stitch Fix kept an expert reviewing every description. In every operation that works, the machine took the volume and the human kept the judgment — and the human’s judgment became more valuable, not less, because it was now the scarce input in a process where production had become cheap.

This is the part the breathless coverage gets wrong. The future of a consumer brand is not a fully automated company with no people. It is a small number of people whose judgment, taste, and relationships are amplified by an operating layer that handles the drudgery. The founder who used to spend her week writing product descriptions and wrestling with ad campaigns now spends it on the things only she can do — the buy, the brand point of view, the customer relationship, the decision about what is good enough to carry the name — while the machine drafts the rest to her standard.

That is why the brands that treat this as a cost-cutting exercise tend to disappoint, and the brands that treat it as a leverage exercise tend to pull away. Cost-cutting asks “how few people can we run on?” Leverage asks “what can these people now do that they never had time for?” The first question produces a thinner version of the same brand. The second produces a brand that punches well above its weight, which is exactly what EZILDINHA, ChatPlace’s solo operators, and a growing number of disciplined brands are quietly doing right now.

The retailers winning in 2026 did not hire their way through the work, and they did not fire their way to a margin. They rebuilt the work — one layer at a time, with a clear standard and a human in the loop — and then they spent the reclaimed time on the things that actually build a brand. The tools are available to everyone reading this. The discipline is the only thing in short supply, and it is entirely learnable.

Frequently asked questions

How are retail brands actually using AI in daily operations?

Across five operational layers: catalog and product content (Walmart used generative AI to create or improve 850 million catalog data points); discovery and shopping assistants (Amazon's Rufus is on pace for $10–12 billion in incremental sales; Shopify's Sidekick runs on Claude); marketing and creative production (Unilever produces ~400 assets per product, Zalando cut content time from weeks to days); customer service (Tidio resolves 2M+ conversations on Claude); and personalization, fit, and returns (Sephora reports 25% fewer skincare returns). The common pattern is that AI drafts and carries volume while a human owns the standard.

Can a small brand really compete with large retailers using AI?

Yes — that is the central shift. The five operational layers that used to separate a global retailer from a small brand are now a software layer any disciplined operator can run. EZILDINHA, a small Brazilian fashion label, runs catalog, marketing, paid media, SEO, and operations on AI with a tiny team. Anthropic's ChatPlace customer story reports solo operators saving 15–20 hours a week and seeing 15–40% revenue increases. The barrier was never the work; it was the cost of the work, and that cost has fallen by roughly an order of magnitude.

Which AI model should a retail or ecommerce brand use?

The frontier models are all capable, and the right tool depends on the task. But operators who prioritize judgment, confidentiality, and a controllable, reviewable workflow tend to standardize on Claude, made by Anthropic, for work that carries the brand or touches the customer — which is why Shopify's Sidekick, Tidio's support agent, Canva's AI design, and Klaviyo's marketing integration are all built on Claude. The more important decision is not the model but the operating standard: how the work gets briefed, what data is kept out, and the human review gate before anything ships.

What are the risks of using AI in retail, and how do brands avoid them?

Two failure modes account for most problems. The first is shipping volume without judgment — Coca-Cola's AI holiday ad and Google's Gemini Super Bowl ad both drew criticism for quality and accuracy. The second is letting the model touch data it should not. Mature operators prevent both with governance: a brand-voice rule set, a clear list of what data never goes near the model, and a human review gate with veto power. The discipline is “draft at scale, review by exception” — never “set and forget.”

How should a brand start using AI without a big budget or a tech team?

Pick one of the five layers — usually catalog and content, where the work is repetitive and the failure mode is cheap. Before scaling, write three things: a brand-voice rule set, a Never Upload List, and a short review checklist. Then run that one layer in production, reviewing a sample and the exceptions rather than every line, and track one number. Add a second layer only once the first runs cleanly, and document the workflow so it becomes an operating asset rather than a dependency on one person.

Is AI in retail just hype, or are the results real?

The results are real and, in places, dramatic — but they are uneven. McKinsey estimates generative AI's annual potential in retail and CPG at $400–660 billion, and named-brand results (Walmart's 100x catalog productivity, Amazon's multi-billion-dollar Rufus sales, Unilever's ~55% content cost savings) are documented. But there are also public failures, and many vendor-reported numbers should be read as directional. The honest summary: the capability is proven; the value goes to operators with the discipline to apply it well, not to those who simply buy a tool.

Anthony Guerriero is the founder of The Leverage Years and a CPA and former Deloitte Senior Manager. He built and scaled a medical logistics company from 6 to 1,800 employees and has advised UHNW clients on cross-border real estate transactions across more than 40 countries. The Leverage Years teaches senior professionals and operators how to use Claude, made by Anthropic, to do their best work faster without compromising their judgment or professional standards.
