Why an agentic workforce beats dev shops for marketing.
The three models everyone is being pitched.
Every growth-stage Chief Marketing Officer (CMO) I talk to this quarter is reviewing the same three decks. The logos change. The pitches do not.
The dev shop pitch
"We will build you a custom Large Language Model (LLM) pipeline." They show an architecture diagram with boxes labeled retrieval, vector store, orchestration. Their case studies are fintech dashboards and logistics portals. One slide mentions marketing. The team is six engineers and a salesperson. They have shipped zero campaigns.
The traditional agency pitch
"We use AI across all deliverables." They show you the same strategy deck they showed you in 2021, with three new slides inserted: a ChatGPT screenshot, a Midjourney moodboard, and a bullet point that says "we use AI to accelerate production." The billable-hours rate card has not moved. The operating model has not changed. The word "agent" appears in exactly one footnote.
The "AI-powered" agency pitch
"We have an innovation team." They show you a custom Generative Pre-trained Transformer (GPT) they built for one of their clients. It is a wrapper around GPT-4 with a system prompt and a logo. They call it proprietary. They do not have a repo they can show you. When you ask which team members write Skills day-to-day, the account manager says she will "loop in the innovation lead" and you never hear from that person.
The fatal flaw in each one.
The pitches sound plausible. The first engagement exposes the flaw.
Dev shop: ships capability, not operation.
Their success criterion is "the tool works." Your success criterion is "the marketing works." These are different problems. You end up with a SKILL.md, a README, and nobody who knows what to feed it on Monday morning. The dev shop logs off at go-live. Someone still has to operate the capability against real marketing problems.
Traditional agency: bolts AI onto a 2018 operating model.
Their cost structure is billable hours. Agents threaten that model, so they under-invest in them. What gets built is a ChatGPT prompt your account manager pastes into the system field of an OpenAI chat window. That is not an agent. That is a copy-paste workflow with extra steps. The output is mediocre because nobody on staff has engineering discipline.
"AI-powered" agency: uses the same public prompts as everyone else.
No custom Skills, no custom data integrations, no speed advantage. When the prompt stack gets stale in six months, they are behind. When Anthropic or OpenAI releases a new model, they are reshuffling slide decks instead of refactoring workflows. Their "AI" is marketing copy on a sales page.
The fourth model.
The fourth model is a marketing agency that has already automated its own repetitive workflows. You hire the workflows themselves, not a pitch to build them.
It keeps editorial judgment at the human layer and lets Skills handle the repetitive work. It uses MCP to connect those Skills to data sources and client workspaces. It delivers output on a cadence the hourly-billing model cannot support. And it has shipped live against real clients before pitching you, not as a demo.
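A minimal sketch of that division of labor, in Python with hypothetical names (an illustration of the shape, not /winston's actual code): the Skill drafts, a named human approves, and delivery refuses to run without the sign-off.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    """Output of one Skill run, pending editorial review."""
    skill_name: str
    client: str
    body: str
    approved: bool = False
    reviewer: Optional[str] = None

def human_review(draft: Draft, reviewer: str, approve: bool) -> Draft:
    """Editorial judgment stays at the human layer: a named person signs off."""
    draft.reviewer = reviewer
    draft.approved = approve
    return draft

def deliver(draft: Draft) -> None:
    """Hard gate: nothing ships without human approval."""
    if not draft.approved:
        raise RuntimeError(f"{draft.skill_name}: no human sign-off, not shipping")
    print(f"Shipping {draft.skill_name} output to {draft.client} "
          f"(approved by {draft.reviewer})")
```

The gate is the point: the Skill can draft at machine speed, but the approval field is set by a person, never by another model.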
This model is rare. Here is what it takes to pull it off:
- Senior strategists who understand marketing and can operate AI-heavy workflows without being scared of a Claude Code terminal.
- Operators who can build and maintain Skills and Model Context Protocol (MCP) connections at production quality.
- Editorial and Quality Control (QC) discipline at the human layer. Skills draft. Humans judge.
- A business model that rewards shipping deliverables, not logging hours.
- Willingness to publish methodology. Execution is the moat, not the process doc.
Cross-disciplinary staffing is the blocker. Most agencies are staffed with creative directors and account managers. Most dev shops are staffed with engineers and product managers. Putting both under one roof takes deliberate design. It is easier to pick one side and pretend the other does not matter.
A worked example: the GEO tech audit Skill.
Abstract arguments convince nobody. Here is a Skill we run today at /winston, called winston-geo-tech-audit.
1. The job it does
The Skill audits a domain against a 24-point Generative Engine Optimization (GEO) technical rubric. Crawlability for AI user agents, schema coverage, chunk architecture, entity signals on /about and byline pages, canonical and robots behavior, and citation presence across the major AI engines. It produces a ranked fix list scored by dimension.
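To make "ranked fix list scored by dimension" concrete, here is an illustrative sketch of the result shape in Python. The names are hypothetical and the real 24-point rubric and its weights are /winston's own; only the structure is shown.

```python
from dataclasses import dataclass, field

# The five scored dimensions named above; the real rubric has 24 checks across them.
DIMENSIONS = ["crawlability", "schema", "chunk", "entity", "citation"]

@dataclass
class Finding:
    dimension: str  # one of DIMENSIONS
    check: str      # e.g. "GPTBot allowed in robots.txt"
    score: int      # 0 = fail, 1 = partial, 2 = pass
    fix: str        # suggested remediation

@dataclass
class AuditResult:
    domain: str
    findings: list[Finding] = field(default_factory=list)

    def dimension_scores(self) -> dict[str, float]:
        """Average the check scores inside each dimension."""
        buckets: dict[str, list[int]] = {d: [] for d in DIMENSIONS}
        for f in self.findings:
            buckets[f.dimension].append(f.score)
        return {d: (sum(v) / len(v) if v else 0.0) for d, v in buckets.items()}

    def ranked_fixes(self) -> list[Finding]:
        """Worst-scoring checks first: the ranked fix list."""
        return sorted(self.findings, key=lambda f: f.score)
```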
2. What it replaces
Roughly 4 to 6 hours of a senior Search Engine Optimization (SEO) practitioner's time per audit. Previously a human would crawl the site, check schema in Google's Rich Results Test, read the content for chunk readiness, run citation checks in Perplexity and ChatGPT, and write up findings. Now the Skill does the mechanical parts and the human reviews the output.
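One of those mechanical parts, sketched in Python: checking robots.txt against the publicly documented AI crawler user agents. The user-agent list below is indicative, and the Skill's real check set is broader than a homepage fetch.

```python
from urllib.robotparser import RobotFileParser

# Publicly documented AI crawler names; not an exhaustive list.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "CCBot"]

def crawlability_report(domain: str) -> dict[str, bool]:
    """Parse robots.txt and report which AI user agents may fetch the homepage."""
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()  # fetches the live robots.txt
    return {ua: rp.can_fetch(ua, f"https://{domain}/") for ua in AI_CRAWLERS}

# crawlability_report("example.com")
# -> {"GPTBot": True, "ClaudeBot": False, ...}
```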
3. How we deliver it
We run the Skill on the client domain. Output is a structured report with per-dimension scores, a ranked backlog, and a short summary suitable for a weekly digest. The client gets the Excel workbook plus a baseline readout.
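A sketch of that packaging step, assuming the openpyxl library and the hypothetical score dict from the sketch above; the real workbook carries more tabs than the single sheet shown here.

```python
from openpyxl import Workbook  # pip install openpyxl

def write_workbook(domain: str, scores: dict[str, float], path: str) -> None:
    """Write the per-dimension scores to one sheet of an Excel workbook."""
    wb = Workbook()
    ws = wb.active
    ws.title = "Dimension scores"
    ws.append(["Dimension", "Score"])
    for dim, score in sorted(scores.items()):
        ws.append([dim, score])
    wb.save(path)

# write_workbook("example.com", {"crawlability": 1.8, "schema": 1.6}, "geo-audit.xlsx")
```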
4. What it produces
A per-dimension score set (crawlability, schema, chunk, entity, citation), a ranked fix list keyed to impact-to-effort, and a delta report when we rerun the audit next quarter.
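The delta report is plain arithmetic over two score sets. A sketch, reusing the hypothetical dimension-score dict from above:

```python
def delta_report(baseline: dict[str, float],
                 current: dict[str, float]) -> dict[str, float]:
    """Positive delta = the dimension improved since the baseline audit."""
    return {dim: round(current[dim] - baseline.get(dim, 0.0), 2)
            for dim in current}

q1 = {"crawlability": 1.2, "schema": 0.8, "chunk": 1.5, "entity": 1.0, "citation": 0.4}
q2 = {"crawlability": 1.8, "schema": 1.6, "chunk": 1.5, "entity": 1.2, "citation": 0.9}
print(delta_report(q1, q2))
# {'crawlability': 0.6, 'schema': 0.8, 'chunk': 0.0, 'entity': 0.2, 'citation': 0.5}
```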
5. What it cannot do
Final editorial judgment on rewrites. That stays with humans. The Skill can say "this H2 is pronoun-dependent and scores 0." A senior editor decides how to rewrite the section so it reads naturally. The Skill flags. The human fixes.
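A toy version of that flag, to show the flag-then-fix split; the Skill's actual heuristic is richer than a first-word check.

```python
# Headings that lean on a pronoun cannot stand alone as a retrieval chunk.
LEADING_PRONOUNS = {"it", "this", "that", "these", "those", "they"}

def pronoun_dependent(heading: str) -> bool:
    """Flag (don't fix): a human decides how the heading should read."""
    words = heading.strip().split()
    return bool(words) and words[0].lower() in LEADING_PRONOUNS

print(pronoun_dependent("It scales to 500 pages"))         # True  -> scores 0
print(pronoun_dependent("GEO audits scale to 500 pages"))  # False
```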
6. Why this shape beats a "we'll build you an agent" pitch
Because the Skill already exists. It has run against real client domains. The output format is fixed. The cost to run it against your domain is known. You are not funding the construction of a capability. You are hiring the output of a capability that already ships.
Multiply this by the four other workflow categories (keyword research, on-page optimization at volume, brand-voice content pipelines, GEO tracking and reporting) and you have a workforce. Not a roadmap.
The vendor scorecard.
Same four vendor archetypes. The axes are about workflow output, not agent-building capability. The extra column tells you what to check for.
| Axis | Traditional agency | AI-powered agency | Dev shop | /winston | What to check for |
|---|---|---|---|---|---|
| Ships keyword research in hours, not weeks | No | Sometimes | N/A | Yes | Ask for a sample keyword universe delivered in under a week. |
| Optimizes on-page at scale (hundreds of pages) | No | Limited | N/A | Yes | Ask for a sample optimization sheet for 50+ pages. |
| Monitors AI citation on a schedule | No | No | No | Yes | Ask which engines they track and how often. |
| Covers Reddit, TikTok, YouTube, and LLMs | No | No | No | Yes | Ask what scrapers they use and what dashboards they deliver. |
| Publishes methodology | No | No | Rarely | Yes | Look for Playbooks with implementation detail, not thought leadership. |
| Your data in workspaces you control | Varies | Varies | Varies | Yes | Ask where outputs live and who has access. |
| Operates Skills against live clients today | No | Unclear | N/A | Yes | Ask to see a real output from a recent engagement. |
How to evaluate pitches in your next Request for Proposal (RFP).
Seven questions. Ask every vendor. The answers will sort your list in about 15 minutes.
- Show me a real Skill output from a recent engagement. Not a demo.
- Which team members operate the Skills day-to-day? Name them.
- What does the monthly output bundle look like for a client in my vertical?
- Where does our data live, and who can access it?
- Which AI engines do you track for GEO, and how often?
- Do you eat your own cooking? Show me your internal workflow stack.
- What happens in the review cadence if a workflow stops earning its keep?
Any vendor who cannot answer those is the wrong vendor.