The wrong customer for AI video
HeyGen, Synthesia, Higgsfield, Runway, Sora — what they're actually optimized for, and why advertising is still waiting for the right tool.
The video models got good in 2026.
I mean really good. Sora 2 will hold a face across a 20-second clip with synchronized audio. Veo 3 produces hero shots that pass for finished commercials. Runway Gen-4 References will lock a sneaker, a bottle, or a brand mascot across six cuts in a project. Kling 2.1 will render a 12-second chase scene that, four years ago, would have cost $40,000 and a stunt coordinator.
So why are 95% of the AI video ads in your feed still bad?
It's not the models. The models are not the bottleneck. The bottleneck is that none of the dominant AI video products are built for advertisers — and the ones that gesture at it are built for the wrong kind of advertiser.
This is a Field Note about who AI video tools are actually for, what advertising actually needs, and the gap between those two facts.
What the incumbents are actually selling
Let's go down the list.
HeyGen sells corporate avatars. Pick a stock person, type a script, get a talking head in 40 languages. The wedge is sales enablement and L&D — videos that need to ship in a quarter without a production crew. Their roadmap optimizes for avatar fidelity, lip-sync accuracy, and language coverage. None of that is a performance-ad axis.
Synthesia is HeyGen's older, more enterprise sibling. Same shape — corporate explainers, multilingual training, internal comms. Roadmap optimizes for compliance, security, IT-admin features. Excellent product. Not built for someone shipping 50 ad variants per week.
Higgsfield is the new entrant for cinematic motion. Drag a still image, pick a motion preset (orbit, dolly, push-in), get a clip that looks like a movie shot. Their wedge is content creators who want film-grade aesthetic without a crew. Aesthetic is the product. Performance creative is often the opposite of aesthetic — it's intentionally rough, hook-first, native-looking, and ugly when ugly works.
Runway is the model studio for filmmakers. Music videos, narrative shorts, VFX-adjacent storytelling. Gen-4 References is genuinely the best identity-lock layer in the market right now — we use it heavily for physical-product shots — but Runway's center of gravity is making one polished thing, not 50 variants of one ad.
Sora (OpenAI) is harder to categorize because it's a general-purpose model. But the vibe (the demos, the showcase reel, the partnerships with film studios) points squarely at cinema. It happens to ship a cameo system that's the gold-standard identity lock for actor-driven ads, but ad workflows aren't what OpenAI markets it for. The cameo feature is a side effect that advertisers can exploit, not a use case the product team is steering toward.
Look at the pattern. Filmmakers. Corporate buyers. Content creators. "Creators." That's the customer base AI video products fight over. None of them are performance media buyers.
You'll notice none of the names above pitch themselves as ad tools. The set of products that did is shorter and worth its own line.
Icon, Arcads, Creatify, AdCreative.ai, and the rest of the 2024-25 "AI ads" wave actually pointed at the right customer. The pitch was the cleanest of the group — AI-generated UGC, direct-to-platform workflows, aimed at DTC media buyers.
The honest thing to say about that wave is this: anyone with reasonable technical fluency and a Claude Code session can spin up outputs at roughly that quality themselves now. The video models are openly available; the prompts aren't magic. What you're actually paying these tools for is orchestration — wrapping the model layer in a usable workflow.
Orchestration is real value. The question is which orchestration. Most of these tools ship generation orchestration — a workflow for producing videos. The kind of orchestration ads actually need closes the loop with the platforms: it takes what won, what died at second three, which audience converted, and feeds it back into the next batch.
Without that feedback loop, even a well-built orchestrator is shipping open-loop volume. A skilled marketer can close the loop manually — read the data, draw the right inferences, brief the next round — but the tools that hold the data (the ad platforms) and the tools that do the generation are separate products that don't talk to each other. That gap is what keeps the wave from compounding. Volume without feedback doesn't get better. It just gets bigger.
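To make "closing the loop" concrete, here is a minimal Python sketch of the feedback step: aggregate platform results by hook angle, rank angles by conversions per impression rather than by clicks, and carry the top few into the next brief. `VariantResult` and `next_batch_hooks` are hypothetical names, and a real loop would also pull retention (where viewers drop off) and audience data from the ad platform's reporting, which this sketch leaves out.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantResult:
    """Hypothetical per-variant result row pulled back from an ad platform."""
    hook: str          # the opening angle this variant tested
    impressions: int
    conversions: int


def next_batch_hooks(results: list[VariantResult], keep: int = 2) -> list[str]:
    """Rank hook angles by pooled conversions per impression; keep the top `keep` for the next brief."""
    impressions: dict[str, int] = defaultdict(int)
    conversions: dict[str, int] = defaultdict(int)
    for r in results:
        impressions[r.hook] += r.impressions
        conversions[r.hook] += r.conversions
    rate = {h: conversions[h] / impressions[h] for h in impressions if impressions[h] > 0}
    return sorted(rate, key=rate.get, reverse=True)[:keep]
```

Ranking on conversions per impression instead of CTR is a deliberate choice here; section 5 below explains why CTR alone misleads.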
What advertising actually needs
Performance advertising is a fundamentally different game from any of the above, in five specific ways.
1. It's a portfolio search, not a craft
Motion analyzed 550,000 Meta ads across $1.3B of spend last year and found that roughly 5% of ads absorb the majority of spend in any account. Half of all ads receive almost no spend at all. This isn't a sign of bad creative. It's the structure of the system. Performance creative behaves like a probabilistic search, and the only way to win it is to ship enough quality variation that the algorithm finds your winners.
This means the unit of work for an advertiser isn't "an ad." It's "a portfolio of 30-50 ads that share a hypothesis, with enough variation that some will spike." Tools that help you make one beautiful video are missing the point. You don't need one beautiful video. You need 40 — where the 5% that work cover the cost of the 95% that don't.
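The arithmetic behind shipping 40 instead of one can be sketched in a few lines of Python. This assumes, for illustration only, that each ad independently has about a 5% chance of becoming a winner. The Motion finding describes how spend concentrates, not a per-ad probability, so treat the output as intuition rather than a forecast.

```python
import math


def p_at_least_one_winner(n_ads: int, hit_rate: float) -> float:
    """Probability that at least one of n independent ads is a winner: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - hit_rate) ** n_ads


def ads_needed(target_prob: float, hit_rate: float) -> int:
    """Smallest portfolio size n such that P(at least one winner) >= target_prob."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))


if __name__ == "__main__":
    for n in (1, 10, 40):
        print(n, round(p_at_least_one_winner(n, 0.05), 3))
    print("ads for a 90% chance:", ads_needed(0.90, 0.05))
```

Under that assumption a single ad wins 5% of the time, while a 40-ad portfolio has roughly an 87% chance of containing at least one winner (about two expected winners), and reaching 90% takes 45 ads.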
2. Cross-shot identity has to lock
Multi-cut ads — Founder POV, Caught Slacking, UGC Testimonial, Podcast Clip — only work if the same actor appears in beat 1, beat 3, beat 5, looking like the same human. The model layer that does this is called identity lock, and there are exactly two viable approaches in 2026: cameo (Sora 2 — register a person once, render them into anything) and character lock (Runway Gen-4 References — provide reference images and the subject persists across shots in a project). If you don't pick the right approach for your beat, the actor visibly drifts mid-ad and the audience subconsciously notices. Most AI video tools don't expose this decision at all. They treat it as a model implementation detail. It is the load-bearing decision.
Identity lock deserves its own Field Note. That one is coming next.
3. Format is structural, not stylistic
Performance ads have winning formats. Letter (handwritten note, single static shot) hits 10.8% on Meta's hit-rate dataset. Sign (offer-first banner) hits 7.9%. Founder POV hits 8.6%. These aren't aesthetic preferences. They have specific cut counts, hook taxonomies, and pacing. A Letter ad has 0-1 cuts; a Caught Slacking has 6-8. If you generate a Letter with eight cuts, it stops being a Letter — it becomes a different ad, and the format's hit rate doesn't transfer. Models don't know any of this. The screenplay layer above the model has to.
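A minimal Python sketch of a screenplay layer that "knows the format" might encode cut ranges and reject requests that fall outside them. It includes only the two ranges stated above (Letter 0-1, Caught Slacking 6-8); `FormatSpec`, `FORMATS`, and `check_cut_count` are hypothetical names, and other formats are omitted because their cut counts aren't given here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FormatSpec:
    name: str
    min_cuts: int
    max_cuts: int


# Only the cut ranges stated in the text; other formats left out rather than guessed.
FORMATS = {
    "letter": FormatSpec("Letter", 0, 1),
    "caught_slacking": FormatSpec("Caught Slacking", 6, 8),
}


def check_cut_count(format_key: str, cuts: int) -> list[str]:
    """Return an empty list if the cut count fits the format, else a description of the violation."""
    spec = FORMATS[format_key]
    if spec.min_cuts <= cuts <= spec.max_cuts:
        return []
    return [
        f"{spec.name} expects {spec.min_cuts}-{spec.max_cuts} cuts, got {cuts}; "
        "this is a different ad and the format's hit rate won't transfer"
    ]
```

Calling `check_cut_count("letter", 8)` flags the eight-cut "Letter" from the paragraph above before any money is spent on rendering it.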
4. Offer type changes the whole pipeline
A SaaS app needs a composite step: render the human holding a flat-green phone, then layer the real UI on top in post, because video models hallucinate UI text into garbage. A physical product needs a multi-angle reference set (4-8 images of the bottle, sneaker, or device) so the model knows what your specific thing actually looks like. A service has no physical artifact at all, so it needs a different strategy entirely: visualizing the outcome. None of this is a model feature. It's a routing layer above the model that decides which tool, which reference type, which post-step. No incumbent ships this layer because their core customer doesn't ask for it.
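The routing layer described above can be sketched as a small Python function that turns an offer type into a render plan. The names (`OfferType`, `RenderPlan`, `route_offer`) and the plan strings are hypothetical; the only rules encoded are the three from the paragraph: composite for apps, 4-8 reference images for physical products, outcome visualization for services.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class OfferType(Enum):
    APP = "app"
    PHYSICAL = "physical"
    SERVICE = "service"


@dataclass(frozen=True)
class RenderPlan:
    strategy: str
    post_steps: tuple[str, ...] = ()


def route_offer(offer: OfferType, reference_images: int = 0) -> RenderPlan:
    if offer is OfferType.APP:
        # Models garble UI text, so render a blank green screen and composite the real UI later.
        return RenderPlan("actor holds a flat-green phone", ("composite real UI in post",))
    if offer is OfferType.PHYSICAL:
        # The model has never seen *your* product; insist on a multi-angle reference set.
        if not 4 <= reference_images <= 8:
            raise ValueError(f"physical offer needs 4-8 reference images, got {reference_images}")
        return RenderPlan(f"reference-locked product ({reference_images} angles)")
    # A service has no artifact to show, so depict the customer's outcome instead.
    return RenderPlan("outcome visualization")
```

Raising on a physical offer with too few references makes the multi-angle set a precondition rather than something you have to remember.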
5. The platform is the scoreboard, and the scoreboard lies
Every performance marketer optimizes against CTR because that's what the platforms surface. But CTR is a corrupted proxy for creative quality: a 3% CTR that brings junk traffic loses to a 1.5% CTR whose clicks convert at more than twice the rate. The right scoreboard is CTR × CVR (or CPA), and the entire stack (creative tooling, platform reporting, agency dashboards) is misaligned with that fact. A future Field Note unpacks this in detail.
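A worked example makes the scoreboard problem concrete. In this Python sketch the CVR values and the $10 CPM are made-up numbers chosen to match the 3% vs 1.5% CTR comparison, and both ads are assumed to buy impressions at the same price.

```python
def conversions_per_mille(ctr: float, cvr: float) -> float:
    """Expected conversions per 1,000 impressions: 1000 x CTR x CVR."""
    return 1000 * ctr * cvr


def cpa(cpm: float, ctr: float, cvr: float) -> float:
    """Cost per acquisition: spend per 1,000 impressions / conversions per 1,000 impressions."""
    return cpm / conversions_per_mille(ctr, cvr)


# Hypothetical inputs: same CPM; ad A has double the CTR but a quarter of the CVR.
ADS = {
    "A": (0.030, 0.005),  # 3% CTR, 0.5% CVR (junk traffic)
    "B": (0.015, 0.020),  # 1.5% CTR, 2% CVR
}
CPM = 10.0

best_by_ctr = max(ADS, key=lambda name: ADS[name][0])
best_by_cpa = min(ADS, key=lambda name: cpa(CPM, *ADS[name]))

if __name__ == "__main__":
    for name, (ctr, cvr) in ADS.items():
        print(name, f"CTR={ctr:.1%}", f"CPA=${cpa(CPM, ctr, cvr):.2f}")
    print("CTR picks", best_by_ctr, "| CPA picks", best_by_cpa)
```

With these inputs A costs about $66.67 per conversion and B about $33.33, so ranking by CTR picks the worse ad. Because A's CTR is exactly double B's, B wins whenever its CVR is more than twice A's.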
Why the gap exists
The incumbents aren't oblivious. They're correctly serving the customer their wedge gave them. HeyGen's customer wants 40-language consistency, not 50 hook variants. Higgsfield's customer wants cinematic polish, not a Sign ad shot in eight seconds. Runway's customer wants narrative coherence, not portfolio-volume ad ops.
The gap is that performance advertising is a third kind of customer — high-volume, intentionally rough, ruthlessly metrics-driven, format-aware, identity-locked, offer-routed — and no one with the model relationships and the engineering org has been incentivized to serve them yet. Probably because performance media buying is unglamorous work that doesn't show up in sizzle reels. Filmmaking does. So the orgs that have the model partnerships chase the glamorous customer, and performance ads are stuck retrofitting filmmaker tools.
This is the gap that gets closed in 2026.
What you should do this quarter
If you want to make AI video ads today and you don't want to wait for someone to build the right product, here's the honest playbook.
Pick a format before you pick a model. Decide if you're shipping a Letter, a Sign, a Founder POV, or a UGC Testimonial. Each has a target cut count and a target identity-lock approach. The format chooses the model, not the other way around.
Use Sora 2 cameos for any beat with a recurring actor. Register the actor once. Now you can put them in 50 ads without re-uploading reference images, and they'll look like the same person every time.
Use Runway Gen-4 References for any beat with a recurring physical product. Multi-angle reference set, 4-8 images. Don't trust the model to know what your specific bottle, sneaker, or device looks like. It doesn't.
Use Veo 3 only for single-shot polish where identity doesn't need to lock across cuts. Hero shots, ASMR, Letter ads. It will not lock a face across multiple cuts. It cannot — it has no reference-image input at all. Sending Veo 3 a reference image is throwing money away.
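The three rules above amount to a small per-beat router. In this hypothetical Python sketch the enum values are plain labels, not real model or API identifiers. The text gives no rule for a beat that needs both an actor lock and a product lock, so checking the actor first is our assumption.

```python
from dataclasses import dataclass
from enum import Enum


class IdentityLock(Enum):
    SORA2_CAMEO = "sora-2-cameo"                       # recurring actor, registered once
    RUNWAY_GEN4_REFERENCES = "runway-gen4-references"  # recurring physical product
    VEO3_TEXT_ONLY = "veo-3-text-only"                 # single-shot polish, no cross-cut identity


@dataclass(frozen=True)
class Beat:
    recurring_actor: bool = False
    recurring_product: bool = False
    reference_images: int = 0


def route_identity_lock(beat: Beat) -> IdentityLock:
    # Assumption: when a beat needs both locks, the actor takes priority.
    if beat.recurring_actor:
        return IdentityLock.SORA2_CAMEO
    if beat.recurring_product:
        return IdentityLock.RUNWAY_GEN4_REFERENCES
    return IdentityLock.VEO3_TEXT_ONLY


def wasted_references(beat: Beat) -> bool:
    """True when reference images would go to the text-only route, where they buy nothing."""
    return route_identity_lock(beat) is IdentityLock.VEO3_TEXT_ONLY and beat.reference_images > 0
```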
For SaaS, plan a composite step. Render the actor holding a flat-color phone. Composite the real UI on in post. Models cannot render readable UI text yet, full stop. Anyone who tells you otherwise is either lying or hasn't looked at the output text closely.
Lint before you render. Every render is real money. Don't ship a 30-second uncut shot. Don't lead with a logo. Don't hand-write a sales script and read it to camera. There are five or six format-violating anti-patterns the model can't catch but the platform's algorithm absolutely will.
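A linter for the three anti-patterns named above fits in one short Python function. The `Shot` fields, `lint`, and reading "a 30-second uncut shot" as any single shot of 30 seconds or more are all hypothetical choices; the remaining anti-patterns aren't listed here, so the sketch doesn't guess at them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    duration_s: float
    shows_logo: bool = False
    reads_sales_script: bool = False  # hand-written pitch read straight to camera


MAX_UNCUT_SECONDS = 30.0  # assumption: our reading of "don't ship a 30-second uncut shot"


def lint(shots: list[Shot]) -> list[str]:
    """Return human-readable problems; an empty list means the request may go to render."""
    if not shots:
        return ["screenplay has no shots"]
    problems = []
    if shots[0].shows_logo:
        problems.append("opens on a logo")
    for i, shot in enumerate(shots):
        if shot.duration_s >= MAX_UNCUT_SECONDS:
            problems.append(f"shot {i} runs {shot.duration_s:.0f}s without a cut")
        if shot.reads_sales_script:
            problems.append(f"shot {i} reads a sales script to camera")
    return problems
```

Running `lint` costs nothing, so it belongs in front of every render call rather than in a post-mortem.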
Generate volume and expect to lose 95% of it. That's the structural game. Tools that make you feel good about the one ad you finished are working against you.
What we're building
None of the above should be a manual checklist. It should be the default behavior of the product you ship through.
I've been building this with the Orbit team for the last several months. Internally we call it Hydra — an AI ad studio designed specifically for performance advertising rather than for filmmakers, corporate L&D, or "content." It includes:
- A screenplay layer above the model. JSON beats, dialogue, wardrobe, props, cuts — so you can lint a generation request before you spend on a render.
- A format-aware prompt rewriter that knows a Letter has 0-1 cuts and a Caught Slacking has 6-8, and refuses to ship a render that violates the format's hit-rate prerequisites.
- An identity-lock router that picks Sora 2 cameo, Runway Gen-4 References, or Veo 3 text-only depending on what your beat actually needs.
- An offer-type router (app, physical, service) that makes composite steps, multi-angle reference sets, and outcome visualization the default rather than something you have to remember.
- Hydra-mode hook discovery: ship 30+ variants overnight, identify which angle catches, then re-render the winners on the premium tier.
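To show the kind of thing a JSON screenplay layer can check before any render spend, here is a hypothetical Python sketch. The field names (`format`, `beats`, `seconds`, `dialogue`, `wardrobe`, `props`) are our assumptions, not Hydra's actual schema, and treating every beat boundary as a cut is a simplification.

```python
import json

REQUIRED_BEAT_FIELDS = {"seconds", "dialogue", "wardrobe", "props"}


def summarize_screenplay(raw: str) -> dict:
    """Parse a screenplay request and report what a pre-render linter needs."""
    doc = json.loads(raw)
    beats = doc.get("beats", [])
    problems = []
    for i, beat in enumerate(beats):
        missing = REQUIRED_BEAT_FIELDS - set(beat)
        if missing:
            problems.append(f"beat {i} missing {sorted(missing)}")
    return {
        "format": doc.get("format"),
        "cuts": max(len(beats) - 1, 0),  # assumption: every beat boundary is a cut
        "seconds": sum(b.get("seconds", 0) for b in beats),
        "problems": problems,
    }


EXAMPLE = json.dumps({
    "format": "founder_pov",
    "beats": [
        {"seconds": 3, "dialogue": "We almost shut down in March.", "wardrobe": "grey tee", "props": []},
        {"seconds": 5, "dialogue": "Then we changed one thing.", "wardrobe": "grey tee", "props": ["mug"]},
        {"seconds": 4, "dialogue": "Link below.", "wardrobe": "grey tee"},
    ],
})

if __name__ == "__main__":
    print(summarize_screenplay(EXAMPLE))
```

Everything in the summary (cut count, runtime, missing fields) comes from the request alone, which is what makes linting before rendering possible.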
Hydra is not public yet. We're shipping it to a small group of agencies and operators first, later this quarter. If you're a performance marketer who wants in early, reach out — I'm at hello@orbitllm.com.
The models will keep getting better. The question is whether anyone builds the operating system for them.
— Jose