Field guide · updated 18 May 2026 · 14 min read

Creative analysis, written down after 2,000+ ads.

Most creative advice is rumor. This is the opposite: every rule in here is anchored to a number, and every number is anchored to an ad in a benchmark we keep growing. Read it linearly or jump to the chapter you need. Either way, you will leave with a methodology you can defend in a room of skeptical buyers.

2,047 ads analyzed · 5 KPIs, frozen · ρ +0.31 held-out OOS · 90s per ad
Oussama Nakhil, Founder
Multiple years buyer-side: NielsenIQ insights, then L'Oréal Groupe in global marketing insights, where I managed Kantar as a vendor. Built SaliencyLab to replace the slow, expensive consumer research loop with something pre-spend, public-data, and honest about its ceiling.
1,200+ ads with public outcome data · 6.5× pool quintile lift (top vs bottom) · 3 platforms (TikTok and YouTube validated; Meta directional)

What creative analysis actually is.

Creative analysis is the structured assessment of an ad's likely performance before media spend, scoring it on attention, comprehension, branding, persuasion and memorability, then benchmarking against comparable ads. It is a pre-flight check, not a post-flight report.

Most teams confuse two very different things. Creative analysis happens before you pay for impressions; it asks "is this ad structurally sound?". Performance reporting happens after; it tells you what spend produced. The gap between them, the moment you have a finished cut and a media plan but no signal, is where 90% of waste is born, and where this guide lives.

Working definition

"A defensible read on creative quality, delivered fast enough to change the cut and cheap enough to do it for every variant."

Three things have to be true for a creative analysis to be worth doing: it has to be fast and cheap (costing less than a re-edit), benchmarked (a score with no peer is just a number), and honest (about its construct, its ceiling, and its blind spots). Every section of this guide is built around those three constraints.

The five KPIs. Frozen.

Every ad scored by SaliencyLab is decomposed into the same five dimensions. We do not add a sixth when a client asks. The composite is weighted, not averaged.

Weight 25%
Beat the Skip
Do the first 2 seconds resist the thumb? Pattern interrupt, motion, face, stakes.
Weight 20%
Get Noticed
Visual hierarchy, contrast, focal-point placement. Heatmap-driven.
Weight 20%
Brand Impact
Is the brand learnable in this exposure? Distinctive asset density and timing.
Weight 20%
Sell Proposition
Is the promise specific, single, and answered with proof?
Weight 15%
Build Brand
Tone, codes, and category fluency that compound across exposures.

The weights are not opinions; they are calibrated against held-out outcome data. Beat the Skip carries the most weight because on TikTok and Reels, an ad that loses the first 2 seconds loses the entire impression. Build Brand carries the least because its payoff is multi-exposure and harder to attribute in a single-ad scoring frame; we mark it as directional in solo-ad mode.
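The weighting can be sketched in a few lines of Python. This is a minimal illustration, assuming each KPI is already scored on a 0–100 scale; the weights are the ones listed above, while `composite_score` and the KPI keys are hypothetical names, not SaliencyLab's code. It also leaves out how Build Brand is flagged as directional in solo-ad mode.

```python
# Hypothetical sketch: weights from the guide, names invented for illustration.
KPI_WEIGHTS = {
    "beat_the_skip": 0.25,
    "get_noticed": 0.20,
    "brand_impact": 0.20,
    "sell_proposition": 0.20,
    "build_brand": 0.15,
}


def composite_score(kpis: dict[str, float]) -> float:
    """Weighted sum of the five KPI scores, each assumed to be on a 0-100 scale."""
    missing = KPI_WEIGHTS.keys() - kpis.keys()
    if missing:
        raise ValueError(f"missing KPI scores: {sorted(missing)}")
    return sum(weight * kpis[name] for name, weight in KPI_WEIGHTS.items())
```

An ad scoring 40 on Beat the Skip and 80 everywhere else lands at 70, not the 72 a plain average would give: a 10-point swing on Beat the Skip moves the composite 2.5 points, against 1.5 for Build Brand.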

Three verdicts. No fourth.

Every scored ad lands in one of three buckets. We resisted the temptation to add nuance bands: three verdicts force a decision, five let you defer it.

// Decision ladder

≥70
SCALE
Composite ≥ 70 and no individual KPI below 55. Strong skip resilience, clear brand, defensible promise. Buy media.
→ Launch
55–69
SHARPEN
Composite mid-band. The ad has bones but one or two KPIs underdeliver. Re-edit the specific weakness; do not start over.
→ Iterate
<55
REBUILD
Composite < 55, or two or more KPIs < 45. The structural problem is upstream of the cut. Go back to brief.
→ Restart
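The ladder reduces to a short function. This is a sketch, assuming the composite has already been computed and every KPI shares the 0–100 scale; the name `verdict` is hypothetical. The Rebuild test runs first so that its "or" condition wins. One case the ladder does not spell out (composite ≥ 70 but some KPI below 55, with no Rebuild trigger) is assumed here to fall to Sharpen.

```python
def verdict(composite: float, kpis: dict[str, float]) -> str:
    """Map a composite and its KPI scores onto SCALE / SHARPEN / REBUILD."""
    scores = list(kpis.values())
    # REBUILD: composite < 55, or two or more KPIs below 45.
    if composite < 55 or sum(score < 45 for score in scores) >= 2:
        return "REBUILD"
    # SCALE: composite >= 70 and no individual KPI below 55.
    if composite >= 70 and min(scores) >= 55:
        return "SCALE"
    # Everything else, including a >= 70 composite held back by one weak KPI
    # (an assumption: the ladder does not state this case explicitly).
    return "SHARPEN"
```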

When the thumb leaves, down to the second.

Of all the things 2,000+ ads taught us, hook timing was the loudest. The median scroll-stop on TikTok in our sample falls between 1.7 and 2.4 seconds across categories: earlier than most creative teams assume, and brutally consistent.

Platform | Sample (n) | Median scroll-stop | Hook window | Penalty if missed
--- | --- | --- | --- | ---
TikTok | 700 | 1.9s | 0–2.0s | -22 pts
Reels (Meta) | ~410 | 2.3s | 0–2.5s | -18 pts
YouTube Shorts | 403 | 2.6s | 0–3.0s | -12 pts
YouTube In-Stream (skippable) | ~280 | 5.1s | 0–5.0s | -15 pts
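One way to read the table is as a lookup: find the platform's hook window and charge the penalty if the first hook element lands after it. This is a minimal sketch, assuming the penalty is a flat step applied once (the guide does not say whether it grows with how late the hook arrives); `HookWindow`, `HOOK_WINDOWS` and `hook_penalty` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HookWindow:
    window_end_s: float  # end of the hook window, in seconds
    penalty_pts: int     # composite points lost if the hook misses the window


# Values copied from the table above; keys are illustrative.
HOOK_WINDOWS = {
    "tiktok": HookWindow(2.0, 22),
    "reels": HookWindow(2.5, 18),
    "youtube_shorts": HookWindow(3.0, 12),
    "youtube_instream": HookWindow(5.0, 15),
}


def hook_penalty(platform: str, first_hook_s: float) -> int:
    """Points deducted when the first hook element lands after the window (flat step, assumed)."""
    window = HOOK_WINDOWS[platform]
    return window.penalty_pts if first_hook_s > window.window_end_s else 0
```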

The four hook moves that actually worked

  • Visual stakes on frame one. A face mid-expression, a hand mid-action, a result mid-reveal. Static product shots cost an average of 14 points on Beat the Skip.
  • Spoken promise before the brand. Ads that delivered the user's payoff line before the logo outperformed brand-first openings by ~28% on completion rate.
  • Pattern-break audio. Sub-bass drop, dialect shift, or unexpected silence inside the first second. Adds 6–9 points on average.
  • Caption-first composition. A short, high-contrast caption pinned to the top third: readable with sound off, scannable in under 600 ms.
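The four moves are concrete enough to check mechanically once timestamps and caption positions have been extracted. This is a sketch only: the `HookEvents` fields are hypothetical outputs of an upstream extraction step, "pattern-break audio" is reduced to a single timestamp, and caption contrast and readability are not checked at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HookEvents:
    """Hypothetical per-ad features; times in seconds from the first frame."""
    stakes_on_first_frame: bool                 # face, hand or result mid-action at frame one
    promise_s: Optional[float]                  # first spoken payoff line
    logo_s: Optional[float]                     # first logo appearance
    audio_break_s: Optional[float]              # first pattern-break audio cue
    caption_box: Optional[tuple[float, float]]  # (top, bottom) as fractions of frame height


def hook_moves(ev: HookEvents) -> list[str]:
    """Return which of the four hook moves the ad makes."""
    moves = []
    if ev.stakes_on_first_frame:
        moves.append("visual_stakes")
    if ev.promise_s is not None and ev.logo_s is not None and ev.promise_s < ev.logo_s:
        moves.append("promise_before_brand")
    if ev.audio_break_s is not None and ev.audio_break_s <= 1.0:
        moves.append("pattern_break_audio")
    # Caption must sit entirely inside the top third (0 = top edge, 1 = bottom edge).
    if ev.caption_box is not None and ev.caption_box[1] <= 1 / 3:
        moves.append("caption_first")
    return moves
```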

The seven patterns that kill ads.

Across 2,000+ scored creatives, the Rebuild bucket is dominated by the same handful of structural mistakes. None of them is about budget or talent; all of them are decided in the brief. An ad can show more than one pattern, so the shares below overlap and sum to more than 100%.

Pattern 01
The two-headed brief
Two propositions fight inside a single ad. Sell Proposition drops below 45 and the viewer cannot retell either one. Common in launches.
Hits ~31% of Rebuild ads
Pattern 02
Logo at the end, only
Brand cue arrives after the skip threshold. Brand Impact under 40 even when everything else is strong. Most expensive mistake in DTC.
Hits ~24%
Pattern 03
Inflated UGC
Polished studio production wearing UGC clothing. Build Brand collapses because nothing reads as native to the platform.
Hits ~19%
Pattern 04
Skip-vulnerable opening
Slow establishing shot, voiceover preamble, or brand sting before the value. Beat the Skip falls below 50, and everything downstream is wasted.
Hits ~38%
Pattern 05
Proof without claim
Testimonials, stats, or social-proof badges with no specific promise to attach to. Sell Proposition stays mid-band, never crosses 60.
Hits ~17%
Pattern 06
Sound-on dependency
Joke, twist, or reveal that does not survive a muted scroll. Cuts Get Noticed by 11–15 points in feed environments.
Hits ~22%
Pattern 07
Generic-category opener
Opening frame indistinguishable from three competitors. The category gets the attention; the brand pays for it. Build Brand below 45.
Hits ~26%
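Four of the seven patterns come with a KPI cut-off, which makes them easy to flag as candidates from a score alone. This is a sketch with hypothetical names; a flag is a symptom, not a diagnosis (a Sell Proposition below 45 points toward the two-headed brief, but confirming it means watching the ad), and patterns 03, 05 and 06 are left out because the guide gives no single-score cut-off for them.

```python
# (pattern, KPI where the symptom shows, flag when the score is below this)
SYMPTOM_RULES = [
    ("01 two-headed brief", "sell_proposition", 45),
    ("02 logo at the end, only", "brand_impact", 40),
    ("04 skip-vulnerable opening", "beat_the_skip", 50),
    ("07 generic-category opener", "build_brand", 45),
]


def candidate_patterns(kpis: dict[str, float]) -> list[str]:
    """List kill patterns whose KPI symptom appears in this ad's scores."""
    return [
        pattern
        for pattern, kpi, threshold in SYMPTOM_RULES
        if kpi in kpis and kpis[kpi] < threshold
    ]
```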

Brand early, brand often, brand distinctively.

The single best predictor of Brand Impact in our dataset is not how often the logo appears; it is how early the first distinctive asset hits the frame, and how many distinct cues carry the brand beyond the wordmark.

First distinctive asset appears at… | Median Brand Impact score | Verdict mix
--- | --- | ---
0.0–1.5s | 71 | 62% Scale
1.5–3.0s | 58 | 54% Sharpen
3.0–6.0s | 44 | 48% Rebuild
> 6.0s or end-card only | 31 | 71% Rebuild
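The table works as a lookup keyed on the timestamp of the first distinctive asset. This is a sketch with hypothetical names; it assumes each band includes its upper edge (so an asset at exactly 1.5 s counts as 0.0–1.5s), which the table leaves open, and it returns the table's medians as descriptive context, not a prediction for an individual ad.

```python
from bisect import bisect_left
from typing import Optional

# Upper edges of the timing bands, in seconds (assumed inclusive).
BAND_EDGES = [1.5, 3.0, 6.0]
# (band label, median Brand Impact, verdict mix), in the same order as the table.
BANDS = [
    ("0.0–1.5s", 71, "62% Scale"),
    ("1.5–3.0s", 58, "54% Sharpen"),
    ("3.0–6.0s", 44, "48% Rebuild"),
    ("> 6.0s or end-card only", 31, "71% Rebuild"),
]


def brand_timing_band(first_asset_s: Optional[float]) -> tuple[str, int, str]:
    """Look up the band; None means no distinctive asset outside the end-card."""
    if first_asset_s is None:
        return BANDS[-1]
    # bisect_left puts a value equal to an edge into the lower band.
    return BANDS[bisect_left(BAND_EDGES, first_asset_s)]
```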

"Distinctive asset" here borrows the Ehrenberg-Bass definition: any element a category buyer can map back to the brand without seeing the wordmark, such as color, character, sonic logo, packaging or recurring talent. The shape of the curve above is consistent across categories; only the absolute scores shift.

When to stop scoring and ask buyers.

A score tells you the ad is weak. A synthetic panel tells you which buyer is rejecting it and what they would change. The two tools belong in a ladder, not a menu: never run the panel without a score in hand.

  • If the verdict is Scale, ship. The panel will add nuance you do not need pre-launch.
  • If the verdict is Sharpen and the weak KPI is obvious, fix it, re-score, ship. No panel.
  • If the verdict is Sharpen and you cannot see why, run a panel of 36 personas in the same project. The from-RoastIQ link is what makes the diagnosis specific.
  • If the verdict is Rebuild, the panel will tell you the brief is wrong, not the cut. Use it before re-briefing, not after.
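The ladder above is a decision function of two inputs: the verdict and whether the weak KPI is obvious. This is a sketch with hypothetical names; "obvious" is a human judgment passed in as a flag, not something the score computes.

```python
def next_step(verdict: str, weak_kpi_obvious: bool) -> str:
    """Return the recommended next action for a scored ad."""
    if verdict == "SCALE":
        return "ship"
    if verdict == "SHARPEN":
        if weak_kpi_obvious:
            return "fix the weak KPI, re-score, ship"
        return "run a 36-persona panel in the same project"
    if verdict == "REBUILD":
        # The panel diagnoses the brief, so it comes before re-briefing.
        return "run a panel, then re-brief"
    raise ValueError(f"unknown verdict: {verdict!r}")
```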
House rule

"Synthetic Users never runs without a RoastIQ result. Every panel record carries the score it was triggered from. The decision history is the deliverable."

How we score. The honest version.

There is no proprietary black box here. The pipeline is documented, the model versions are pinned in every report, and the validation numbers are public.

// Inputs

Multimodal extraction

Vertex AI Gemini 2.5 Flash analyzes the frame composition, attention map, brand cues, on-screen text and pacing. For video, Google Video Intelligence handles shot detection and labels; Speech-to-Text handles transcription. Saliency comes from a TranSalNet-class model.

// Scoring

Structured Zod output

Every KPI score is produced by a Zod-validated structured call, so there is no free-text drift. The model returns the score, the confidence label, the decision trace and the recommendations as a typed record, stored against a pinned model_version and benchmark_pool_version.
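The same idea can be shown without Zod. The sketch below is a Python analogue, not the production schema: it validates a raw model response into a typed, frozen record and rejects anything malformed. `model_version` and `benchmark_pool_version` come from the text above; the other field names and the confidence label set are hypothetical.

```python
from dataclasses import dataclass, fields

CONFIDENCE_LABELS = {"high", "medium", "low", "directional"}  # hypothetical label set


@dataclass(frozen=True)
class KpiResult:
    kpi: str
    score: float
    confidence: str
    decision_trace: tuple[str, ...]
    recommendations: tuple[str, ...]
    model_version: str
    benchmark_pool_version: str

    def __post_init__(self) -> None:
        if isinstance(self.score, bool) or not isinstance(self.score, (int, float)):
            raise ValueError("score must be a number")
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range: {self.score}")
        if self.confidence not in CONFIDENCE_LABELS:
            raise ValueError(f"unknown confidence label: {self.confidence!r}")
        if not self.model_version or not self.benchmark_pool_version:
            raise ValueError("model and benchmark pool versions must be pinned")


def parse_kpi_result(raw: dict) -> KpiResult:
    """Reject missing or extra keys, then build and validate the record."""
    expected = {f.name for f in fields(KpiResult)}
    if raw.keys() != expected:
        raise ValueError(f"field mismatch: {sorted(raw.keys() ^ expected)}")
    data = dict(raw)
    data["decision_trace"] = tuple(data["decision_trace"])
    data["recommendations"] = tuple(data["recommendations"])
    return KpiResult(**data)
```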

// Validation

Public outcomes only

Validated against public engagement and click-intent signals: TikTok likes, shares and comments, TikTok CTR percentile, and YouTube view counts. Never against customer-reported sales or ROAS. Held-out OOS Spearman ρ +0.30–0.32 across TikTok and YouTube; pool quintile lift 6.5×.
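Both validation numbers are standard statistics that can be recomputed from any scored, outcome-labelled set. The sketch below computes Spearman's ρ as the Pearson correlation of average ranks (so ties are handled), plus a quintile lift; "held-out" means both are evaluated only on ads not used to calibrate the weights. The lift definition here (mean outcome of the top score quintile divided by that of the bottom quintile) is an assumption, since the guide only says "top vs bottom", and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def quintile_lift(scores: list[float], outcomes: list[float]) -> float:
    """Mean outcome of the top-scored fifth of ads over that of the bottom fifth (assumed definition)."""
    paired = sorted(zip(scores, outcomes))
    k = len(paired) // 5
    if k == 0:
        raise ValueError("need at least 5 ads")
    bottom = mean(outcome for _, outcome in paired[:k])
    top = mean(outcome for _, outcome in paired[-k:])
    return top / bottom
```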

What we will not claim.

Half of being trustworthy is being clear about your ceiling. We will not say any of the following, and you should be skeptical of any vendor who does.

  • We do not predict sales lift, ROAS, attributed conversion, or brand recall. Those are downstream outcomes; we predict engagement and click intent, which are leading indicators.
  • We do not reproduce survey-based methodologies. We use behavioral proxies, not consumer recall surveys: a different construct, at a different cost and speed.
  • We do not claim Meta-Feed scores carry the same validation as TikTok and YouTube. Brand-ad outcome data is sparse on Meta; the scores are directional defaults until our cohort is dense enough to validate.
  • We do not claim heatmaps are measured eye tracking. They are predicted visual attention, useful as a directional read, not as a biometric.
  • We do not claim attribute detection is perfect. Accuracy is ~85%, and a formal inter-rater paper is in the research pipeline. We will publish the misses.

Creative analysis, asked plainly.

What is creative analysis, exactly?
A structured assessment of an ad's likely performance before media spend, scoring it on attention, comprehension, branding, persuasion and memorability, then benchmarking that score against comparable ads. It is a pre-flight check, not a post-flight report.
How long should the hook be on TikTok?
Median scroll-stop in our 700-ad TikTok sample is 1.9 seconds. If the brand promise, visual stakes or pattern interrupt is not on screen by ~2.0s, the ad is structurally weak, even if the rest is excellent. The penalty in our scoring model is about 22 composite points.
Why score with AI instead of running a survey?
Surveys take days, recruit small samples, and measure recall, not behavior. AI scoring takes 90 seconds, costs cents, and is validated against public behavioral outcomes (likes, shares, view counts, CTR percentile). Different construct, different speed, different cost. Both have a place; only one fits inside a creative sprint.
How accurate is RoastIQ?
Held-out out-of-sample Spearman ρ (2026-05-05): TikTok engagement +0.31 (n=700), TikTok CTR +0.30 (n=691), YouTube view counts +0.32 (n=403). Pool-wide quintile lift 6.5×. Meta-Feed scores are directional: outcome data for brand ads is sparse, and we will not pretend otherwise.
When should I run a synthetic user panel instead of just scoring?
Score first. If the verdict is Sharpen and the weak KPI is obvious, fix it and re-score; no panel needed. If the verdict is Sharpen and the cause is not obvious, run a panel inside the same project; the from-RoastIQ link is what makes the panel's diagnosis specific instead of generic. If the verdict is Rebuild, run the panel before re-briefing, since the problem is usually the brief rather than the cut.
Can I run creative analysis on a pre-edit storyboard?
Partially. The Beat the Skip and Sell Proposition KPIs scale down to storyboard inputs reasonably well; Get Noticed and Build Brand do not, because they need rendered frames and the actual sonic and motion profile. Treat storyboard-stage scores as directional reads on structure, not on craft.
Do you support languages other than English?
Yes. Speech-to-Text and Gemini handle 40+ languages for transcription and comprehension. Benchmark density varies by market, so we report the sample size on every score to let you judge confidence.
What happens to my creative after I upload it?
It is stored in your private workspace and scored, and the asset is removed from staging after processing. It is never added to the public benchmark without explicit opt-in. If you opt in (the Score My Ad collaboration), the full breakdown is published as part of our editorial; that is how the benchmark grows.