What creative analysis actually is.
Creative analysis is the structured assessment of an ad's likely performance before media spend, scoring it on attention, comprehension, branding, persuasion and memorability, then benchmarking against comparable ads. It is a pre-flight check, not a post-flight report.
Most teams confuse two very different things. Creative analysis happens before you pay for impressions; it asks, "Is this ad structurally sound?" Performance reporting happens after; it tells you what the spend produced. The gap between them, the moment you have a finished cut and a media plan but no signal, is where 90% of waste is born, and where this guide lives.
"A defensible read on creative quality, delivered fast enough to change the cut and cheap enough to do it for every variant."
Three things have to be true for a creative analysis to be worth doing: it has to be fast (quicker and cheaper than a re-edit), benchmarked (a score with no peer is just a number), and honest (about its construct, its ceiling, and its blind spots). Every page in this guide is built around those three constraints.
The five KPIs. Frozen.
Every ad scored by SaliencyLab is decomposed into the same five dimensions. We do not add a sixth when a client asks. The composite is a weighted combination, not a simple average.
The weights are not opinions; they are calibrated against held-out outcome data. Beat the Skip carries the most weight because on TikTok and Reels, an ad that loses the first 2 seconds loses the entire impression. Build Brand carries the least because its payoff is multi-exposure and harder to attribute in a single-ad scoring frame; we mark it as directional in solo-ad mode.
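A minimal sketch of a weighted composite, to show how it differs from a plain mean. The KPI keys and every weight value below are illustrative assumptions; the guide only states that Beat the Skip carries the most weight and Build Brand the least, and does not publish the calibrated weights.

```typescript
// Hypothetical KPI keys. Only "Beat the Skip" and "Build Brand" are named
// in the text; the other three are placeholders for illustration.
type Kpi =
  | "beatTheSkip"
  | "comprehension"
  | "persuasion"
  | "memorability"
  | "buildBrand";

type KpiValues = Record<Kpi, number>;

// Made-up weights that only respect the stated ordering:
// Beat the Skip heaviest, Build Brand lightest.
const ILLUSTRATIVE_WEIGHTS: KpiValues = {
  beatTheSkip: 0.3,
  comprehension: 0.2,
  persuasion: 0.2,
  memorability: 0.18,
  buildBrand: 0.12,
};

// Weighted composite on the same 0-100 scale as the inputs.
// Dividing by the weight total keeps the result on scale even if the
// weights do not sum to exactly 1.
function compositeScore(
  scores: KpiValues,
  weights: KpiValues = ILLUSTRATIVE_WEIGHTS,
): number {
  const keys = Object.keys(weights) as Kpi[];
  const totalWeight = keys.reduce((sum, k) => sum + weights[k], 0);
  const weightedSum = keys.reduce((sum, k) => sum + weights[k] * scores[k], 0);
  return weightedSum / totalWeight;
}
```

Under these assumed weights, a 40-point gain on Beat the Skip moves the composite more than the same gain on Build Brand, which is the behavior a plain average would hide.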
Three verdicts. No fourth.
Every scored ad lands in one of three buckets: Scale, Sharpen, or Rebuild. We resisted the temptation to add nuance bands: three verdicts force a decision, five let you defer it.
// Decision ladder
When the thumb leaves, down to the second.
Of all the things 2,000+ ads taught us, hook timing was the loudest. The median scroll-stop on TikTok in our sample happens between 1.7 and 2.4 seconds, earlier than most creative teams assume, and brutally consistent across categories.
| Platform | Sample (n) | Median scroll-stop | Hook window | Penalty if missed |
|---|---|---|---|---|
| TikTok | 700 | 1.9s | 0 – 2.0s | -22 pts |
| Reels (Meta) | ~410 | 2.3s | 0 – 2.5s | -18 pts |
| YouTube Shorts | 403 | 2.6s | 0 – 3.0s | -12 pts |
| YouTube In-Stream (skippable) | ~280 | 5.1s | 0 – 5.0s | -15 pts |
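The table above can be read as a lookup: if the hook lands after the platform's window, a fixed penalty applies. The sketch below encodes the window ends and penalties from the table. Several things are assumptions rather than documented behavior: that the penalty is applied to a 0-100 Beat the Skip score, that it is flat rather than graded, and that a hook landing exactly on the window end counts as inside it.

```typescript
type Platform = "tiktok" | "reels" | "youtubeShorts" | "youtubeInStream";

interface HookRule {
  windowEndSec: number; // end of the hook window, from the table
  penaltyPts: number;   // points lost if the hook misses the window
}

// Values copied from the table; the platform keys are illustrative names.
const HOOK_RULES: Record<Platform, HookRule> = {
  tiktok: { windowEndSec: 2.0, penaltyPts: 22 },
  reels: { windowEndSec: 2.5, penaltyPts: 18 },
  youtubeShorts: { windowEndSec: 3.0, penaltyPts: 12 },
  youtubeInStream: { windowEndSec: 5.0, penaltyPts: 15 },
};

// Assumption: flat penalty, window end inclusive, result clamped to 0-100.
function applyHookPenalty(
  baseScore: number,
  platform: Platform,
  hookAtSec: number,
): number {
  const rule = HOOK_RULES[platform];
  const missed = hookAtSec > rule.windowEndSec;
  const adjusted = missed ? baseScore - rule.penaltyPts : baseScore;
  return Math.max(0, Math.min(100, adjusted));
}
```

For example, a TikTok cut with a base score of 70 keeps its score when the hook lands at 1.8s and drops by 22 points when the hook lands at 2.4s.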
The four hook moves that actually worked
- Visual stakes on frame one. A face mid-expression, a hand mid-action, a result mid-reveal. Static product shots cost an average of 14 points on Beat the Skip.
- Spoken promise before the brand. Ads that delivered the user's payoff line before the logo outperformed brand-first openings by ~28% on completion rate.
- Pattern-break audio. Sub-bass drop, dialect shift, or unexpected silence inside the first second. Adds 6–9 points on average.
- Caption-first composition. A short, contrast-heavy caption pinned to the top third, readable with sound off, scannable in under 600ms.
The seven patterns that kill ads.
Across 2,000+ scored creatives, the Rebuild bucket is dominated by the same handful of structural mistakes. None of them are about budget or talent. All of them are about what was decided in the brief.
Brand early, brand often, brand distinctively.
The single best predictor of Brand Impact in our dataset is not how often the logo appears; it is how early the first distinctive asset hits the frame and how many distinct cues carry the brand beyond the wordmark.
| First distinctive asset appears at… | Median Brand Impact score | Verdict mix |
|---|---|---|
| 0.0 – 1.5s | 71 | 62% Scale |
| 1.5 – 3.0s | 58 | 54% Sharpen |
| 3.0 – 6.0s | 44 | 48% Rebuild |
| > 6.0s or end-card only | 31 | 71% Rebuild |
"Distinctive asset" here borrows the Ehrenberg-Bass definition: any element a category buyer can map back to the brand without seeing the wordmark, such as color, character, sonic logo, packaging, or recurring talent. The shape of the curve above is consistent across categories; only the absolute scores shift.
When to stop scoring and ask buyers.
A score tells you the ad is weak. A synthetic panel tells you which buyer is rejecting it and what they would change. The two tools belong in a ladder, not a menu: never run the panel without a score in hand.
- If the verdict is Scale, ship. The panel will add nuance you do not need pre-launch.
- If the verdict is Sharpen and the weak KPI is obvious, fix it, re-score, ship. No panel.
- If the verdict is Sharpen and you cannot see why, run a panel of 36 personas in the same project. The link back to the RoastIQ score is what makes the diagnosis specific.
- If the verdict is Rebuild, the panel will tell you the brief is wrong, not the cut. Use it before re-briefing, not after.
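The ladder above can be sketched as a small decision function. The type names, field names and action labels are hypothetical. Attaching the triggering score to every panel decision follows the rule that a panel never runs without a score in hand, and reusing the 36-persona size for the Rebuild case is an assumption, since the guide only states it for Sharpen.

```typescript
type Verdict = "Scale" | "Sharpen" | "Rebuild";

// Hypothetical shape of a scoring result.
interface ScoreResult {
  adId: string;
  verdict: Verdict;
  composite: number;
}

type Decision =
  | { action: "ship" }
  | { action: "fix-and-rescore" }
  | {
      action: "run-panel";
      personas: number;
      purpose: "diagnose-cut" | "check-brief";
      triggeredFrom: ScoreResult; // the panel always carries its score
    };

function decide(score: ScoreResult, weakKpiObvious: boolean): Decision {
  // Scale: ship, no panel needed pre-launch.
  if (score.verdict === "Scale") {
    return { action: "ship" };
  }
  // Sharpen with an obvious weak KPI: fix it, re-score, ship.
  if (score.verdict === "Sharpen" && weakKpiObvious) {
    return { action: "fix-and-rescore" };
  }
  // Sharpen without an obvious cause diagnoses the cut;
  // Rebuild checks the brief before re-briefing.
  const purpose = score.verdict === "Rebuild" ? "check-brief" : "diagnose-cut";
  return { action: "run-panel", personas: 36, purpose, triggeredFrom: score };
}
```

Because `triggeredFrom` is a required field, a panel decision cannot be constructed without a score, so the type checker enforces the ordering of the ladder.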
"Synthetic Users never runs without a RoastIQ result. Every panel record carries the score it was triggered from. The decision history is the deliverable."
How we score. The honest version.
There is no proprietary black box here. The pipeline is documented, the model versions are pinned in every report, and the validation numbers are public.
// Inputs
Multimodal extraction
Vertex AI Gemini 2.5 Flash analyses the frame composition, attention map, brand cues, on-screen text and pacing. For video, Google Video Intelligence handles shot detection and labels; Speech-to-Text handles transcription. Saliency comes from a TranSalNet-class model.
// Scoring
Structured Zod output
Every KPI score is produced by a Zod-validated structured call, so there is no free-text drift. The model returns the score, the confidence label, the decision trace and the recommendations as a typed record, stored against a pinned model_version and benchmark_pool_version.
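To illustrate what a validated, typed record buys you, here is a dependency-free sketch of the kind of check a Zod schema would perform. It is a hand-written stand-in, not the actual schema: the field names other than `model_version` and `benchmark_pool_version`, the confidence labels, and the 0-100 score range are all assumptions.

```typescript
// Hypothetical confidence labels; the real label set is not documented here.
type Confidence = "low" | "medium" | "high";

interface KpiRecord {
  score: number;                  // assumed 0-100
  confidence: Confidence;
  decisionTrace: string[];
  recommendations: string[];
  model_version: string;          // pinned model version
  benchmark_pool_version: string; // pinned benchmark pool version
}

function isConfidence(v: unknown): v is Confidence {
  return v === "low" || v === "medium" || v === "high";
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function isNonEmptyString(v: unknown): v is string {
  return typeof v === "string" && v.length > 0;
}

// Rejects anything that is not a well-formed record, so free text or a
// missing field fails loudly instead of drifting into storage.
function parseKpiRecord(input: unknown): KpiRecord {
  if (typeof input !== "object" || input === null) {
    throw new Error("record must be an object");
  }
  const r = input as Record<string, unknown>;
  const score = r.score;
  const confidence = r.confidence;
  const decisionTrace = r.decisionTrace;
  const recommendations = r.recommendations;
  const model_version = r.model_version;
  const benchmark_pool_version = r.benchmark_pool_version;

  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 100) {
    throw new Error("score must be a finite number in [0, 100]");
  }
  if (!isConfidence(confidence)) throw new Error("invalid confidence label");
  if (!isStringArray(decisionTrace)) throw new Error("decisionTrace must be string[]");
  if (!isStringArray(recommendations)) throw new Error("recommendations must be string[]");
  if (!isNonEmptyString(model_version)) throw new Error("model_version is required");
  if (!isNonEmptyString(benchmark_pool_version)) throw new Error("benchmark_pool_version is required");

  return { score, confidence, decisionTrace, recommendations, model_version, benchmark_pool_version };
}
```

The design point is that the version fields are mandatory: a score that cannot say which model and which benchmark pool produced it is rejected at the boundary.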
// Validation
Public outcomes only
Validated against public engagement and click-intent signals: TikTok likes, shares and comments, TikTok CTR percentile, and YouTube view counts. Never against customer-reported sales or ROAS. Held-out (out-of-sample) Spearman ρ is +0.30 to +0.32 across platforms; pool quintile lift is 6.5×.
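For readers who want to see what a Spearman ρ between scores and outcomes means in practice, here is a minimal sketch: rank both series (ties share the average rank), then take the Pearson correlation of the ranks. This is the textbook definition, not a description of the actual validation code, and the function names are illustrative.

```typescript
// Assign 1-based ranks; tied values share the mean of their positions.
function averageRanks(values: number[]): number[] {
  const order = values
    .map((value, index) => ({ value, index }))
    .sort((a, b) => a.value - b.value);
  const ranks = new Array<number>(values.length).fill(0);
  let start = 0;
  while (start < order.length) {
    let end = start;
    while (end + 1 < order.length && order[end + 1].value === order[start].value) {
      end++;
    }
    const sharedRank = (start + end) / 2 + 1;
    for (let k = start; k <= end; k++) {
      ranks[order[k].index] = sharedRank;
    }
    start = end + 1;
  }
  return ranks;
}

function pearson(x: number[], y: number[]): number {
  const n = x.length;
  const meanX = x.reduce((s, v) => s + v, 0) / n;
  const meanY = y.reduce((s, v) => s + v, 0) / n;
  let cov = 0;
  let varX = 0;
  let varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Spearman rho: Pearson correlation computed on the ranks.
function spearman(scores: number[], outcomes: number[]): number {
  if (scores.length !== outcomes.length || scores.length < 2) {
    throw new Error("need two series of equal length, at least 2 points");
  }
  return pearson(averageRanks(scores), averageRanks(outcomes));
}
```

A ρ of +1 means the score orders ads exactly as the outcome does, 0 means no monotonic relationship, and values around +0.3 indicate a real but modest ordering signal.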
What we will not claim.
Half of being trustworthy is being clear about your ceiling. We will not say any of the following, and you should be skeptical of any vendor who does.
- We do not predict sales lift, ROAS, attributed conversion, or brand recall. Those are downstream outcomes; we predict engagement and click intent, which are leading indicators.
- We do not reproduce survey-based methodologies. We use behavioral proxies, not consumer recall surveys: different construct, different cost, different speed.
- We do not claim Meta-Feed scores carry the same validation as TikTok and YouTube. Brand-ad outcome data is sparse on Meta; the scores are directional defaults until our cohort is dense enough to validate.
- We do not claim heatmaps are measured eye tracking. They are predicted visual attention, useful as a directional read, not as a biometric.
- We do not claim attribute detection is perfect. Accuracy is about 85%, and a formal inter-rater paper is in the research pipeline. We will publish the misses.