Picture this: You’re researching a new coffee maker. You Google “best espresso machines,” browse Williams-Sonoma’s website, watch a YouTube review, then see a CTV ad for Breville machines before finally making a purchase. Traditional ad targeting treats each of these touchpoints as isolated events. Multimodal AI sees them as one connected journey — and that’s revolutionizing how advertisers find and reach audiences.

The multimodal difference

While everyone’s been obsessed with Sora 2 putting their faces into videos, the advertising world has quietly been building something more practical: AI that can understand any type of data and apply those insights across any advertising channel.

“What you need is prediction based on behavioral data, and that’s why we wanted to use multimodal AI for prediction in digital ad targeting,” explains Melinda Han Williams, Chief Data Scientist at Dstillery. “By multimodal, we mean you can use different modalities of data — not just different sources. The difference between looking at words and what they mean and looking at behaviors and how you’ve seen them strung together as sequences in the past.”

Think of it like how humans naturally combine visual cues, sounds, text and context to understand a situation. When you see a waiter slip on a wet floor near a caution sign, your brain instantly processes multiple inputs to grasp the full story. Multimodal AI aims for that same fusion of understanding.

From brief to campaign in one model

Here’s where it gets interesting for advertisers. Instead of building separate models for search, display, CTV and everything else, multimodal AI creates one unified understanding that works everywhere.

“What sets multimodal AI apart is in its flexibility,” said Taejin In, Chief Product Officer at Dstillery. “Whether you have rich first-party data or just a basic campaign brief, multimodal AI can transform virtually any starting point into precise, actionable audiences.”

The system can take first-party data, CRM lists, search keywords, website URLs, or even just a paragraph describing your target audience. It then builds a model that can activate across user segments, contextual targeting, private marketplace deals and custom bidding algorithms — all from that single input.
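To make the "one model, many activations" idea concrete, here is a minimal sketch in Python. It is not Dstillery's implementation. It assumes that every signal type (a search keyword, a site visit, a CTV show) is mapped into the same vector space, so that a single seed can score both users and placements. The embedding table, the vectors, and the function names (`build_seed_model`, `score_user`, `score_context`) are all hypothetical and invented for illustration.

```python
import math

# Hypothetical toy embedding table: every signal type lives in the same
# 3-dimensional space. All vectors are invented for illustration only.
EMBEDDINGS = {
    "search:best espresso machines": (0.9, 0.1, 0.0),
    "site:williams-sonoma.com": (0.8, 0.2, 0.1),
    "ctv:cooking-show": (0.7, 0.3, 0.0),
    "site:autoblog.example": (0.0, 0.1, 0.9),
    "search:car insurance quotes": (0.1, 0.0, 0.9),
}


def mean_vector(signals):
    """Average the embeddings of a list of signals, component by component."""
    vectors = [EMBEDDINGS[s] for s in signals]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_seed_model(seed_signals):
    """Assumed 'model': just the average embedding of the seed signals."""
    return mean_vector(seed_signals)


def score_user(model, behavior_sequence):
    """Score a user by comparing the seed to their averaged behaviors."""
    return cosine(model, mean_vector(behavior_sequence))


def score_context(model, placement):
    """Score a single placement (page or show) against the same seed."""
    return cosine(model, EMBEDDINGS[placement])


# One seed input, here a single search keyword.
seed = build_seed_model(["search:best espresso machines"])

# Activation 1: user segments, scored from behavioral sequences.
coffee_user = score_user(seed, ["site:williams-sonoma.com", "ctv:cooking-show"])
auto_user = score_user(seed, ["site:autoblog.example", "search:car insurance quotes"])

# Activation 2: contextual targeting, scored per placement.
coffee_page = score_context(seed, "site:williams-sonoma.com")
auto_page = score_context(seed, "site:autoblog.example")

print(coffee_user > auto_user, coffee_page > auto_page)
```

In this toy setup the same seed vector ranks the coffee-related user above the auto-related user and the coffee-related page above the auto page, which is the sense in which one input can drive more than one targeting tactic. A production system would presumably learn its embeddings from data rather than use a hand-written table.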

This approach to targeting doesn’t just simplify workflows for overworked advertising professionals; it also delivers better outcomes. For example, an automotive brand using a curated CTV deal created by multimodal AI saw a 98% video completion rate, higher than with traditional ID-based targeting. An auto insurance company with direct response goals discovered, to its surprise, that contextual targeting outperformed 12 other tactics in its campaign, including traditional ID-based lookalikes.

Why this matters now and in the future 

Multimodal capabilities in digital ad targeting already provide value today: more data applied to build more precise targeting, the ability to understand and reach audiences without relying solely on identifiers, and the ability to reduce vendor sprawl and simplify workflows.

Less obvious is that multimodal AI is foundational for tomorrow’s AI agents and agentic advertising systems. “For agentic AI and advanced AI agents to truly deliver superior value, they need to understand all modes of data; you can’t do that without multimodality,” In explains. “Imagine if an AI agent can only reason from contextual signals. It would be missing behavioral signals, which we all know are a much better predictor of intent.”

With a multimodal foundation, these agentic AI systems and protocols will be able to understand multiple modes of data (web behavioral, CTV, in-app, search, user segment), evaluate and make decisions in a shared space, and interact with external systems, all with different modalities.

The reality check

Multimodal AI isn’t magic. Success depends on having the right seed data and being willing to test and optimize. The most successful implementations happen when brands view it as a partnership, not a set-it-and-forget-it solution.

But for advertisers tired of managing dozens of disconnected models and tactics, multimodal AI offers a compelling alternative: one model that understands your audience across every signal and can activate that understanding anywhere in the programmatic ecosystem. In a world where consumer journeys zigzag across channels, that unified view isn’t just convenient — it’s essential.