Dstillery Predicts Episode 1: The 2018 Election


Can the same data used by brands to discover new customers and drive growth also help political candidates discover new supporters and drive voter turnout? In our series Dstillery Predicts, we intend to find out. 

In this first installment of our Dstillery Predicts series, Caroline Allen, Marketing Manager, talks with Peter Lenz, Senior Geospatial Analyst, and Peter Ibarra, Senior Analyst, from our Data Science and Analytics team. They have been working to answer some really interesting questions on how the data that we use every day for brands can be applied to the political realm. Who does our data show will win the March 13th Special Election in Pennsylvania's 18th Congressional District? Listen to our podcast to find out!



Caroline Allen:  

Welcome to Dstillery Predicts, a new series from Dstillery, a predictive marketing intelligence company. I'm Caroline Allen, the marketing manager, and I'll be bringing you direct access to our team of AI, machine learning, and media experts. Today we're talking with Peter Lenz, our Senior Geospatial Analyst, and Peter Ibarra, one of our Analysts from our Data Science team. They have been digging into some really interesting questions around the 2018 election and how the data that we use every day for brands can be applied to the political world. Thanks for joining us today.

Peter Ibarra: 

Yeah, of course.

Peter Lenz:

Absolutely.

Caroline:  

Do you want to give us a little bit of background about Dstillery? I know this isn't the first time we've taken the dive into using our data in the political world. How did that get started?

Lenz:

Sure. Actually, Dstillery's role in politics began before we were even called Dstillery. One of our predecessor companies, EveryScreen Media, was in fact the primary programmatic advertising company for the Mitt Romney campaign. We learned a lot about how to do political media targeting at that time. Fast forward a couple of years to 2016, and Pete and I worked on a project where we took a look at devices that showed up at Iowa’s caucus locations. The Iowa caucus is really important. It's traditionally the very first indication of who is really going to be a player in a presidential election cycle. People show up at places like town halls, gymnasiums, and schools, and they literally talk it out with one another about who they're going to support. That means they stand around for a long time. One of the truisms that we understand here at Dstillery is that when people are standing around waiting, out come their devices. We mapped all of these locations, and we took a look at the devices that showed up at them during the caucus time.

One of the great things about it is that the Democrats caucus separately from the Republicans, so we were able to separate out those signals very cleanly, and we were actually able to build profiles of which people support which candidates. That was our first deep dive into using our crafted audience technology for politics. Today what we want to do is go even deeper. That was a first look, "How can we do this?" Now we want to go in-depth. Instead of a huge presidential election using lots and lots of data, we're actually going to take a look at one specific district, and not just tell you who supports who, but actually try and predict who is going to win this race.

Ibarra:  

What we wanted to do for the Pennsylvania 18th, and this is a special election that's going to be occurring on March 13th, was take some of the ideas that we developed previously and, like Lenz was talking about, really take them deeper. Let's look at people's content and let's start using content as a predictor of who they're likely to support, and using that data as a way to understand how likely one person is to win over the other in this election.

Lenz: 

It's really no different than what we do every day for brands, but applied to politicians, because what else is a politician, but a kind of brand?

Caroline: 

Exactly. It's no different from a marketer trying to sell their products; a campaign manager or a campaign is trying to sell a person.

Ibarra:

Yeah, exactly, and usually with brands you're going to be in a very competitive environment where you're looking at dozens of potential companies trying to get that same person. What's unique about the political realm is it's going to be two people fighting over that same type of ... Really, instead of a customer, it's a voter, and they're going to be fighting over that. It makes it really easy, or easier, I should say, to start distinguishing what draws people and what behavior draws them to one candidate over the other.

Lenz:

What makes it more difficult is that in marketing, people are always buying products every day. In politics, you have one ... We would call it a conversion, where somebody makes a decision. There's only one conversion event that matters. When we're typically building models, we use previous conversion events of other devices to predict and to learn what makes a person make that decision. In politics you don't get that. Every time an election happens it's different from every other election. There's no real training data, the way we would consider it, to build a model. On one hand it's very similar to what we do every day. On the other hand it's slightly different.

Caroline: 

And something else that we do every day, though, is we not only can expose brands to new audiences, but we can actually use our data to determine in some cases consumer behavior, and whether they're actually going to the store and purchasing the item that they were exposed to in the online display ad, for example. Can you use data to not only help drive political affiliation, but potentially drive turnout on election day?

Ibarra:

Yeah, I think that's where our data can be the most helpful for people. One of the things that we realized when looking at this is every day a candidate's out there trying to get their name out there. That's their whole life, the campaign. They're doing events, they're doing calls, they're knocking on doors. Every day that's what they do. What our data was showing was that even though that's what a candidate is doing all the time, when it comes to the constituency that they're going after, it's 1% of that constituent's overall behavior. It may be the politician's whole life, but it's very much not the voter's whole life. It's really a small slice. What our data is able to do is say, "Look, you have a good grasp on what's going on with that 1%, let's show you the other 99% of things going on and what motivational triggers in that 99% can motivate people to come out and vote." "Well, how can I reach this group of users?" "By understanding the issue that matters to them, and it will motivate them to come out and vote." That's where I think our data's going to be really helpful.

Lenz:     

Especially in a campaign like this one. A presidential campaign is absolutely a massive thing and it dominates people's minds leading right up to an election. This is a special election, and they're much smaller. Even for one that's as important as this one, and that's getting as much coverage as the PA-18 is, it's not that big. People aren't thinking about it enough. You have to be very sophisticated to hone in on that relatively small signal that your voters are sending you.

Caroline:    

For our listeners who are most likely unfamiliar with what's happening in Pennsylvania right now, can you give a little bit of a brief background on the PA-18 election that's happening on Tuesday?

Ibarra:     

Yeah. What happened is the seat was vacated previously by a GOP congressman, and when that happens they run a special election to fill that seat. For the republican party they're running Rick Saccone as their nominee. On the democratic side it's Conor Lamb. It's become a hot button issue because this is a district that Trump had won by over 20 points in 2016. A lot of people are gauging this as, "How much are people starting to understand and really like or dislike the Trump presidency and the things that he's doing?" which is why it's become this national race and very much like a measuring stick for what's going to happen in November.

Caroline:

Let's talk a little bit deeper about this research that you did with the PA-18 special election.

Ibarra:        

Yeah, so what we did a little bit differently here was in how we developed a support model to understand how likely the people living in the Pennsylvania 18th would be to support one candidate over the other. Traditionally, and I think Lenz would agree, we've always built a democratic model and tried to understand how likely people are to be a democrat. We then built a republican model and tried to understand how likely that is. I think one of the things we found is that there's a high overlap, because when it comes to people that are likely to support one party or the other, they're politically involved.

Lenz:   

Our models always found people who are politically minded, period, instead of being able to tell us who is supporting which party.

Ibarra:   

Basically what we did is we built out a model that is going to score people on a scale of GOP to democrat. It's like a zero to one scale. If you're closer to zero, you're a likely GOP member; if you're closer to one, you're a likely democratic supporter. What we found when we ran this model for the first time, looking at all the devices that we see homed in the Pennsylvania 18th, is that over 55% of devices lean on the republican side. It's a district that's gone republican for the last 20 plus years and that's very much what our model has shown, that the democrats are playing with far fewer people that are going to help them carry the election, unless they're able to swing more of that in. It's very much a republican leaning district, and as we got closer to the election, what we see is that on the extreme ends of our model, the democrats have actually done a much better job of bringing people to a higher score, a higher level of support, and garnered that enthusiasm, and we think they're starting to out-perform the republicans in that sense.

Republicans have the 55% likely support, but they haven't done as good of a job of getting those people to be enthusiastic about Saccone, whereas Conor Lamb has done a good job of bringing that democratic base out to be really supportive of what he's trying to do.

Lenz:     

To do this actual analysis, we broke out every device that we saw in this district into quintiles: highly republican, leaning republican, neutral (those are your true independent voters), lean democrat, and highly democrat. When we take a look at those two extremes, the highly republican and the highly democratic quintiles, we see that democrats are out-performing republican enthusiasm by roughly 9%. Considering that this is a slightly republican leaning district, that puts your democrats just a little bit higher than the republicans when we consider who's going to turn out on Tuesday.
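
The banding Lenz describes can be sketched in a few lines of Python. This is a minimal illustration, not Dstillery's implementation: the equal-width cut points, the 0.5 midpoint used for "leaning", and the names `band`, `band_shares`, and `republican_lean_share` are all assumptions made for the example. Strict quintiles would instead cut at the 20th, 40th, 60th, and 80th percentiles of the observed scores.

```python
from collections import Counter

# Hypothetical band edges: equal-width fifths of the 0-to-1 support scale,
# where 0 is a likely GOP supporter and 1 is a likely democratic supporter.
BANDS = [
    (0.2, "highly republican"),
    (0.4, "lean republican"),
    (0.6, "neutral"),
    (0.8, "lean democrat"),
    (1.0, "highly democrat"),
]


def band(score):
    """Map one device's support score to a named band."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for upper, label in BANDS:
        if score <= upper:
            return label
    return BANDS[-1][1]  # not reached for valid scores


def band_shares(scores):
    """Return the fraction of devices that falls into each band."""
    counts = Counter(band(s) for s in scores)
    total = len(scores)
    return {label: counts[label] / total for _, label in BANDS}


def republican_lean_share(scores):
    """Fraction of devices scoring below the (assumed) 0.5 midpoint."""
    return sum(1 for s in scores if s < 0.5) / len(scores)


if __name__ == "__main__":
    sample = [0.05, 0.15, 0.30, 0.45, 0.55, 0.90]  # made-up scores
    print(band_shares(sample))
    print(republican_lean_share(sample))
```

Comparing the two extreme bands, as in the conversation, would then mean comparing some enthusiasm measure between the "highly republican" and "highly democrat" groups; the transcript does not say how that measure is computed, so it is left out here.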

Caroline: 

You feel pretty strongly that you have a prediction of what's going to happen on Tuesday.

Lenz:     

We do. It is our belief that Conor Lamb will win by a very small margin. Republicans have the numbers advantage, but we believe the democrats have the enthusiasm and the issues advantage.

Ibarra:

One of the things that we really found on the democratic side is a large index on a lot of the women's issues. We believe that the women's vote could be a really key factor in this upcoming election. We see a lot of things around child care and women's health, whether that be Planned Parenthood or healthcare in general, which we also saw as a really big issue. People are worried about whether or not they're going to be able to maintain their coverage and whether or not they're going to be able to have, whether it's pre-K for their kids or the type of schooling that they want. This is something that we think is going to be a really determining factor: can Conor Lamb really capture that type of momentum to help carry him, and basically use the women's vote as a way to get him into office?

Lenz:    

The converse is Saccone's supporters. For them the really interesting content that we're seeing, the highest ranking content clusters in our system, largely relate to guns. Besides guns, we see people who are active investors. There's a lot of financial interest on the republican side. What's really interesting, though, when you look at these financial related clusters is that they're highly concentrated on the republican side. We don't really see them among the independent voters or on the democrat side. The way we're reading that is the signature republican legislation of the last year was Trump's tax cut. It seems that that's only playing to the republican base. Instead of growing the republican brand, which is what they need to do in this district, it played to their base. Basically the only people who give a hoot, let's put it that way, about this tax cut are the people who were already going to show up and vote republican. It didn't do anything. It didn't move the dial at all for them.
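
Neither speaker defines what an "index" or a "ranking" of content clusters means here. One common marketing convention, assumed for this sketch, compares how often a segment engages with a cluster against how often a baseline population does, scaled so that 100 means no difference. The function names, the cluster names, and all the counts below are hypothetical.

```python
def affinity_index(segment_hits, segment_size, baseline_hits, baseline_size):
    """Index of 100 = segment matches baseline; 200 = twice as likely."""
    if segment_size <= 0 or baseline_size <= 0:
        raise ValueError("population sizes must be positive")
    baseline_rate = baseline_hits / baseline_size
    if baseline_rate == 0:
        raise ValueError("baseline never engages with this cluster")
    segment_rate = segment_hits / segment_size
    return 100.0 * segment_rate / baseline_rate


def rank_clusters(segment_counts, segment_size, baseline_counts, baseline_size):
    """Sort content clusters from most to least over-indexed."""
    indexes = {
        cluster: affinity_index(
            segment_counts.get(cluster, 0), segment_size, hits, baseline_size
        )
        for cluster, hits in baseline_counts.items()
        if hits > 0  # skip clusters the baseline never touches
    }
    return sorted(indexes.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up device counts for one segment versus the whole district.
    segment = {"guns": 40, "investing": 30, "child care": 5}
    district = {"guns": 100, "investing": 100, "child care": 200}
    for cluster, index in rank_clusters(segment, 100, district, 1000):
        print(f"{cluster}: {index:.0f}")
```

Under this convention, a cluster that is "highly concentrated" in one segment would show an index well above 100 there and well below 100 in the other segments.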

Ibarra:      

Yeah. I think that's one thing that's interesting. We would have expected that as this tax bill's implemented and as people are starting to see a little bit more income on their paychecks, the enthusiasm that we're not seeing on the republican side should have happened as a result of that tax bill. Lenz and I have been looking at the data. It's very much a part of the GOP base that's going to probably turn out anyways. It hasn't done anything to try and bring more people in as a way to continue to support the agenda that's being implemented. That was a bit surprising for us, because it's been such a huge push from the GOP.

Lenz:         

If I were going to be giving the GOP a bit of advice right now about what they should actually be doing if they want to grow their brand ... What people are concerned about in this district is healthcare. We see a lot of healthcare related things popping for this district. One of the things that's really surprising is seeing senior care content, not senior care from the point of view of seniors, but senior care from the point of view of children who are looking at ways to take care of their elderly parents. This is a very ... I think there's a lot of compassion related things that the GOP can do that they're missing out on by making this focus on the tax cut, which quite frankly has fallen flat.

Ibarra: 

Yeah, I agree. Whether it's healthcare ... These people are concerned about how they are going to be able to maintain that quality of life, so that they're not going to have to go and worry about, "Can I afford my insurance, whether it's for myself, whether it's for my kids, or my parents that are getting older?" They're looking for those types of services that are going to help make the other burdens, I think, a little bit easier in their lives.

Caroline: 

We talked a lot about how campaigns can use data on their constituents, but how do you actually know where people are?

Lenz:          

Well, that's a very difficult problem for us, because we don't collect PII. PII is personally identifiable information: data about a person that can be linked to a name or an address, all those things that you would traditionally use to figure out where someone is. We don't get any of that. Every device that enters Dstillery's ecosystem is a bunch of randomly generated numbers. There are no names, there are no addresses. We have to use the data that comes into our system to make ... The data science word is probabilistic. The no BS version of probabilistic is an informed guess. We use math to do those informed guesses. We use our statistical models. One of our technologies is called homing. We collect location data and we attach it to the histories of our devices. We also use our crosswalk, which is a device graph that lets us connect devices together probabilistically.

For those people who are going to be listening to this, we're going to use the word probabilistically a lot in these conversations. We love Bayes’ law. We take this data and we basically look for patterns. We look for recency: where have you been most recently? We take a look at frequency, how many times you've been to a place, and that works both ways. If you've been to a place very few times, we're not going to home you there. If you've been to a place lots and lots and lots of times, that's also something that we consider suspect. That might be a Starbucks or your job, not actually your house. There's a sweet spot in between and I can't tell you what it is, because the machine is constantly modifying that number to find what the right answer is. We take a look at times. People tend to follow certain schedules, and those schedules by the way change in different places. In New York City it tends to be nine to five, and in a factory town you might have shift work where people are working around the clock.

Our system takes all this information into account and it makes a best guess off of those anonymous histories as to where a device lives. We do that down to a very fine level. We do that down to the level of a zip plus four. People know standard five-digit zip codes. They've often seen four additional numbers after that. Those four additional digits generally correspond to between 12 and 20 households. If you look at a city, that's roughly a city block. It's not a perfect analogy, but a good one to get in your head. We can home you down to a group of those zip plus fours, which represent a relatively small part of the world. Using that, we can make a guess as to where you live. You live somewhere very close to that zip four. It might not be that exact zip four, but zip fours are really, really tiny, and it's probably one of the ones nearby it.
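
The signals Lenz lists (recency, frequency with a lower and an upper limit, and time of day) can be combined in a toy scoring function. This is only a sketch under stated assumptions, not the homing system itself: the visit limits, the 14-day half-life, the fixed night window, the daytime weight of 0.25, and the function name `guess_home` are all invented for illustration. The transcript notes that the real thresholds are tuned continuously by the machine and that schedules differ from place to place.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def guess_home(pings, now, min_visits=3, max_visits=200,
               half_life_days=14.0, night_hours=(21, 6)):
    """Best-guess home zip+4 from anonymous (zip4, timestamp) pings.

    Returns None when no location has enough (or few enough) visits.
    All parameter defaults are hypothetical.
    """
    visits = defaultdict(list)
    for zip4, stamp in pings:
        visits[zip4].append(stamp)

    night_start, night_end = night_hours
    best_zip4, best_score = None, 0.0
    for zip4, stamps in visits.items():
        # Frequency works both ways: too few visits is not a home,
        # too many is suspect (a coffee shop or a workplace).
        if not min_visits <= len(stamps) <= max_visits:
            continue
        score = 0.0
        for stamp in stamps:
            age_days = max((now - stamp).total_seconds() / 86400.0, 0.0)
            recency = 0.5 ** (age_days / half_life_days)  # newer counts more
            at_night = stamp.hour >= night_start or stamp.hour < night_end
            score += recency * (1.0 if at_night else 0.25)
        if score > best_score:
            best_zip4, best_score = zip4, score
    return best_zip4


if __name__ == "__main__":
    now = datetime(2018, 3, 9, 12, 0)
    pings = []
    for day in range(1, 6):
        # Placeholder location labels, not real zip+4 codes.
        pings.append(("HOME-0001", now - timedelta(days=day, hours=-11)))
        pings.append(("WORK-0002", now - timedelta(days=day, hours=2)))
    print(guess_home(pings, now))
```

A fixed night window is the weakest assumption here: in a shift-work town the hours that indicate "at home" would have to be learned per device or per area rather than hard-coded.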

Caroline: 

What's next? Tuesday happens, the ... We're going to see the election results live Tuesday night. What's the next thing you're going to work on?

Ibarra:

After the election what we want to do is take those results and use them as a way to better inform the models and the methodology that we've been using. We think we are onto something, but we also know that it's the first step of what we hope will be a process where we can go and do a lot more of this type of predicting in November, really using this as a stepping stone to what we want to do, which is predicting the entire national election in 2018, whether it's every district or every Senate race, and it would be great if we do that.

Lenz: 

We're scientists at Dstillery and we have formed a hypothesis. It's a little scary to form a hypothesis on this, because it's very easily provable or disprovable, but I don't think ... I think a scientist is always happiest when they're proven wrong, because it means that there's more to learn, that there's more to explore. We've highly instrumented this election. We have a lot of things in our system set up to capture data. Just like Pete said, we're going to take that data and we're going to use it to make our models better. This is the first time we're actually calling an election, and we're going to use it to build new models. We are especially excited about tackling a turnout model. What we have right now is an enthusiasm model. We understand across the entire district what people are interested in. That's what we do every day. That's our standard Dstillery brand focus technology. We've added this democrat, republican model, so we understand who people are supporting.

The third part of this, and the crowning glory, is a turnout model. We can predict on election day who's actually going to vote. Between those three things you can start putting together highly accurate understandings of an electorate. We're excited, because this is a chance to collect that election data. There's only so many elections that are going to happen between now and November, and this is a chance to collect hard data that we can actually use to train our models.

Caroline:     

You are both confident in the data, that the data shows a democratic win on Tuesday for Conor Lamb, but what happens if you're wrong?

Ibarra:   

The first thing I'm going to do is I'm going to blame Lenz and figure out what exactly he did wrong, because obviously it couldn't have been me. No, but in all seriousness, what we want to be doing is learning from this experience, going back and looking, whether it's at the models or the data, and understanding how it could have been interpreted a little bit differently. We both feel really good about what's going to happen, and I think even if the GOP does pull out this victory, it's still a really impressive thing if the democrats are able to make this close. It's not a small thing that going into this, the GOP had 55% support in this district. If they can get close, that's an enormous change from the type of voting behavior that's traditional here. We feel very confident that it's going to be a slight democratic victory, and either way all signs are pointing to a good democratic showing in this election.

Lenz:  

I think the best way to think about this is as an experiment. Our hypothesis is Conor Lamb is going to narrowly win in the PA-18 today, but to be a scientist you also have to be willing to be wrong. It could be that our hypothesis is incorrect and if that's true, then we're going to revisit our algorithms and we're going to take what we've learned ... Like I said, we've highly instrumented this. We're collecting a lot of data today and we're going to change those algorithms and we're going to try again and we're going to keep doing it until we get it right.

Caroline:    

We should mention that we're recording this podcast on Friday, March 9th. That's a few days before the election. Our data, as you mentioned, is built on enthusiasm. It's not built on turnout. There is a chance that the republicans will somehow gain more momentum over the weekend that could affect the turnout on Tuesday.

Lenz:           

Absolutely. There's all sorts of things that can affect turnout separate from enthusiasm. There's social effects. If you know people who are voting, you're much more likely to turn out and vote, and we don't yet measure that. There are climate effects. If it snows, it's going to be a lot more difficult for people to get to the polls and it's going to limit what's going on. There's news effects, there's all sorts of different things that exist outside of our system that we're not yet measuring. One of the things that really excites us is the fact that we're now going to collect data and we're going to collect all sorts of things that surround this election that we can use to basically weight our learnings and be even more confident next time.

Caroline:     

As a plug for this project that you're working on, we will have a full blog post coming out today around the work that they're doing looking at PA-18, but more specifically they're looking at this data and understanding how a voter becomes a supporter. What does that journey look like and how did they first discover the candidate? They are using support of Conor Lamb as an example. Make sure you check out our blog, Dstillery.com, and you'll see the latest information from Peter and Pete. And you guys will join us again after the election, I hope.

Ibarra:        

Only if we're right. Otherwise, it's going to be Lenz.

Caroline:          

He'll take the fall.

Lenz:       

Yeah, I'm the fall guy today.

Ibarra:   

Exactly right.

Lenz:       

I'm the fall guy for this.

Caroline:                 

Awesome. Well, thanks guys for joining us, and we will talk to you guys soon.