How to Figure Out Identity Resolution Once and for All
This article was originally published on AdAge.com on October 22, 2018.
When it comes to marketing, knowing your customers is absolutely essential. If you don't know them, how can you best serve them? But despite many platform vendors’ promises to help brands understand everything, like where customers live and shop, and when they go online, the world of customer data technology can be murky and rife with misinformation.
Accurate customer intelligence is essential for brands to know precisely who they are reaching with their media and what impact it will have on their business results, which is impossible if you can’t recognize those consumers across all devices. And as media proliferates across devices and channels, that recognition has become increasingly challenging for marketers. What's critical is not just understanding which online devices belong to a consumer, but also knowing who that consumer is.
The solution to knowing your customer lies in identity resolution, the science of creating a coherent picture of people as they move between channels and devices, and interact via different “identifiers”. These can include email addresses, device IDs and home addresses. How your technology partners approach identity can have a significant impact on every marketing decision you make.
Identity resolution sounds complex (and it can be), but failure to understand how it works can lead to uninformed decision-making and costly mistakes. A quick primer: Identity encompasses offline attributes, like name, address, and a person’s multiple phone numbers and email addresses, plus signals generated as a customer interacts online (like a cookie or mobile ad ID).
Identity resolution allows you to connect the dots across these identifiers in an accurate, scalable and privacy-compliant way to produce a stable and persistent view of your customer. Knowing the right questions to ask about a vendor’s approach to identity resolution can help give you clarity and confidence that their methods can boost your ROI.
Deterministic vs. probabilistic
There are two main categories of data that people think about when it comes to identity: deterministic and probabilistic. These terms tend to imply value judgments. To many, “deterministic” suggests a gold standard and “probabilistic” implies guessing. Neither characterization is accurate. I prefer the terms “explicit” and “implicit”. Deterministic, or explicit, data is typically an email address that is associated with a cookie or a mobile ad ID. Implicit data makes inferences based on all of the signals about consumers' web behaviors and interactions. For example, you may have seen a consumer on a given device at a particular IP address at a specific point in time. When you have a good volume of that same signal data, it allows you to draw really strong conclusions.
The deterministic approach shows whether there’s an explicit link between identifiers, while the probabilistic method estimates how strong an implicit linkage is.
When brands look to leverage identity linkages, they should have three goals in mind:
• Accuracy: maximizing accurate linkages while minimizing inaccurate ones.
• Richness: the depth of each customer profile, spanning digital and offline channels, devices, demographics, etc.
• Scale: the quantity of data and the ability to capture more linkages about more customers and create the largest possible audiences.
When it comes to achieving these goals — and to obtaining true identity resolution in general — both the deterministic and probabilistic methods have their benefits and drawbacks.
The deterministic method spots instances where two identifiers interact directly, and pairs them to build a picture of the customer. For example, if you sign in to a website using an email address, the brand can match that address with the cookie dropped for that logged-in session. If the email address is also connected to a mobile ad ID, the brand can link that to the email and cookie. And so on.
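As a rough illustration of the pairing described above, explicit links can be treated as edges in a graph, with each connected group of identifiers forming one customer profile. The sketch below uses a union-find structure in Python. The function name, the identifier prefixes (`email:`, `cookie:`, `maid:`) and the sample values are all hypothetical, not drawn from any vendor's system.

```python
from collections import defaultdict


def build_identity_clusters(links):
    """Group identifiers that are connected by explicit (deterministic) links.

    links: iterable of (identifier_a, identifier_b) pairs, e.g. an email
    address and the cookie seen during the logged-in session.
    """
    parent = {}

    def find(x):
        # Return the representative identifier for x's cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the chain as we walk it
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for identifier in list(parent):
        clusters[find(identifier)].add(identifier)
    return list(clusters.values())


# Hypothetical links observed from logins.
links = [
    ("email:jane@example.com", "cookie:abc123"),
    ("email:jane@example.com", "maid:9f8e7d"),
    ("email:sam@example.com", "cookie:xyz789"),
]
clusters = build_identity_clusters(links)
```

Note that this grouping trusts every link equally, so a single bad link would merge two different people into one profile.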
While the deterministic approach is truly omnichannel (it connects identifiers across the digital and offline worlds), it’s problematic because we have found that up to 50% of deterministic linkages are actually incorrect or even fraudulent. For example, when you share your Netflix password or your spouse uses your computer to check their email, incorrect linkages are created. And, as with the rest of the digital ecosystem, fraud is rife in the linkage economy (a good subject for another article). Further, deterministic data by its very nature is limited in size: there is a small, finite number of explicit linkages, so it's difficult to scale.
On the other hand, the probabilistic (or implicit) approach looks at multiple signals to discover which identifiers can be linked with high confidence. For instance, if the same mobile ad ID and desktop cookie were to visit a website from a residential IP address several nights a week between 9 and 10 p.m., probabilistic methods would conclude that both identifiers belong to the same household.
And if there are many additional observations of these same two devices, both connecting at the same time at multiple IP addresses, like at work or in an airport, it can be assumed the devices belong to a single person within that household. The probabilistic approach is good for weeding out false information because it looks at a wide variety of data rather than just the binary matches the deterministic method relies on. But it’s limited to online data only, which is a major shortcoming in our clicks-and-bricks world.
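The household-versus-person reasoning above can be sketched as a simple co-occurrence count. This is only a toy version, assuming each observation is a (device, IP address, hour) triple. The thresholds (`min_co_occurrences`, `min_ips_for_person`), the device names and the IP addresses are illustrative assumptions, and a production system would weigh far more signals.

```python
from collections import defaultdict
from itertools import combinations


def score_device_pairs(observations):
    """Count how often two devices appear on the same IP in the same hour.

    observations: iterable of (device_id, ip_address, hour_bucket).
    """
    devices_seen = defaultdict(set)
    for device, ip, bucket in observations:
        devices_seen[(ip, bucket)].add(device)

    pair_stats = defaultdict(lambda: {"co_occurrences": 0, "ips": set()})
    for (ip, _bucket), devices in devices_seen.items():
        for pair in combinations(sorted(devices), 2):
            pair_stats[pair]["co_occurrences"] += 1
            pair_stats[pair]["ips"].add(ip)
    return pair_stats


def classify(stats, min_co_occurrences=3, min_ips_for_person=2):
    """Apply assumed thresholds: repeated sightings on one IP suggest a
    household; repeated sightings across several IPs suggest one person."""
    if stats["co_occurrences"] < min_co_occurrences:
        return "unlinked"
    if len(stats["ips"]) >= min_ips_for_person:
        return "same person"
    return "same household"


HOME_IP, WORK_IP = "203.0.113.7", "198.51.100.4"  # documentation-range IPs
nights = ["2018-10-01T21", "2018-10-02T21", "2018-10-03T21"]
observations = []
for night in nights:
    # Phone, laptop and a family tablet are all seen at home each night.
    observations += [
        ("maid:phone", HOME_IP, night),
        ("cookie:laptop", HOME_IP, night),
        ("cookie:tablet", HOME_IP, night),
    ]
# Only the phone and laptop also show up together at work.
observations += [
    ("maid:phone", WORK_IP, "2018-10-02T10"),
    ("cookie:laptop", WORK_IP, "2018-10-02T10"),
]

stats = score_device_pairs(observations)
phone_laptop = classify(stats[("cookie:laptop", "maid:phone")])
laptop_tablet = classify(stats[("cookie:laptop", "cookie:tablet")])
```

In this made-up data, the phone and laptop travel together between home and work, while the tablet never leaves the house, which is the distinction the thresholds try to capture.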
In an attempt to devise a more accurate method, some vendors are now adding their explicit and implicit identity graphs together, calling it a hybrid approach. In theory, this would seem to be the best of both worlds, combining accuracy and scale. Unfortunately, that's not the case. While the intent is there, the execution may use poor or variable-quality raw data, inaccurate linkages and tainted data clusters. Adding incorrect and inaccurate data together can produce misleading — or just plain wrong — results.
For a hybrid method to be successful, marketers need to use raw data signals from both deterministic and probabilistic approaches to evaluate which linkages are relevant. In other words, deterministic data can’t simply be combined with probabilistic data after they’ve both been built via their separate methods. In that approach, you lose track of the raw signal, which muddies the data. It would be like adding yeast to an already-baked loaf of bread, and hoping it will rise more.
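One way to picture evaluating raw signals together, rather than merging two finished graphs, is a single score per candidate link that takes explicit and implicit evidence as inputs at the same time. The sketch below is purely hypothetical: the function, its inputs and every weight are assumptions chosen for illustration, not calibrated values or any vendor's model.

```python
import math


def link_confidence(explicit_logins, co_occurrences, distinct_ips,
                    other_accounts_on_device):
    """Combine raw explicit and implicit evidence into one score in (0, 1).

    All weights below are illustrative assumptions, not calibrated values.
    """
    z = (
        -2.0                                # baseline skepticism
        + 1.5 * min(explicit_logins, 3)     # explicit signal: logins seen
        + 0.4 * min(co_occurrences, 10)     # implicit signal: seen together
        + 0.8 * min(distinct_ips, 3)        # implicit signal: across locations
        - 1.2 * other_accounts_on_device    # shared-device warning sign
    )
    return 1.0 / (1.0 + math.exp(-z))       # logistic squash to (0, 1)


# A family computer: one login, but two other accounts also use the device.
shared_computer = link_confidence(1, 0, 0, 2)
# A personal phone: repeated logins backed by consistent implicit signals.
personal_phone = link_confidence(2, 8, 2, 0)
```

Because the explicit login and the implicit evidence are scored side by side, a login on a shared device can be discounted instead of being accepted as a binary match.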
How is identity data used?
Identity data comes to life in activation and measurement. In the best-case scenario, your identity graph will be stable and persistent enough to fuel both. That means you can target an individual or household within the graph with precision, then resolve the resulting touchpoints back to that individual or household and marry them with purchase or other outcome data. Doing this right requires a stable, end-to-end system of identity: the ability to make advertising decisions at the identity level, track each granular touchpoint, link them all to a persistent ID (even as many publishers and walled gardens use their own discrete tracking identifiers) and then resolve outcome data back to that ID. This is where most “people-based” solutions fail.
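To make the measurement side concrete, here is a minimal sketch of resolving touchpoints and outcomes back to a persistent ID. The graph contents, the `person-1` style IDs, the channel names and the purchase amount are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical identity graph: each known identifier maps to a persistent ID.
identity_graph = {
    "email:jane@example.com": "person-1",
    "cookie:abc123": "person-1",
    "maid:9f8e7d": "person-1",
    "cookie:xyz789": "person-2",
}


def attribute_outcomes(touchpoints, purchases, graph):
    """Roll ad exposures and purchases up to the persistent ID.

    touchpoints: list of (identifier, channel)
    purchases:   list of (identifier, amount)
    Identifiers missing from the graph are skipped, not guessed.
    """
    summary = defaultdict(lambda: {"channels": [], "revenue": 0.0})
    for identifier, channel in touchpoints:
        persistent_id = graph.get(identifier)
        if persistent_id is not None:
            summary[persistent_id]["channels"].append(channel)
    for identifier, amount in purchases:
        persistent_id = graph.get(identifier)
        if persistent_id is not None:
            summary[persistent_id]["revenue"] += amount
    return dict(summary)


touchpoints = [
    ("cookie:abc123", "display"),
    ("maid:9f8e7d", "mobile video"),
    ("cookie:xyz789", "display"),
    ("cookie:unknown", "display"),  # not in the graph, so it is dropped
]
purchases = [("email:jane@example.com", 120.0)]

report = attribute_outcomes(touchpoints, purchases, identity_graph)
```

Here the desktop and mobile exposures land on the same person as the purchase recorded against an email address, which is only possible because all three identifiers resolve to one ID.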
It’s hard to overstate the importance of strong identity resolution for any brand aiming to be more customer-centric. Marketers must become more discerning customers when it comes to identity, asking vendors tough questions about their methodologies. It’s essential to understand the data science behind the buzzwords — going beyond what the word “deterministic” suggests and the risk that “probabilistic” implies.
This will allow you to confirm that both your marketing platform and measurement approach can operate at the highest levels of granular, persistent, stable identity. Because no matter how great your insights or messaging or algorithms are, if you’re unable to serve advertising to the right consumer online, offline and across channels, or are unaware that media exposures across multiple devices are connected to the same individual offline, your results will be wrong.