Data Aren’t All The Same

It’s no secret that not all data is the same. When you’re buying and selling segments, you’re frequently left waiting until the end of the advertising campaign to measure performance. If you’ve been disappointed, by then it’s too late to know whether the segments were wrong or whether you simply had the wrong creative or call to action in the campaign.

Systematic look at segment quality

We’ve started systematically looking at segment data quality using several different mechanisms: looking for statistical correlations between datasets, comparing segments against a random sample of the data, and even running primary research against new segments to establish how closely they reflect the truth. By directly validating the truth behind a segment, we start to establish a degree of currency in the segment business.
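As an illustration of the random-sample comparison, the sketch below measures how much of a segment also falls in a reference segment and divides that by the overlap seen in random samples of the same size. It is a minimal sketch, not Dativa’s method: it assumes each segment is a set of user IDs drawn from a known population, and the function names and the `trials` parameter are hypothetical. An index near 1 means the segment looks no different from random users; a high index means a real association.

```python
import random


def overlap_rate(segment, reference):
    """Fraction of IDs in `segment` that also appear in `reference`."""
    if not segment:
        return 0.0
    return len(segment & reference) / len(segment)


def index_vs_random(segment, reference, population, trials=200, seed=0):
    """Segment's overlap with `reference`, relative to the average overlap of
    random same-sized samples drawn from `population` (hypothetical baseline)."""
    rng = random.Random(seed)
    pop = sorted(population)  # random.sample needs a sequence, not a set
    k = len(segment)
    baseline = sum(
        overlap_rate(set(rng.sample(pop, k)), reference) for _ in range(trials)
    ) / trials
    if baseline == 0:
        return float("inf")
    return overlap_rate(segment, reference) / baseline


if __name__ == "__main__":
    population = set(range(1000))        # synthetic user IDs
    reference = set(range(200))          # synthetic reference segment (20%)
    concentrated = set(range(100))       # entirely inside the reference
    spread = set(range(0, 1000, 10))     # spread evenly across the population
    print(index_vs_random(concentrated, reference, population))
    print(index_vs_random(spread, reference, population))
```

With this synthetic data the concentrated segment scores an index of roughly 5, while the evenly spread one sits close to 1.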

Validating segments by correlation

Because many segments overlap, we don’t need to run primary research on all of them to model a scorecard for a broad set. Take, for example, a segment of people who own homes worth over $1m. We would expect this segment to overlap heavily with people earning more than $200k a year and with people who have a high credit rating. If instead we saw a correlation with a low credit rating, or no association with other high-net-worth segments, we would lower the score we give it. This approach lets us score most segments effectively, but we still struggle in areas such as Pet Owners or Auto Intenders. Countless surveys have told us that women are more likely than men to own cats, but merely seeing an overlap between the Cat Owner segment and the Female segment is not a clear enough signal: over 50% of the population is female and far fewer than 50% own cats, so a large overlap with the Female segment is expected for almost any segment and says little about whether the Cat Owner segment is accurate.
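One common way to put numbers on this reasoning is lift, P(A and B) / (P(A) · P(B)): the observed overlap of two segments divided by the overlap expected if membership were independent. The sketch below assumes segments are sets of user IDs from a known population. The scorecard rule in `correlation_score` (the share of expected associations a segment meets, with a lift threshold of 1.2) and all the segment data are hypothetical illustrations, not Dativa’s actual scoring model.

```python
def lift(a, b, population_size):
    """Observed overlap of segments a and b divided by the overlap expected
    if membership were independent: P(A and B) / (P(A) * P(B))."""
    expected = len(a) * len(b) / population_size
    return len(a & b) / expected if expected else 0.0


def share_in(a, b):
    """Raw overlap: fraction of segment a that also belongs to segment b."""
    return len(a & b) / len(a) if a else 0.0


def correlation_score(segment, expected_pos, expected_neg, population_size,
                      threshold=1.2):
    """Hypothetical scorecard: fraction of expectations the segment meets.

    The segment should show lift >= threshold with segments we expect it to
    co-occur with, and lift < 1 with segments we expect it to avoid.
    """
    checks = [lift(segment, ref, population_size) >= threshold
              for ref in expected_pos]
    checks += [lift(segment, ref, population_size) < 1.0
               for ref in expected_neg]
    return sum(checks) / len(checks) if checks else 0.0


if __name__ == "__main__":
    N = 1000  # synthetic population of user IDs 0..999
    # Invented segments, for illustration only.
    homes_1m = set(range(50))
    high_income = set(range(150))
    low_credit = set(range(500, 700))
    print(correlation_score(homes_1m, [high_income], [low_credit], N))

    female = set(range(510))           # 51% of the population
    cat_owners = set(range(390, 590))  # 200 owners, 120 of them in `female`
    print(share_in(cat_owners, female), lift(cat_owners, female, N))
```

With these invented numbers, 60% of cat owners are female, which sounds like strong confirmation, yet the lift is only about 1.18 because 51% of everyone is female. Comparing against the base rate, rather than reading raw overlap, is what separates a genuine skew from the size of the Female segment.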

The problem with so-called anonymous data

The principle of data anonymization is to take anything personally identifiable, such as your phone number or your email address, and replace it with an anonymous token. You might initially think this makes the data secure, but by combining the dataset with other datasets it’s relatively straightforward to re-identify individuals. Suppose I’ve licensed a dataset that lists the websites visited by anonymous cookies. The data I receive doesn’t contain any personally identifiable information (PII), but if my website has a login, I can match visitors to my site directly against the feed. If the date and time of an anonymous user’s actions in the feed match those of a logged-in visitor to my site, I’ve re-identified that consumer and have access to all of the other sites they’ve visited in the cookie feed.
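The matching step described above needs nothing more sophisticated than a join on site and timestamp. The sketch below assumes hypothetical record shapes: login events as `(user_email, site, timestamp)` tuples from my own site, and feed events as `(cookie_token, site, timestamp)` tuples from the licensed feed. The two-second `tolerance` is likewise an assumption; a real feed may need a wider window, with repeated visits used to narrow down ambiguous matches.

```python
from datetime import datetime, timedelta


def reidentify(login_events, feed_events, tolerance=timedelta(seconds=2)):
    """Link anonymous cookie tokens to known users when a logged-in visit and a
    feed record for the same site occur within `tolerance` of each other."""
    matches = {}
    for user, site, t_login in login_events:
        for token, feed_site, t_feed in feed_events:
            if feed_site == site and abs(t_feed - t_login) <= tolerance:
                matches.setdefault(token, set()).add(user)
    # Keep only tokens that point unambiguously to a single known user.
    return {tok: next(iter(users)) for tok, users in matches.items()
            if len(users) == 1}


def browsing_history(token, feed_events):
    """Every site the feed records for a given cookie token."""
    return sorted({site for tok, site, _ in feed_events if tok == token})


if __name__ == "__main__":
    logins = [("alice@example.com", "shop.example", datetime(2024, 1, 1, 10, 0, 0))]
    feed = [
        ("tok123", "shop.example", datetime(2024, 1, 1, 10, 0, 1)),
        ("tok123", "news.example", datetime(2024, 1, 1, 9, 0, 0)),
        ("tok123", "health.example", datetime(2024, 1, 1, 9, 30, 0)),
        ("tok999", "shop.example", datetime(2024, 1, 1, 11, 0, 0)),
    ]
    for token, user in reidentify(logins, feed).items():
        print(user, browsing_history(token, feed))
```

Once a single token is linked to a login, every other site in the feed for that token is attached to a named person, which is exactly the exposure the "anonymous" feed was meant to prevent.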

The bottom line

At Dativa, our view is that all information relating to subscribers is potentially re-identifiable. All data is therefore potentially PII and needs the same level of security as PII. If your business relies on sharing data more broadly, we recommend developing an API or a data-driven service rather than providing a raw data feed.
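One possible shape for such a service is an endpoint that answers aggregate questions and never returns rows or tokens. This is a minimal sketch, not a Dativa design: the record fields (`subscriber_id`, `region`), the function name and the suppression threshold of 50 are all assumptions. Small-group suppression of this kind reduces re-identification risk but does not eliminate it; a caller who can run many overlapping queries may still infer small groups by differencing the results.

```python
# Hypothetical threshold: groups with fewer distinct subscribers are withheld.
MIN_GROUP_SIZE = 50


def audience_counts(records, group_by, min_group_size=MIN_GROUP_SIZE):
    """Answer an aggregate query over subscriber records without exposing rows.

    Returns {group value: number of distinct subscribers}, omitting any group
    smaller than `min_group_size`.
    """
    groups = {}
    for record in records:
        groups.setdefault(record[group_by], set()).add(record["subscriber_id"])
    return {group: len(ids) for group, ids in groups.items()
            if len(ids) >= min_group_size}


if __name__ == "__main__":
    records = ([{"subscriber_id": i, "region": "north"} for i in range(60)]
               + [{"subscriber_id": 100 + i, "region": "south"} for i in range(10)])
    print(audience_counts(records, "region"))
```

Here the small "south" group is withheld entirely, so the consumer of the API learns the size of the large audience without ever seeing a subscriber identifier.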
