Some of the discussions we are having over here are brain-wrinklers! I was speaking with some colleagues yesterday about the security implications of big data. Typically I would group them into two separate areas:

  1. Using big data as an enabler for predictive security analytics (i.e., deriving security information powered by analytics across big data)
  2. Securing the output of big data analytics on the business side (and possibly in infosec too)

After talking about some of the uses of Greenplum Chorus, it occurred to me that there was a third area that needs to be addressed: the security problem of using independent but diverse big data sets to arrive at the same conclusion (especially when that conclusion could be part of a larger corporate strategy). Let’s paint a picture to see if we can make this problem more concrete.

One Month in Singapore, by Enygmatic-Halycon

If you have ever seen social engineering in action (go read this book if you haven’t), you know that social engineers don’t directly ask you for all the information they need. If they did, you would be suspicious. Imagine for a moment that someone on the street randomly asked for your bank account number. Pretty suspicious, right? So instead, they build rapport, earn your trust (or the trust of others that you interact with), and ask for seemingly innocuous pieces of data that, only when assembled, give them the information they were looking for.

I see Big Data analytics as similar in that regard: missing data sets can be filled in from other sources to arrive at similar outputs. Depending on the data set(s), it’s reasonable to assume that macro trends will be represented in multiple, independent sets of data. So let’s say that you are using predictive analytics to determine where you will be investing in the next six months. The output would be pretty valuable to a competitor, or to an insider looking to make a quick buck by selling this information. You realize that someone might want to steal this information, so you protect certain data sets that allow you to arrive at your conclusions. But what if someone could substitute other data sets for those protected ones and arrive at the same conclusion? Just like a social engineer, he could fill in the gaps with reconstructed data and maybe get close enough to know exactly where your investments are going, and respond accordingly. This is especially true with data sets that are free (presumably lower quality) and ones that can be purchased or even leased.
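To make the idea concrete, here is a minimal Python sketch of that substitution. Everything in it is an assumption made up for illustration: the region names, the numbers, the two "public" data sets, and the simple normalize-and-average scoring. It is not how any particular analytics product works; it only shows that a ranking derived from a protected data set can be reproduced from unrelated proxy data that tracks the same macro trend.

```python
# Hypothetical example: all names and numbers below are invented.

# Protected internal data set: projected sales growth per region.
protected_sales_growth = {"north": 0.12, "south": 0.03, "east": 0.08, "west": 0.01}

# Two independent, publicly obtainable proxies for the same macro trend.
public_job_postings = {"north": 340, "south": 120, "east": 260, "west": 90}
public_search_interest = {"north": 71, "south": 40, "east": 65, "west": 22}


def rank(scores):
    """Return region names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def normalize(scores):
    """Rescale values to the 0..1 range so different data sets are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    return {region: (value - lo) / (hi - lo) for region, value in scores.items()}


def combine(*datasets):
    """Average the normalized scores of several data sets, region by region."""
    normalized = [normalize(d) for d in datasets]
    return {
        region: sum(n[region] for n in normalized) / len(normalized)
        for region in normalized[0]
    }


# The conclusion the company is trying to protect.
internal_ranking = rank(protected_sales_growth)

# The conclusion an outsider reconstructs without touching the protected data.
reconstructed_ranking = rank(combine(public_job_postings, public_search_interest))

print("internal:     ", internal_ranking)
print("reconstructed:", reconstructed_ranking)
print("same conclusion:", internal_ranking == reconstructed_ranking)
```

With these invented numbers, both rankings put the same region first, even though the outsider never saw the protected figures. Real data would be noisier, but the exposure is the same in kind: protecting the input does not protect the conclusion if correlated inputs are available elsewhere.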

Companies dipping their toes into the world of Big Data analytics are getting to the point where they know just enough to be dangerous. They don’t have the Data Scientist DNA built into their companies, so they hire experts to analyze data. But the experts just see data; they don’t necessarily know (or care) about the value of the data or the conclusions derived from it. As companies start to tackle the use of predictive analytics across diverse data sets, they must understand the process of deriving value from big data and how to protect all aspects of the analysis. What’s especially important is understanding the other possible ways that the same conclusions can be reached, so that you know where your exposure might be.

This post originally appeared on BrandenWilliams.com.