Big Data

Big Data is often touted as a solution to all our problems, a panacea for all ills, often by people who struggle to define it. So what is big data, and what kind of problems has it solved?

Big data refers to data sets so large and complex that they cannot be analysed with traditional methods and tools, but which release new value once they are analysed.

Google Translate is an example of a problem solved by the use of big data. Although the translations are imperfect, they are often good enough to understand what the writer intended, whatever language it was written in. Google does this by statistically analysing millions of documents online that exist in multiple languages and working out which translation is most likely to be correct. The more documents that have been accurately translated by humans, the more accurate the Google translation will be.
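The statistical idea described above can be illustrated with a toy "phrase table". This is a minimal sketch under stated assumptions, not Google's actual system: the Dutch/English phrase pairs are made up and treated as already aligned, and the helper names `build_phrase_table` and `translate` are hypothetical. It estimates the probability of each translation from how often humans used it, then picks the most likely one.

```python
from collections import Counter, defaultdict

# Hypothetical toy "parallel corpus" of (source phrase, human translation) pairs.
# Real systems learn alignments from millions of sentence pairs; here the
# alignment is given up front to keep the idea visible.
ALIGNED_PAIRS = [
    ("goedemorgen", "good morning"),
    ("goedemorgen", "good morning"),
    ("goedemorgen", "morning"),
    ("dank je", "thank you"),
    ("dank je", "thanks"),
    ("dank je", "thank you"),
]

def build_phrase_table(pairs):
    """Estimate P(target | source) by relative frequency (maximum likelihood)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    table = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        table[source] = {t: c / total for t, c in targets.items()}
    return table

def translate(phrase, table):
    """Return the most probable translation seen in the corpus, or None if unseen."""
    candidates = table.get(phrase)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

table = build_phrase_table(ALIGNED_PAIRS)
print(translate("goedemorgen", table))  # → good morning
print(translate("dank je", table))      # → thank you
```

Every extra human-translated pair sharpens these frequency estimates, which is why more accurately translated documents lead to better machine translations. A phrase that never appears in the corpus gets no translation at all.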

Big data analysis has been used to predict maintenance needs for UPS, the New York City council and various car manufacturers. In healthcare, it has been used to predict the onset of infections in newborns and outbreaks of flu.

So it sounds like it could solve some tough business problems, and it can. But it has limits.

  • messy data is tricky to analyse and interpret: Google Translate occasionally gets the translation between Dutch and English completely wrong, even though this language pair must have millions of documents. You need good analytical expertise and data governance to get valuable insights out of the data.
  • hidden biases in data collection: for example, if you’re relying on smartphone data, you are probably selecting against the lowest income earners.
  • it identifies correlation, but doesn’t explain causality and doesn’t necessarily tell you what to do.
  • privacy concerns relating to the collection, use and reuse of data: people may not realise that if enough anonymised data is combined, it is possible to identify an individual.
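The re-identification risk in the last point can be sketched as a simple linkage: join an "anonymised" data set to a public one on fields they share. All records, field names and the `link` helper below are hypothetical and invented for illustration; the point is only that removing names is not enough when postcode, birth date and sex survive.

```python
# Hypothetical "anonymised" health records: names removed, but
# quasi-identifiers (postcode, birth date, sex) retained.
health_records = [
    {"postcode": "2000", "birth_date": "1980-04-12", "sex": "F", "diagnosis": "asthma"},
    {"postcode": "2000", "birth_date": "1975-09-30", "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "3000", "birth_date": "1980-04-12", "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public data set (e.g. an electoral roll) that includes names
# alongside the same quasi-identifiers.
public_roll = [
    {"name": "Alice Example", "postcode": "2000", "birth_date": "1980-04-12", "sex": "F"},
    {"name": "Bob Example", "postcode": "2000", "birth_date": "1975-09-30", "sex": "M"},
]

QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")

def key(record):
    """Combine the quasi-identifier fields into a single join key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)

def link(anonymised, identified):
    """Join the two data sets on quasi-identifiers; keep only unique matches."""
    index = {}
    for person in identified:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in anonymised:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches

for name, diagnosis in link(health_records, public_roll):
    print(name, "->", diagnosis)
```

Neither data set on its own reveals who has which diagnosis, but combined they re-identify two of the three "anonymised" patients.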

And sometimes all that extra data may induce a sort of paralysis by analysis, a belief that you could make the perfect decision with just a little more data.

Right now we’re only beginning to unlock the value of big data sets, and it’s still very much in the hands of the experts. It’s going to take some re-learning for managers and business leaders to ask questions that big data can answer, and to understand that correlation does not imply causation.

image: Big Data: water wordscape / CC BY 2.0
