Big data refers to sets of data so big and complex that they cannot be analysed by traditional methods and tools, but which release new value when analysis is achieved.
Google Translate is an example of a problem solved with big data. Although the translations are imperfect, they are often good enough to convey what the writer intended, whatever language the text was written in. Google does this by statistically analysing millions of online documents that exist in multiple languages and working out which translation is most likely to be correct. The more documents that have been accurately translated by humans, the more accurate the Google translation will be.
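To make the statistical idea a little more concrete, here is a toy sketch in Python. It is not how Google Translate actually works; it only illustrates the principle that words which keep turning up together in human-translated sentence pairs are probably translations of each other. The four-sentence Dutch/English corpus, the function names and the choice of the Dice coefficient as the association score are all assumptions made for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of human-translated (Dutch, English) sentence pairs.
PARALLEL = [
    ("de kat slaapt", "the cat sleeps"),
    ("de hond slaapt", "the dog sleeps"),
    ("de kat eet", "the cat eats"),
    ("een hond eet", "a dog eats"),
]


def train(pairs):
    """Count how often each source/target word appears, alone and together."""
    src_count = Counter()
    tgt_count = Counter()
    together = defaultdict(Counter)
    for src, tgt in pairs:
        src_words = set(src.split())
        tgt_words = set(tgt.split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        for s in src_words:
            together[s].update(tgt_words)
    return src_count, tgt_count, together


def translate_word(model, word):
    """Pick the target word most strongly associated with `word`."""
    src_count, tgt_count, together = model
    if word not in together:
        return word  # never seen: leave the word untranslated

    def dice(candidate):
        # Dice coefficient: shared sentences relative to total appearances.
        shared = together[word][candidate]
        return 2 * shared / (src_count[word] + tgt_count[candidate])

    return max(together[word], key=dice)


def translate(model, sentence):
    """Word-by-word translation; ignores word order and grammar entirely."""
    return " ".join(translate_word(model, w) for w in sentence.split())


model = train(PARALLEL)
print(translate(model, "de hond eet"))
```

With only four sentence pairs the model can already separate "kat" from "de", because "the" also shows up in sentences without "kat". Adding more accurately translated pairs sharpens these counts, which is the point made above about more documents giving better translations.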
Big data analysis has been used to predict maintenance needs for UPS, New York City council and various car manufacturers. It’s been used in healthcare to predict the onset of infections in newborns, and outbreaks of flu.
So it sounds like it could solve some tough business problems, and it can. But it has limits.
- messiness of data makes it tricky to analyse and interpret. Google Translate occasionally gets a translation between Dutch and English completely wrong, and this is a language pair that must have millions of documents. You need good analytical expertise and data governance to get the valuable insights out of the data.
- hidden biases in data collection. For example, if you’re relying on smartphone data you are probably selecting against the lowest income earners.
- identifies correlation, but that doesn’t explain causality and doesn’t necessarily tell you what to do.
- privacy concerns relating to the collection, use and reuse of data. People may not realise that if enough anonymised data is combined, it is possible to identify an individual.
And sometimes all that extra data may induce a sort of paralysis by analysis, a belief that you could make the perfect decision with just a little more data.
Right now we’re only beginning to unlock the value of big sets of data, and it’s still very much in the hands of the experts. It’s going to take some re-learning for managers/business leaders to ask questions that big data can answer, and to understand that correlation does not imply causation.
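The correlation-versus-causation point can be seen in a small simulation. The sketch below is only an illustration under made-up assumptions: a hidden factor drives two measurements, `x` and `y`, which therefore move together even though neither affects the other. When `x` is then set by us rather than by the hidden factor, the relationship with `y` disappears. The variable names, noise levels and sample size are all invented for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n=5000, seed=0):
    """Return (observed correlation, correlation after forcing x)."""
    rng = random.Random(seed)
    # The hidden factor is never recorded in the data set.
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    # x and y each depend on the hidden factor, not on each other.
    x = [h + rng.gauss(0, 0.5) for h in hidden]
    y = [h + rng.gauss(0, 0.5) for h in hidden]
    # "Acting on" x: we choose its value ourselves, independently of hidden.
    x_forced = [rng.gauss(0, 1) for _ in range(n)]
    return pearson(x, y), pearson(x_forced, y)


observed, after_intervention = simulate()
print(f"observed correlation:         {observed:.2f}")
print(f"correlation after forcing x:  {after_intervention:.2f}")
```

The data alone would suggest that changing `x` changes `y`. A manager who acted on that correlation would see no effect, which is why the question of what is actually driving a pattern still needs human judgement.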